Glossary

Anonymization

Anonymization means changing personal data in such a way that the link to the data subject is permanently removed or can only be restored with disproportionate effort. This distinguishes anonymization from pseudonymization, where it remains possible to re-identify individuals.

Further information on our webpages: Ethics and data protection, Publishing data and code

Based on: Swiss Personalized Health Network, Glossary | Cessda Data Management Expert Guide

Artificial Intelligence (AI)

Artificial intelligence describes the ability of a machine to imitate human abilities such as thinking and creativity. Large language models (LLMs) are one example of AI models: they respond to human language and can generate text themselves. Well-known examples are ChatGPT (OpenAI), Llama (Meta), and Gemini (Google). In the context of research data, note that processing confidential research data with AI is only possible under certain circumstances for data protection reasons (e.g., with consent in the case of personal data).

Further information: Information from the European Parliament | Redhat: What are Large Language Models (LLMs)? | UniBE Guidelines - Research on and with AI (2024)

Backup

A backup is a security copy of data that is created at a specific point in time so that the data can be restored in the event of loss or damage. The aim is to protect the data, including specific versions of it, over the long term. The 3-2-1 rule is recommended: keep 3 copies of the data on 2 different storage media, with 1 copy stored at an external location. The term is sometimes confused with synchronization.

Further information: forschungsdaten.info | Synology Blog (German)
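The 3-2-1 rule described in this entry can be checked mechanically. The following is a minimal sketch, not an official tool: the class `DataCopy`, the function `satisfies_3_2_1` and the example media names are hypothetical, and the sketch assumes that the working copy counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    label: str     # free-text name of the copy (hypothetical examples below)
    medium: str    # type of storage medium, e.g. "laptop SSD", "external HDD"
    offsite: bool  # True if the copy is stored at an external location


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule.

    Assumption: the working copy is counted as one of the 3 copies.
    """
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    DataCopy("working copy", "laptop SSD", offsite=False),
    DataCopy("weekly backup", "external HDD", offsite=False),
    DataCopy("institutional backup", "network storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Removing the off-site copy from the list makes the check fail, which is the situation the rule is meant to prevent.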

CARE Principles

The CARE Principles (CARE stands for Collective Benefit, Authority to Control, Responsibility, Ethics) were introduced by the Global Indigenous Data Alliance (GIDA). They describe good practices for handling data from Indigenous communities with respect to Indigenous rights and interests and complement the FAIR Principles.

Further information: CARE Principles (GIDA) | CARE Principles (ARDC)

Data Steward

Data stewards are experts in research data management. They support researchers in the sustainable handling of research data. In addition, data stewards act as a link between researchers and (research) software engineers, IT and other infrastructures. Other tasks include counselling, training and raising awareness of good practices in research data management.

Based on: https://forschungsdaten.info/praxis-kompakt/glossar/

Data Transfer and Use Agreement (DTUA)

Data Transfer and Use Agreements (DTUA, DUA) are contracts that regulate the exchange of data between two parties. A DTUA governs the disclosure of a specific data set, authorised uses and all data protection and security requirements relating to the receipt and processing of the data. For these purposes, DTUAs assign appropriate responsibilities to the recipients of the data.

Based on: https://www.purdue.edu/business/sps/contractmgmt/DataTransferUseAgreement.html

Data management plan

A data management plan (DMP) is a structured document in which the handling of research data in a project is systematically described. It should contain information on how and with which tools (e.g. hardware and software) data is collected, processed, documented, stored, backed up, maintained, archived and, if necessary, published. The DMP also documents the necessary resources and responsibilities. Ideally, a DMP is drafted during the planning phase of a research project, but should be regularly updated and supplemented as the project progresses. The DMP is therefore an instrument for work organisation and project planning, but can also help third parties to interpret and reuse the relevant research data.

For specific help with writing a DMP, see our website.

Based on: Glossary, Leibniz University Hannover (uni-hannover.de)

Data provenance

Data provenance documents the origin of research data and the processes, methods, tools and algorithms used to produce it. Information on the provenance of research data is crucial to ensure the transparency and reproducibility of research and thus to strengthen its credibility and the trust placed in it. The relevant information can be recorded in readme files or metadata. Provenance information is central to the implementation of the FAIR Data principles.

Based on: eResearch Alliance: Data Provenance

Informed Consent

Informed consent (or a declaration of consent) means that participants are informed about what is planned with their data as part of a research project and about the purposes for which the data is to be collected and published, and that they agree to this. Informed consent forms the basis for participation in scientific studies and any subsequent use of the data. It is therefore the basis of research that implements legal provisions and ethical principles.

For further information and support, see our website on ethics and data protection.

Documentation

Documentation means descriptive information about the creation process, structure, and content of research data. In research data management, it aims to make data understandable, verifiable, and reusable, both for the people who collect the data and for others outside the project in which the data was created. Extensive data documentation is fundamental to FAIR Data. An easy way to document data is to use a ReadMe file. This is a separate text file containing concise and structured information about the data. A ReadMe template is available on our website.

Further information: forschungsdaten.info, Datendokumentation | Glossar (German)

Electronic lab notebooks

Electronic lab notebooks (ELN) and laboratory information management systems (LIMS) are digital tools that facilitate laboratory work, among other things. ELNs are used to store and record unstructured data, e.g. to organise protocols, notes and data from experiments. LIMS, on the other hand, are intended for structured and repetitive data that follow specific patterns, e.g. when tracking samples from precisely defined, repeated and routine tests.

Based on: https://www.scinote.net/blog/eln-vs-lims-how-to-choose/

Encryption

Encryption is a security measure that converts readable data into unreadable text using an encryption key or password. It ensures that data can only be restored to its original form with the correct key, offering additional protection if password security is compromised. Frequently recommended free and open-source tools for encryption include 7-Zip (7-zip.org, pre-installed on Windows devices), VeraCrypt (veracrypt.io), and Cryptomator (cryptomator.org).

Further information: Wikipedia | Cessda Data Management Expert Guide | Electronic Frontier Foundation

FAIR Data

The term FAIR (Findable, Accessible, Interoperable and Reusable) Data was first coined in 2016 by the FORCE11 community for sustainable research data management. The main aim of the FAIR principles is to optimise the preparation of research data, which should therefore be findable, accessible, interoperable and reusable for both humans and machines.

Based on: FDM Glossary, Freie Universität Berlin (fu-berlin.de); Wilkinson, Mark, et al. 2016. ‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’. Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.

Interoperability

Interoperability describes the ability of systems, applications, or data to communicate with each other and exchange information seamlessly. In the context of research data management, this means that data is provided in standardized, open formats and with clear metadata so that it can be used by different platforms and tools. The aim of interoperability is to facilitate the exchange, reuse, and integration of research data across disciplinary and institutional boundaries. Interoperability is a key aspect of the FAIR data principles.

Further information: forschungsdaten.info (German)

Intellectual Property Right in Data

Research data are protected by copyright if they represent an intellectual creation and have an individual character. Raw measurements and facts do not usually fall within this scope of protection, but their processed form may be protected if it has a creative and individual character. This can be the case, for example, with enriched, visualized, or otherwise processed data. Copyright law contains certain provisions that allow the use of protected work for specific purposes, such as quotations and use in teaching at schools or universities. Copyright law forms the basis for the granting of licenses, e.g., in accordance with the Creative Commons model.

Further information: Copyright Act (CopA) | ETH Research Data Management Wiki

Research data

Research data refers to all data that is generated or used in the course of scientific work. It forms the basis of current and potential future scientific findings and is generally regarded in the scientific community as necessary for documenting and validating research results.

A distinction can be made between primary data (data collected specifically to answer a research question) and secondary data (data collected in a different context and used in research).

Metadata and the documentation of data collection and processing within a research project are essential for the (re-)usability of research data (FAIR Data). If research data is published under open licences, it is considered Open Research Data.

Research data repository

A research data repository is an online platform for the publication of research data. In line with the principles of FAIR Data and Open Research Data, metadata and documentation (e.g. ReadMe file, codebook, protocol) should be entered alongside the research data in order to make the data easier to find, understand and reuse. Licences can be issued to regulate the subsequent use of the data. Access to sensitive data (e.g. personal data) can be restricted and regulated via a Data Transfer and Use Agreement (DTUA).

At the University of Bern, BORIS Portal is available as a research data repository. For more specific information on publishing data, see our website.

Long-term archiving

Long-term archiving means securing data and its usability across several generations of hardware, software and file formats.

Based on: University Library Bern, BerDA

Licences for data

A licence is a contractually agreed right of use. The rights holder thus authorises their contractual partner to use a work in various ways (e.g. to copy, save or make it digitally accessible). Standardised Creative Commons licences or instruments are generally recommended in the area of research data for which copyright claims exist, in particular CC BY and CC0.

Based on: https://forschungsdaten.info/praxis-kompakt/glossar/ | Creative Commons

Metadata

Metadata is a highly structured, standardised description of objects (including data). It provides compressed information on content, structure, technical properties, usage rights and other properties. Standardised metadata makes information findable and usable for machines (e.g. algorithms, search engines). It is therefore central to the implementation of the FAIR Data principles.

Source: https://en.wikipedia.org/wiki/Metadata

Open (Research) Data

‘Open data’ in the broad sense refers to all openly accessible and reusable data sets. In a narrower sense, the term is often used synonymously with ‘open government data’ (data from public administration) and in contrast to ‘open research data’. Data is open if it is made accessible with as few legal and technical restrictions as possible. A highly restrictive licence or access barriers (e.g. payment or registration barriers) can make research results untraceable and make the subsequent use of the data difficult or impossible.

Based on: https://opendatahandbook.org/guide/de/what-is-open-data/

Persistent identifiers (DOI etc.)

Persistent identifiers (PIDs) are permanent identifiers that are assigned to a digital object. In contrast to other identifiers such as URLs, PIDs always refer to the object itself. The identifier therefore does not change, even if the location of the object (usually a website) changes. This ensures permanent traceability. Examples of PIDs are Digital Object Identifiers (DOIs), Archival Resource Keys (ARKs), and Handles.

Based on: https://forschungsdaten.info, CODATA Research Data Management Terminology
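To illustrate how a PID stays stable while the location of the object may change, the sketch below turns a DOI into a link via the public resolver at https://doi.org/, which forwards to wherever the object is currently hosted. The function name `doi_to_url` is hypothetical, and the sketch makes simplifying assumptions: it only strips a few common prefixes and does not percent-encode special characters.

```python
def doi_to_url(doi):
    """Build a resolver link for a DOI.

    Assumption: the input is a bare DOI or carries one of the common
    prefixes listed below; special characters are not percent-encoded.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


# DOI of the FAIR Guiding Principles article cited in this glossary
print(doi_to_url("doi:10.1038/sdata.2016.18"))
```

The identifier itself (10.1038/sdata.2016.18) is what should be cited; the resolver takes care of finding the current location.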

Personal data

According to the cantonal Data Protection Act (cDSG), personal data is any information relating to an identified or identifiable natural or legal person. Personal data can be directly identifying information (e.g. name, address, IP address), but also information that can only identify a person in combination with other information (e.g. profession, place of residence).  

Based on: Art. 2/3 Data Protection Act of the Canton of Bern

Personal data, sensitive

According to the data protection law of the canton of Bern, particularly sensitive personal data includes data on religious, ideological, political, or trade union views or activities, data on ethnic origin, health and privacy, genetic data, biometric data, data on social assistance or child and adult protection measures, and data on administrative and criminal proceedings or sanctions.

In Switzerland, different terms are used for this type of data from canton to canton. For example, the term used in the canton of Basel-Stadt is “special personal data”.

Based on: Art. 3 Data Protection Law of the Canton of Bern / §3 Data Protection Law of the Canton of Basel-Stadt

Pre-registration

Pre-registration means publishing the plan for a research project before or at the start of the project. In specialised fields such as psychology, the procedure is used to strengthen the method-led approach and increase the quality and transparency of research. The aim is to avoid questionable scientific practices (such as subsequently adjusting the research question to the results obtained).

Based on: https://help.osf.io/article/145-preregistration

Pseudonymization

Pseudonymization means removing identifying information about individuals or replacing it with a pseudonym, while retaining the key used for re-identification. For example, a name is replaced with a numerical code, and the document that maps the numerical code to the original names (= key) is stored separately. Pseudonymized data is still considered personal data. In anonymization, on the other hand, all identifying information is irrevocably removed, making it impossible to re-identify individuals.

Based on: Swiss Personalized Health Network, Glossary | Finnish Social Science Data Archive
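The procedure described in this entry (replace a name with a code and keep the mapping separately) can be sketched as follows. This is a minimal illustration under stated assumptions, not a vetted procedure: the function names `pseudonymize` and `_new_code`, the field name `name`, the code format and the sample records are all hypothetical. In practice, indirect identifiers (e.g. profession, place of residence) also need attention, and the key must be stored separately and securely.

```python
import secrets


def _new_code(used):
    """Draw a random code such as 'P004217' that is not in use yet."""
    while True:
        code = f"P{secrets.randbelow(1_000_000):06d}"
        if code not in used:
            return code


def pseudonymize(records, id_field="name"):
    """Replace the identifying field with a code.

    Returns the pseudonymized records and the key (code -> original value).
    The key is what allows re-identification and must be stored separately.
    """
    code_for = {}  # original value -> code
    pseudonymized = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code_for[original] = _new_code(set(code_for.values()))
        new_record = dict(record)  # leave the input untouched
        new_record[id_field] = code_for[original]
        pseudonymized.append(new_record)
    key = {code: original for original, code in code_for.items()}
    return pseudonymized, key


# Hypothetical sample data
records = [
    {"name": "Anna Muster", "score": 12},
    {"name": "Ben Beispiel", "score": 7},
    {"name": "Anna Muster", "score": 15},
]
data, key = pseudonymize(records)
print(data)
```

Because the key still exists, the result remains personal data. Deleting the key alone does not necessarily make the data anonymous if other fields still allow individuals to be identified.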

Reproducibility

Research results are reproducible if identical results are produced when identical analytical procedures are applied to the same data. This requires that all steps of the scientific work, including the methods and procedures used, are documented correctly and precisely. Reproducible results enable transparency and traceability. In contrast, replicability means that the results of a study can be confirmed with new data and the same or other methods. However, the definitions of the terms reproducibility and replicability can vary depending on the discipline.

Based on: https://book.the-turing-way.org/reproducible-research/reproducible-research / Plesser HE (2018) Reproducibility vs. Replicability: A Brief History of a Confused Terminology. Front. Neuroinform. 11:76. doi: 10.3389/fninf.2017.00076

Requirements of the research funders

Large research funders such as the Swiss National Science Foundation, other national research funders and the European Union generally make project funding conditional on submitting a data management plan and on making research data publicly accessible (open (research) data), provided there are no legal or ethical obstacles.

Based on: https://www.snf.ch/de/dMILj9t4LNk8NwyR/thema/open-research-data

Research Data Management

Research data management (RDM) encompasses all planning and measures that ensure that research data is findable, accessible, interoperable, and reusable (FAIR) throughout its entire life cycle, from planning and collection to analysis and storage to dissemination and long-term archiving. This includes, among other things, clear file naming, structured folder organization, meaningful metadata and documentation, secure storage, and regular backups. RDM helps to prevent data loss, ensure the traceability or reproducibility of research, and enable future reuse of the data, both by the researchers themselves and by others. In addition, RDM is an important contribution to the implementation of ethical guidelines and data protection regulations.

Further information: forschungsdaten.info (German)

Supplementary Material

Supplementary material (or supplementary data) is material (including data) that cannot be integrated into the main text of a scientific article due to space limitations. This material is not directly necessary to understand the results and conclusions of the article, but may still be relevant to the reader for contextualisation or further research. It is recommended to publish supplementary material in a publication repository or research data repository that assigns DOIs or other PIDs.

Based on: International Journal of Epidemiology

Synchronization

Synchronization automatically updates data between two or more storage locations so that the latest version is available everywhere. This is useful, for example, when working with the same data on different devices. The term is sometimes confused with backup.

Further information: forschungsdaten.info | Synology Blog (German)

University Library of Bern UB
