Managing data

Research Data Management (RDM) includes clear file naming, structured folder organization, suitable metadata and documentation, secure storage, and regular backups. RDM is essential to prevent data loss, ensure transparency and reproducibility, and enable future reuse of data by yourself and by other researchers. For sensitive or confidential data, RDM also involves ensuring compliance with data protection regulations and safeguarding against unauthorized access. A data management plan (DMP) is a cornerstone of effective RDM. 

If you have any questions, contact us at researchdata@unibe.ch.

 

To avoid errors, confusion, and long search times later on, it is advisable to establish a clear and systematic file and folder structure at the beginning of a project. This is especially important when working with other research groups. All project members should agree on a common structure and apply it consistently. The folder structure and naming conventions should be documented in a reference file that can be used throughout the project and included as documentation when the data are published or archived. 

  • Group related files in folders (e.g. according to archives used, measurements, interview partners, methods, or project phases) 

  • Use clear, unique folder names 

  • Use a hierarchical folder structure (N.B.: too many nested levels result in long and complicated file paths) 

  • Keep active and completed work in separate folders, and delete any temporary files that are no longer required 

  • Use an archive folder for files that are outdated but should be kept for future reference 

Here is an example folder structure from the UK Data Service. 

Make sure you use file names that are unique and are also meaningful for people who are not involved in the project. File names should generally include the following elements: 

  • Creation date (YYYY-MM-DD) 

  • Project reference/name 

  • Description of the content (keywords) 

  • Name of creator (initials or whole name) 

  • Name of research team/department 

  • Version number 

 
To avoid operating system constraints, use the following character/naming conventions: 

  • Short names (32 characters max.) 

  • No special characters (: & * % $ £ ] { ! @) 

  • Use underscores _ rather than blank spaces or dots 

  • Include a file suffix wherever possible (such as .txt, .csv) 

  • Do not rely on uppercase/lowercase distinctions 
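As an illustration, the naming elements and character conventions above can be combined in a small helper. The project name, keywords, and initials below are placeholders, not a prescribed scheme:

```python
from datetime import date

def make_filename(project, keywords, creator, version, suffix="txt", created=None):
    """Build a file name from the recommended elements: creation date,
    project name, content keywords, creator initials, and version number.
    Underscores separate the elements; no spaces or special characters."""
    created = created or date.today().isoformat()  # YYYY-MM-DD
    stem = "_".join([created, project, "-".join(keywords), creator, f"v{version:02d}"])
    return f"{stem}.{suffix}"

# Hypothetical example: version 2 of a cleaned interview table
name = make_filename("RDM", ["intv"], "AB", 2, suffix="csv", created="2025-01-15")
print(name)  # 2025-01-15_RDM_intv_AB_v02.csv (30 characters, no spaces)
```

Keeping the keywords short makes it easy to stay under the 32-character limit.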

 

Some examples of good naming conventions can be found on the website of the Geneva Graduate Institute. 

Choosing an appropriate file format helps ensure that files remain usable over the long term and significantly facilitates the reuse of research data. When selecting a format, several factors should be considered: 

  • Future-proofing: how many software products can read the data format? 

  • Open access to documentation 

  • No legal constraints (patents, commercial licenses) 

  • No technical constraints (encryption, Digital Rights Management (DRM)) 

  • Established in the respective scholarly community 

 
The file formats for research data can vary widely depending on the discipline in question. The following file formats are recommended: 

  • Images: TIFF, TIF 

  • Documents: TXT, PDF/A 

  • Tabular data: CSV 

  • Audio files: WAV 

  • Databases: SQL, XML 

  • Structured data: XML, JSON, YAML 

 
Further information about which file formats are recommended for long-term preservation can be found here.
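For instance, tabular and structured data can be written to the recommended open formats (CSV and JSON) with standard tooling alone; the measurement records below are invented for the example:

```python
import csv
import json

# Hypothetical measurement records
records = [
    {"sample": "S1", "temperature_c": 21.4},
    {"sample": "S2", "temperature_c": 22.1},
]

# Tabular data as CSV
with open("measurements.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["sample", "temperature_c"])
    writer.writeheader()
    writer.writerows(records)

# The same records as structured JSON
with open("measurements.json", "w") as fh:
    json.dump(records, fh, indent=2)
```

Both files are plain text, openly documented, and readable by a wide range of software, which is exactly what the criteria above ask for.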

Version control is a key aspect of data management. It should be used for datasets that change over the course of a project. This ensures that changes are traceable and can be undone if need be. Here are some best practices for manual versioning: 

  • Never change the raw data files; you may want to go back to the original state in the course of your work. Keep the original data file as a golden copy. 

  • Create working copies and save milestone versions regularly. Milestone versions represent an intermediate state of your data, e.g. once a predefined processing step (such as transcription, anonymization, or cleaning) has been completed. 

  • Individual datasets should be named sequentially, and the names should include the save date (YYYY-MM-DD) along with the version number. 

  • Maintaining a version table that records all versions (including file names, changes, reasons for the changes, and the responsible collaborator) can help keep track of datasets, especially if you are dealing with complex datasets. 
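Such a version table can be kept as a simple CSV log next to the data; the file names and entries here are hypothetical:

```python
import csv
from pathlib import Path

LOG = Path("version_table.csv")
FIELDS = ["file_name", "version", "date", "changes", "responsible"]

def log_version(file_name, version, date, changes, responsible):
    """Append one entry to the version table, writing the header on first use."""
    first_entry = not LOG.exists()
    with LOG.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if first_entry:
            writer.writeheader()
        writer.writerow({"file_name": file_name, "version": version,
                         "date": date, "changes": changes,
                         "responsible": responsible})

# Hypothetical milestone: anonymization completed
log_version("2025-01-15_RDM_intv_AB_v02.csv", 2, "2025-01-15",
            "anonymized interview transcripts", "AB")
```

Because the log itself is CSV, it can be archived alongside the data in an open format.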

For managing code, specific version control systems such as Git should be used. 

Further information and best practices can be found here.

Where you should store your research data during your research projects depends on several factors: 

  • Available funding 

  • Sensitivity of the data: Does your data contain personal information? Are there any licenses or contractual agreements that require secure storage? 

  • Access requirements: How frequently do you need to access and change your data? How many collaborators need access? 

The University of Bern offers a variety of storage facilities for research data, including network-attached storage (NAS) for various use cases and cloud storage provided by Microsoft (SharePoint/OneDrive). For details, contact the IT support at your institute, department, or faculty. 

Backups are essential for restoring your data in the event of loss or accidental changes. 

Preferably, adopt the 3-2-1 backup strategy: 

  • 3 copies of the data (1 original + 2 backups) 

  • Store on 2 different types of media (e.g. external hard drives, cloud) 

  • 1 copy off-site (e.g. external drive at home, cloud storage) 
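The three copies above can be sketched as plain file copies (paths and file names are placeholders; in practice, a dedicated backup tool is preferable):

```python
import shutil
from pathlib import Path

# Create a small sample file standing in for the original data (copy 1)
original = Path("measurements.csv")
original.write_text("sample,temperature_c\nS1,21.4\n")

# Two backup targets standing in for different media (copies 2 and 3)
backup_targets = [Path("external_drive"), Path("offsite_cloud")]

for target in backup_targets:
    target.mkdir(exist_ok=True)
    shutil.copy2(original, target / original.name)  # copy2 preserves timestamps
```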

Backups should ideally be performed automatically at regular intervals. Regularly verify that the backup completed successfully and that the data can actually be restored. 

To automate backups on your personal device, we recommend using open-source tools such as Duplicati.

To check that the backups you created are not corrupted, you can use checksum tools such as MD5 Summer.
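On systems without such a tool, a checksum can also be computed with a few lines of Python (MD5 here, to match the tool above; any stable hash works for integrity checks):

```python
import hashlib

def md5_checksum(path, chunk_size=65536):
    """Compute the MD5 checksum of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record the checksum when the backup is made; recompute it later and
# compare: any difference means the copy has been corrupted or altered.
```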

If you use the university’s IT systems such as Campus Storage, your data is backed up automatically (see information in UniBE intranet [German only]). 

Comprehensive documentation is essential to enable correct interpretation and reuse of the data at a later date. Documentation is primarily aimed at human readers and is a crucial tool for implementing the FAIR data principles.

Documentation should include information such as 

  • Data creator(s) 

  • The project for which the data were generated, and any other contextual information necessary to understand and reuse the data 

  • Information on data reuse, such as the license or any restrictions on data access and reuse 

  • Time and place the data was collected 

  • Methods used 

  • Tools and software used to collect or create the data 

  • Information on the structure and layout of the data, e.g. variables, codes, missing values, nomenclature, abbreviations, and acronyms 

Provide this information in a separate file accompanying your dataset, e.g. a ReadMe file, using our template: 

Readme_Template_EN.txt (3KB) 

Find general information on ReadMe files here, and browse BORIS Portal for examples of documented datasets.  

General guidelines for data documentation at the University of Bern are provided in the Recommendation on research data documentation from the Open Science Team.

Metadata is information about objects (including data) in a structured, machine-readable form. It helps you and other researchers find and reuse data, and enables machines and algorithms to analyze and process it. Metadata is a crucial component in implementing the FAIR data principles (cf. our glossary). 

When publishing data, you automatically generate metadata when you fill in the repository’s input mask. Repositories usually implement metadata standards that optimize the findability and interoperability of the metadata. For example, BORIS Portal, the repository of the University of Bern, uses the Dublin Core metadata standard. 

To describe your research data in your everyday data management, or to prepare your data for archiving, use tools such as the Dublin Core generator or the DataCite Metadata Generator. These tools generate machine-readable files that you can store alongside your research data so that the data can be identified, understood, and reused later.
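As a sketch, a minimal Dublin Core record can even be produced with the Python standard library; the field values below are invented for the example:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)

# Hypothetical descriptive metadata for a dataset
fields = {
    "title": "Interview transcripts, demo project",
    "creator": "Doe, Jane",
    "date": "2025-01-15",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

record = ET.Element("record")
for name, value in fields.items():
    ET.SubElement(record, f"{{{DC}}}{name}").text = value

# Store the metadata file alongside the data it describes
ET.ElementTree(record).write("metadata.xml", encoding="utf-8", xml_declaration=True)
```

Because the result is plain XML using a widely adopted standard, it remains machine-readable regardless of the tools available later.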