Managing data

Research Data Management (RDM) includes clear file naming, structured folder organization, suitable metadata and documentation, secure storage, and regular backups. RDM is essential to prevent data loss, ensure transparency and reproducibility, and enable future reuse of data by yourself and by other researchers. For sensitive or confidential data, RDM also involves ensuring compliance with data protection regulations and safeguarding against unauthorized access. A data management plan (DMP) is a cornerstone of effective RDM. 

If you have any questions, contact us at researchdata@unibe.ch.

 

To avoid errors, confusion, and long search times later on, it is advisable to establish a clear and systematic file and folder structure at the beginning of a project. This is especially important when working with other research groups. All project members should agree on a common structure and apply it consistently. The folder structure and naming conventions should be documented in a reference file that can be used throughout the project and included as documentation when the data are published or archived. 

  • Group related files in folders (e.g. according to archives used, measurements, interview partners, methods, or project phases) 

  • Use clear, unique folder names 

  • Use a hierarchical folder structure (N.B.: too many nested levels result in long and complicated file paths) 

  • Keep active and completed work in separate folders, and delete any temporary files that are no longer required 

  • Use an archive folder for files that are outdated but should be kept for future reference 

Here is an example folder structure from the UK Data Service. 

Make sure you use file names that are unique and are also meaningful for people who are not involved in the project. File names should generally include the following elements: 

  • Creation date (YYYY-MM-DD) 

  • Project reference/name 

  • Description of the content (keywords) 

  • Name of creator (initials or whole name) 

  • Name of research team/department 

  • Version number 

 
To avoid operating system constraints, use the following character/naming conventions: 

  • Short names (32 characters max.) 

  • No special characters (: & * % $ £ ] { ! @) 

  • Use underscores _ rather than blank spaces or dots 

  • Include a file suffix wherever possible (such as .txt, .csv) 

  • Do not rely on uppercase/lowercase distinctions 
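As an illustration, the naming elements and character conventions above can be combined in a small helper. The project name, keywords, and initials below are placeholders, not a prescribed scheme:

```python
from datetime import date

def make_filename(project, keywords, creator, version, suffix="txt", created=None):
    """Build a file name from the recommended elements: creation date,
    project name, content keywords, creator initials, and version number.
    Underscores separate the elements; no spaces or special characters."""
    created = created or date.today().isoformat()  # YYYY-MM-DD
    stem = "_".join([created, project, "-".join(keywords), creator, f"v{version:02d}"])
    return f"{stem}.{suffix}"

# Hypothetical example: version 2 of a cleaned interview table
name = make_filename("RDM", ["intv"], "AB", 2, suffix="csv", created="2025-01-15")
print(name)  # 2025-01-15_RDM_intv_AB_v02.csv (30 characters, no spaces)
```

Keeping the keywords short makes it easy to stay under the 32-character limit.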

 

Some examples of good naming conventions can be found on the website of the Geneva Graduate Institute. 

Choosing an appropriate file format helps ensure that files remain usable over the long term and significantly facilitates the reuse of research data. When selecting a format, several factors should be considered: 

  • Future-proofing: how many software products can read the data format? 

  • Open access to documentation 

  • No legal constraints (patents, commercial licenses) 

  • No technical constraints (encryption, Digital Rights Management (DRM)) 

  • Established in the respective scholarly community 

 
The file formats for research data can vary widely depending on the discipline in question. The following file formats are recommended: 

  • Images: TIFF, TIF 

  • Documents: TXT, PDF/A 

  • Tabular data: CSV 

  • Audio files: WAV 

  • Databases: SQL, XML 

  • Structured data: XML, JSON, YAML 

 
Further information about which file formats are recommended for long-term preservation can be found here.
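For instance, tabular and structured data can be written to the recommended open formats (CSV and JSON) with standard tooling alone; the measurement records below are invented for the example:

```python
import csv
import json

# Hypothetical measurement records
records = [
    {"sample": "S1", "temperature_c": 21.4},
    {"sample": "S2", "temperature_c": 22.1},
]

# Tabular data as CSV
with open("measurements.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["sample", "temperature_c"])
    writer.writeheader()
    writer.writerows(records)

# The same records as structured JSON
with open("measurements.json", "w") as fh:
    json.dump(records, fh, indent=2)
```

Both files are plain text, openly documented, and readable by a wide range of software, which is exactly what the criteria above ask for.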

Version control is a key aspect of data management. It should be used for datasets that change over the course of a project. This ensures that changes are traceable and can be undone if need be. Here are some best practices for manual versioning: 

  • Never change the raw data files; you may want to go back to the original state in the course of your work. Keep the original data file as a golden copy. 

  • Create working copies and save milestone versions regularly. Milestone versions represent an intermediate state of your data, e.g. once a predefined processing step (such as transcription, anonymization, or cleaning) has been completed. 

  • Individual datasets should be named sequentially, and the names should include the save date (YYYY-MM-DD) along with the version number. 

  • Maintaining a version table that records all versions (including file names, changes, reasons for the changes, and the responsible collaborator) can help keep track of datasets, especially if you are dealing with complex datasets. 
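Such a version table can be kept as a simple CSV log next to the data; the file names and entries here are hypothetical:

```python
import csv
from pathlib import Path

LOG = Path("version_table.csv")
FIELDS = ["file_name", "version", "date", "changes", "responsible"]

def log_version(file_name, version, date, changes, responsible):
    """Append one entry to the version table, writing the header on first use."""
    first_entry = not LOG.exists()
    with LOG.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if first_entry:
            writer.writeheader()
        writer.writerow({"file_name": file_name, "version": version,
                         "date": date, "changes": changes,
                         "responsible": responsible})

# Hypothetical milestone: anonymization completed
log_version("2025-01-15_RDM_intv_AB_v02.csv", 2, "2025-01-15",
            "anonymized interview transcripts", "AB")
```

Because the log itself is CSV, it can be archived alongside the data in an open format.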

For managing code, specific version control systems such as Git should be used. 

Further information and best practices can be found here.

Where you should store your research data during your research projects depends on several factors: 

  • Available funding 

  • Sensitivity of the data: Does your data contain personal information? Are there any licenses or contractual agreements that require secure storage? 

  • Access requirements: How frequently do you need to access and change your data? How many collaborators need access? 

The University of Bern offers a variety of storage facilities for research data, including network-attached storage (NAS) for various use cases and cloud storage provided by Microsoft (SharePoint/OneDrive). For details, contact the IT support at your institute, department, or faculty. 

Backups are essential for restoring your data in the event of loss or accidental changes. 

Preferably, adopt the 3-2-1 backup strategy: 

  • 3 copies of the data (1 original + 2 backups) 

  • Store on 2 different types of media (e.g. external hard drives, cloud) 

  • 1 copy off-site (e.g. external drive at home, cloud storage) 
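The three copies above can be sketched as plain file copies (paths and file names are placeholders; in practice, a dedicated backup tool is preferable):

```python
import shutil
from pathlib import Path

# Create a small sample file standing in for the original data (copy 1)
original = Path("measurements.csv")
original.write_text("sample,temperature_c\nS1,21.4\n")

# Two backup targets standing in for different media (copies 2 and 3)
backup_targets = [Path("external_drive"), Path("offsite_cloud")]

for target in backup_targets:
    target.mkdir(exist_ok=True)
    shutil.copy2(original, target / original.name)  # copy2 preserves timestamps
```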

Backups should ideally be performed automatically at regular intervals. Regularly verify that the backup completed successfully and that the data can actually be restored. 

To automate backups on your personal device, we recommend using open-source tools such as Duplicati.

To check that the backups you created are not corrupted, you can use checksum tools such as MD5 Summer.
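On systems without such a tool, a checksum can also be computed with a few lines of Python (MD5 here, to match the tool above; any stable hash works for integrity checks):

```python
import hashlib

def md5_checksum(path, chunk_size=65536):
    """Compute the MD5 checksum of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record the checksum when the backup is made; recompute it later and
# compare: any difference means the copy has been corrupted or altered.
```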

If you use the university’s IT systems such as Campus Storage, your data is backed up automatically (see information in UniBE intranet [German only]). 

Comprehensive documentation is essential to enable correct interpretation and reuse of the data at a later date. Documentation is primarily aimed at human readers and is a crucial tool for implementing the FAIR data principles.

Documentation should include information such as 

  • Data creator(s) 

  • The project for which the data were generated, and any other contextual information necessary to understand and reuse the data 

  • Information on data reuse, such as the license or any restrictions on data access and reuse 

  • Time and place the data was collected 

  • Methods used 

  • Tools and software used to collect or create the data 

  • Information on the structure and layout of the data, e.g. variables, codes, missing values, nomenclature, abbreviations, and acronyms 

Provide this information in a separate file accompanying your dataset, e.g. a ReadMe file, using our template: 

Readme_Template_EN.txt (3KB) 

Find general information on ReadMe files here, and browse BORIS Portal for examples of documented datasets.  

General guidelines for data documentation at the University of Bern are provided in the Recommendation on research data documentation from the Open Science Team.

Metadata is information about objects (including data) in a structured, machine-readable form. It helps you and other researchers find and reuse data, and enables machines and algorithms to analyze and process it. Metadata is a crucial component in implementing the FAIR data principles (cf. our glossary). 

When publishing data, you automatically generate metadata when you fill in the repository’s input mask. Repositories usually implement metadata standards that optimize the findability and interoperability of the metadata. For example, BORIS Portal, the repository of the University of Bern, uses the Dublin Core metadata standard. 

To describe your research data in your everyday data management, or to prepare your data for archiving, use tools such as the Dublin Core generator or the DataCite Metadata Generator. These tools generate machine-readable files that you can store alongside your research data so that the data can be identified, understood, and reused later.
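As a sketch, a minimal Dublin Core record can even be produced with the Python standard library; the field values below are invented for the example:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)

# Hypothetical descriptive metadata for a dataset
fields = {
    "title": "Interview transcripts, demo project",
    "creator": "Doe, Jane",
    "date": "2025-01-15",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

record = ET.Element("record")
for name, value in fields.items():
    ET.SubElement(record, f"{{{DC}}}{name}").text = value

# Store the metadata file alongside the data it describes
ET.ElementTree(record).write("metadata.xml", encoding="utf-8", xml_declaration=True)
```

Because the result is plain XML using a widely adopted standard, it remains machine-readable regardless of the tools available later.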