Documentation and Metadata

File organization

To avoid errors, mix-ups and long search times later on, it is worth investing time at the very start of a project in a systematically organized file and folder structure. This is especially important if you are collaborating with other research groups. Everyone involved in a project should agree on a scheme and stick to it. It is advisable to record the organizational and naming scheme in a document that you later deposit alongside the published data.

Group related files in folders (e.g. for measurements, methods or project phases)
Use clear, unique folder names
Use a hierarchical folder structure (N.B.: too many nested levels result in long and complicated file paths)
Keep active and completed work in separate folders and delete any temporary files that are no longer required
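
As an illustration of the points above, the following is a minimal sketch that creates a shallow folder hierarchy and removes leftover temporary files. The folder names in LAYOUT and the "*.tmp" pattern are assumptions chosen for the example, not names prescribed by this guidance.

```python
from pathlib import Path

# Hypothetical layout: active and completed work are kept apart and the
# hierarchy stays shallow. Adapt the names to your own project.
LAYOUT = [
    "01_active/measurements",
    "01_active/methods",
    "02_completed",
    "docs",
]


def create_layout(root: Path, layout=LAYOUT) -> list[Path]:
    """Create every folder in layout below root and return the folder paths."""
    created = []
    for relative in layout:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def remove_temp_files(root: Path, pattern: str = "*.tmp") -> int:
    """Delete files below root that match pattern; return how many were removed."""
    removed = 0
    # Collect the matches first so the tree is not modified while it is walked.
    for path in list(root.rglob(pattern)):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

Calling create_layout(Path("my_project")) would set up the folders, and remove_temp_files(Path("my_project")) could be run whenever a work phase is finished.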

File names

Make sure you use file names that are unique and are also meaningful for people who are not involved in the project. General elements that can form part of a name:

Creation date (YYYY-MM-DD)
Project reference/name
Description of the content
Name of creator (initials or whole name)
Name of research team/department
Version number

To avoid operating system constraints, use the following character/naming conventions:

Short names
No special characters (: & * % $ £ ] { ! @)
Use underscores _ rather than blank spaces or dots
Include a file suffix wherever possible (.txt, .xls, etc.)
Do not rely on uppercase/lowercase distinctions
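
The naming elements and character conventions above can be combined in a small helper. This is only a sketch: the order of the elements, the "v02" style of version label and the function names are assumptions made for illustration. Note that slug also replaces non-ASCII letters, which is stricter than the list above requires.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Replace each run of characters other than ASCII letters, digits and
    hyphens (blanks, dots, special characters) with a single underscore."""
    return re.sub(r"[^A-Za-z0-9-]+", "_", text).strip("_")


def build_file_name(created: date, project: str, description: str,
                    creator: str, version: int, suffix: str) -> str:
    """Join the naming elements with underscores and append the file suffix."""
    parts = [
        created.isoformat(),      # creation date as YYYY-MM-DD
        slug(project),
        slug(description),
        slug(creator),
        f"v{version:02d}",        # zero-padded so that names sort correctly
    ]
    return "_".join(parts) + "." + suffix.lstrip(".").lower()


def case_collisions(names: list[str]) -> list[tuple[str, str]]:
    """Return pairs of names that differ only in upper/lower case."""
    seen: dict[str, str] = {}
    collisions = []
    for name in names:
        key = name.lower()
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions


print(build_file_name(date(2024, 3, 5), "SoilMoisture", "raw sensor data", "AB", 2, ".csv"))
# prints 2024-03-05_SoilMoisture_raw_sensor_data_AB_v02.csv
```

The case_collisions check is useful before moving files between operating systems, because some file systems treat Data.csv and data.csv as the same file.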

File formats

Choosing a file format carefully helps ensure that files can still be used after many years, which greatly facilitates reuse of the research data. When choosing a suitable format, various factors should be taken into consideration:

Future-proofing: how many software products can read the data format?
Open access to documentation
No legal constraints (patents)
No technical constraints (encryption, DRM)
Established in community

The file formats for research data can vary widely depending on the discipline in question. The following file formats are recommended:

Images: TIFF, TIF
Documents: TXT, ASC, PDF/A
Tabular data: CSV
Audio files: WAV
Databases: SQL, XML
Structured data: XML, JSON, YAML

Further information about which file formats are recommended for long-term preservation can be found here.
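
As a sketch of how data held in memory or in a proprietary tool might be exported to two of the recommended open formats, the example below writes tabular data to CSV and structured data to JSON using only the Python standard library. The function names and the sample field names are assumptions for illustration.

```python
import csv
import json
from pathlib import Path


def write_csv(rows: list[dict], path: Path) -> None:
    """Write uniform dicts as UTF-8 CSV; the keys of the first row form the header."""
    if not rows:
        raise ValueError("no rows to write")
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_json(record: dict, path: Path) -> None:
    """Write structured data as indented, human-readable UTF-8 JSON."""
    with path.open("w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, ensure_ascii=False)
```

For example, write_csv([{"sample": "A1", "temp_c": 21.5}], Path("measurements.csv")) produces a plain-text file that virtually any spreadsheet or statistics package can read.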

Version control

It is essential to use version control, especially for datasets that change over the course of a project. Individual datasets should be named sequentially and the names should include the save date (YYYY-MM-DD) along with the version number. The final version should be indicated as such. Maintaining a version table in which all changes and new names are recorded can help keep track of the datasets.

Especially when working with a number of different people, it may be advisable to regularly save a milestone version of the file which then must not be changed or deleted.
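
One possible way to save a milestone version is to copy the file into a separate folder and clear its write permissions, as sketched below. The function name, the folder and the label format are assumptions. Read-only permissions only guard against accidental changes; they do not replace a backup.

```python
import shutil
import stat
from pathlib import Path


def save_milestone(source: Path, milestone_dir: Path, label: str) -> Path:
    """Copy source into milestone_dir under a labelled name and make it read-only."""
    milestone_dir.mkdir(parents=True, exist_ok=True)
    target = milestone_dir / f"{source.stem}_{label}{source.suffix}"
    if target.exists():
        # An existing milestone must never be overwritten.
        raise FileExistsError(f"milestone already exists: {target}")
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    # Clear the write bits for owner, group and others.
    mode = target.stat().st_mode
    target.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return target
```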

To summarize, forschungsdaten.info recommends:

Use sequential numbering
Include the date and version number in the name
Use a version control table
Specify who is responsible for providing the final files
Use version control software for large data volumes
Save milestone versions
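
The first three recommendations can be sketched as follows: a helper that builds a file name with save date and sequential version number, and a version table kept as a CSV file. The column names in FIELDS, the "_FINAL" marker and the function names are assumptions for illustration only.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical columns of the version table.
FIELDS = ["file_name", "version", "date", "author", "change"]


def versioned_name(stem: str, version: int, saved: date, suffix: str,
                   final: bool = False) -> str:
    """Build a name with save date and sequential version; mark the final version."""
    label = f"v{version:02d}" + ("_FINAL" if final else "")
    return f"{stem}_{saved.isoformat()}_{label}.{suffix}"


def log_version(table: Path, file_name: str, version: int, saved: date,
                author: str, change: str) -> None:
    """Append one row to the version table, writing the header on first use."""
    is_new = not table.exists()
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "file_name": file_name,
            "version": version,
            "date": saved.isoformat(),
            "author": author,
            "change": change,
        })


print(versioned_name("survey", 3, date(2024, 5, 17), "csv"))
# prints survey_2024-05-17_v03.csv
```

For code and large or frequently changing datasets, dedicated version control software records this history automatically and is usually the better choice.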

Further information and best practices

Wilson, G. et al. (2017): Good enough practices in scientific computing. PLoS Comput Biol 13(6): e1005510 https://doi.org/10.1371/journal.pcbi.1005510
Free version control software

Data backup

We recommend you back up your data using the university's IT system as it collects the data campus-wide and redundantly backs it up to two state-of-the-art tape libraries.

More information: Campus Backup/Archive (access only via campus network)

You should always adopt the 3-2-1 backup strategy:

3 copies of the data (1 original + 2 backups)
Stored on 2 different types of media (external hard drives, USB sticks, SD cards, CDs, DVDs, Cloud)
1 copy off-site

Backups should be automated and run at regular intervals. Check that each backup was successful and that the data can be retrieved again if necessary.
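
Checking that a backup was successful can be done by comparing checksums of the original files and their copies. The sketch below uses SHA-256 from the Python standard library; the function names are assumptions, and the script covers only the copy and verification step, not the choice of storage media or the off-site location.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, backup_dir: Path) -> None:
    """Copy the whole source tree into backup_dir (requires Python 3.8 or later)."""
    shutil.copytree(source_dir, backup_dir, dirs_exist_ok=True)


def verify_backup(source_dir: Path, backup_dir: Path) -> list[Path]:
    """Return relative paths of files that are missing from or differ in the backup."""
    problems = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source):
            problems.append(relative)
    return problems
```

An empty list returned by verify_backup means every source file has an identical copy. A script like this could be run by the operating system's scheduler so that the backup and its check happen at regular intervals.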

Documentation

Comprehensive documentation is essential to enable correct interpretation and reuse of the data at a later date. Among other things, the documentation should include details about when and where the data was collected, the methods, tools, software and statistical models used, the parameters chosen, any missing values, and the nomenclature and acronyms. This information can accompany your dataset, e.g., as supplementary documentation in a ReadMe file.
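
To make sure none of these details is forgotten, a ReadMe file can be generated from a fixed list of fields. The sketch below is illustrative only: the field labels in REQUIRED_FIELDS are derived from the paragraph above and are an assumption, not the structure of the official ReadMe template.

```python
# Hypothetical field labels based on the details listed above.
REQUIRED_FIELDS = [
    "Collection date",
    "Collection place",
    "Methods",
    "Tools and software",
    "Statistical models",
    "Parameters",
    "Missing values",
    "Nomenclature and acronyms",
]


def render_readme(fields: dict[str, str]) -> str:
    """Return ReadMe text with one 'label: value' line per required field.

    Raises ValueError if a required field is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing documentation fields: " + ", ".join(missing))
    lines = [f"{name}: {fields[name]}" for name in REQUIRED_FIELDS]
    return "\n".join(lines) + "\n"
```

The returned text can be saved next to the dataset, for example as README.txt.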

Further information on data documentation can be found here and on ReadMe files here.

Readme_Template_EN.txt (3KB)

Recommendation on research data documentation from the Open Science Team (PDF, 141KB)

Metadata

Metadata is structured, machine-readable information about data. It helps other researchers find and reuse the data. Depending on the discipline, various commonly used metadata standards and tools are available for describing datasets.

The repository of the University of Bern (BORIS Publications) uses the Dublin Core metadata element set. This metadata is generated automatically from the form you fill in when depositing a dataset in the repository.
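
To give an idea of what a Dublin Core description looks like in machine-readable form, the sketch below builds a simple XML record from the 15 elements of the Dublin Core Metadata Element Set. It is not the repository's deposit form or export format; the wrapper element name "metadata" and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build an XML record with one dc:<name> element per value."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = value
    return record


record = dublin_core_record({
    "title": ["Soil moisture measurements 2024"],   # invented sample values
    "creator": ["Example, A.", "Example, B."],
    "date": ["2024-05-17"],
})
print(ET.tostring(record, encoding="unicode"))
```

Every element is optional and repeatable, which is why each field takes a list of values.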

Data quality and metadata standards: the presentation is linked under BORIS Publications.

University Library of Bern UB
