Tools

A wide range of tools is available for data-driven research, including many free and open-source services as well as proprietary platforms. The University Library Bern (UB Bern) develops its own tools as needed and offers licences and guidance for text and data mining (TDM) platforms.

DS Digital Toolbox

The DS Digital Toolbox of the University Library Bern offers Jupyter Notebooks for typical tasks when working with data:
- Use of APIs of catalogues, full-text platforms and databases: Swisscovery, E-Rara, E-Manuscripta, E-Periodica, Crossref, OpenAlex, Swissdox@LiRI
- Data cleansing of spreadsheets
- Segmentation of documents in preparation for OCR
- Reading text from PDFs and text recognition (OCR)
- Natural Language Processing (NLP) Basics
- Querying and analysing library metadata using SRU
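
As an illustration of the API use mentioned above, the following is a minimal Python sketch of a query against the public Crossref REST API. It is not taken from the Toolbox notebooks. The function names and the sample response are assumptions made for this example, and the sample DOI and title are invented. Only the URL building and response parsing are exercised here; the network call is defined but not executed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Crossref REST API endpoint for works (articles, books, datasets).
CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_works_url(query, rows=5):
    """Build a Crossref works URL for a free-text query (helper name is illustrative)."""
    return CROSSREF_WORKS_URL + "?" + urlencode({"query": query, "rows": rows})


def summarise_works(payload):
    """Return (DOI, first title) pairs from a Crossref works response."""
    items = payload.get("message", {}).get("items", [])
    # Crossref delivers titles as a list; some records have no title at all.
    return [(item.get("DOI"), (item.get("title") or [None])[0]) for item in items]


def fetch_works(query, rows=5):
    """Send the request and decode the JSON response (requires network access)."""
    with urlopen(build_works_url(query, rows)) as response:
        return json.load(response)


# Invented sample in the shape of a Crossref response, used instead of a live call.
SAMPLE_PAYLOAD = {
    "status": "ok",
    "message": {
        "items": [
            {"DOI": "10.1234/example", "title": ["An invented title"]},
            {"DOI": "10.1234/untitled"},
        ]
    },
}

if __name__ == "__main__":
    print(build_works_url("text mining", rows=2))
    for doi, title in summarise_works(SAMPLE_PAYLOAD):
        print(doi, title)
```

To query the live service, call `fetch_works("text mining")` and pass the result to `summarise_works`. The other platforms listed above have their own endpoints and response formats.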

HathiTrust Research Center (HTRC)

The HTRC enables the application of text and data mining (TDM) methods to the contents of the HathiTrust Digital Library, which contains over 18 million digitised volumes from 1700 onwards. Corpora can be created according to your own criteria and processed with built-in text analysis routines. It is also possible to use your own algorithms. Various tools and comprehensive documentation are available for this purpose. To use the HTRC, you must authenticate via SWITCH edu-ID and create a personal account with HathiTrust/HTRC.

OpenRefine

OpenRefine is open-source software with an intuitive user interface for manipulating tabular data. It provides extensive functions for data cleansing and transformation, which are easy to document and reproduce thanks to the processing history. A special feature is the "Reconciliation" function, which can be used to check and enrich your own data against external data providers (e.g. Wikidata, Gemeinsame Normdatei, FactGrid, ORCID, Getty). OpenRefine is available for several operating systems and can be tested online without installation.

Jupyter

Jupyter is an open-source integrated development environment (IDE) for programming languages commonly used in data science, such as R and Python. Jupyter follows the literate programming approach, in which code, documentation and output are combined in one document (Jupyter Notebook). Analysis steps can thus be explained in detail, visualisations can be integrated directly and the content can be exported in various formats. Jupyter can be used locally, online via JupyterLite or with a Google account in Google Colab. For members of Swiss universities and research institutions, EPFL provides an online JupyterHub environment (login via SWITCH edu-ID).

SRU

Search/Retrieve via URL (SRU) is a protocol for search queries on the Internet that uses the Contextual Query Language (CQL). It makes it possible to search a catalogue directly via a URL in a browser (for example, without using the swisscovery search interface). A Jupyter Notebook is available to extract the desired control fields and subfields from the MARCXML returned by the query.
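
The two steps described above, building a query URL and reading fields from the MARCXML, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of the notebook: the base URL is a placeholder, the CQL index name `title` is hypothetical (each catalogue documents its own indexes), and the sample record is invented. The request parameters are those defined by SRU version 1.2.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the catalogue's documented SRU base URL.
SRU_BASE_URL = "https://example.org/sru"

# Namespace of MARC 21 XML ("MARCXML") records.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def build_sru_url(cql_query, max_records=10, base_url=SRU_BASE_URL):
    """Return a searchRetrieve URL that asks for MARCXML records."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "recordSchema": "marcxml",
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


def extract_fields(marcxml, control_tag, data_tag, subfield_code):
    """Collect one control field and one subfield from every record in the XML."""
    root = ET.fromstring(marcxml)
    rows = []
    for record in root.iter("{" + MARC_URI + "}record"):
        control = record.find(f"marc:controlfield[@tag='{control_tag}']", MARC_NS)
        subfields = record.findall(
            f"marc:datafield[@tag='{data_tag}']/marc:subfield[@code='{subfield_code}']",
            MARC_NS,
        )
        rows.append({
            "control": control.text if control is not None else None,
            "subfields": [s.text for s in subfields],
        })
    return rows


# Invented record for illustration; a real SRU response wraps records in further elements.
SAMPLE_MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">990000000001</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title</subfield>
      <subfield code="c">Example author</subfield>
    </datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    print(build_sru_url('title = "Bern"', max_records=5))
    print(extract_fields(SAMPLE_MARCXML, "001", "245", "a"))
```

Because `extract_fields` searches for `record` elements at any depth, the same function should also work on a complete SRU response, provided the records are delivered as embedded XML in the MARC 21 namespace.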

Digital Scholarship Tool Collections
Text analysis, Natural Language Processing (NLP), literature analysis 
Digital Humanities