Data sources for text and data mining
Text and data mining (TDM) refers to algorithm-based processes that automatically extract information from unstructured or semi-structured text (text mining) and from structured data (data mining).
On this page you will find text and data mining resources, ordered by content category, that are either freely available on the web or accessible through UB Bern’s licenses.
Unless other contact details are provided, please contact UB Bern if you are interested in obtaining data.
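To make the distinction between text mining and data mining concrete, the following is a minimal sketch rather than a recommended workflow. The function names, the sample sentence and the sample records are all hypothetical and chosen only for illustration. The first function extracts term frequencies from unstructured text; the second aggregates a field across structured records.

```python
import re
from collections import Counter


def term_frequencies(text: str, min_length: int = 4) -> Counter:
    """Text mining sketch: count lower-cased word tokens in free text.

    Tokens shorter than min_length are dropped as a crude stop-word filter.
    """
    # [^\W\d_]+ matches runs of letters (including non-ASCII letters).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if len(t) >= min_length)


def count_by_field(records: list[dict], field: str) -> Counter:
    """Data mining sketch: count how often each value of a field occurs."""
    return Counter(r[field] for r in records if field in r)


# Hypothetical sample inputs, for illustration only.
sample_text = "The council met in Bern. The council voted, and Bern approved."
sample_records = [
    {"title": "A", "year": 1901},
    {"title": "B", "year": 1901},
    {"title": "C", "year": 1905},
]

text_counts = term_frequencies(sample_text)
year_counts = count_by_field(sample_records, "year")
```

Real projects would typically replace the length filter with a proper tokenizer and stop-word list, and would load records from the export format offered by the respective resource.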
Licensed data, text and image collections
Resource | Contents | Detailed information |
---|---|---|
Swiss Media content: Swissdox@LiRI (general information on the Swissdox database) | | |
International Media content: Nexis Data Lab | | |
Books International: HathiTrust Research Center | | |
WBIS Online (DeGruyter) (general information about the database) | | |
Germanistik Online (DeGruyter) (general information about the database) | | |
Romance Studies Bibliography (DeGruyter) (general information about the database) | | |
English-language periodicals (Gale Cengage) | | |
English-language periodicals (ProQuest) | | |
English-language monographs (Gale Cengage) | | |
UK Parliamentary Papers (ProQuest) | | |
Cambridge Histories (CUP) | | |
Freely accessible data, text and image collections
Platform | Contents | Detailed information |
---|---|---|
CLARIN Resource Families | | Partly available for free, various licenses |
e-rara | | Overview of data interfaces and terms |
e-manuscripta | | Overview of data interfaces and terms |
e-periodica | | Overview of data interfaces and terms |
GLAM Workbench | | Freely accessible, various licenses |
Chronicling America | | Freely accessible, public domain |
Internet Archive | | Freely accessible, various licenses, sometimes not specified |
Project Gutenberg | | Freely accessible, public domain |
OpenGLAM Survey | | Freely accessible, public domain or open licenses |
Text Creation Partnership | | Freely accessible, public domain |
Legal aspects
The resources and their interfaces are subject to various legal and technical terms of use. Please consult these terms before starting any automated access. In particular, licensed content that is not listed here often excludes automated access, and such access may cause the provider to block access to the database. If you are in any doubt, please contact us to check whether access is permitted.
Under the Swiss Federal Act on Copyright and Related Rights, copying and storing lawfully accessible content for scientific purposes, as in the context of TDM, is permitted.
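As one possible way to check a technical access rule before starting automated access, the sketch below reads a `robots.txt` policy with Python's standard `urllib.robotparser`. This is an assumption about what a provider might publish: the page above does not mention `robots.txt`, and such a file never replaces the provider's license or terms of use. The host `repository.example.org`, the user agent `example-tdm-bot` and the policy text are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_lines: list[str]) -> RobotFileParser:
    """Parse robots.txt content that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


# Hypothetical policy, inlined so the example runs without network access.
# For a live site, one would instead call parser.set_url(...) and parser.read().
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

USER_AGENT = "example-tdm-bot"  # hypothetical crawler name

parser = build_parser(EXAMPLE_ROBOTS)

# May this user agent fetch the given URLs under the parsed policy?
allowed = parser.can_fetch(USER_AGENT, "https://repository.example.org/public/item1")
blocked = parser.can_fetch(USER_AGENT, "https://repository.example.org/private/item2")

# Requested pause between requests in seconds, or None if not specified.
delay = parser.crawl_delay(USER_AGENT)
```

A permissive result here only means the crawler is not technically excluded; whether TDM is contractually allowed still has to be confirmed in the license or with the library.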