de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930)

Rohrbacher K (2025)


Publication Type: Journal article

Publication year: 2025

Journal: Journal of Open Humanities Data

Book Volume: 11

Article Number: 51

DOI: 10.5334/johd.350

Abstract

de-Corp is a corpus of approximately 5,000 German-language texts compiled from the German and U.S. Project Gutenberg libraries: fiction published between 1780 and 1930, and non-fiction published between 1780 and 1940. It includes detailed metadata on genre, publication year, and author gender, and comprises over 300 million tokens by more than 1,400 unique authors. The dataset supports large-scale historical and literary analysis and is especially valuable for research in Computational Literary Studies and Computational Linguistics.

How to cite

APA:

Rohrbacher, K. (2025). de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930). Journal of Open Humanities Data, 11. https://doi.org/10.5334/johd.350

MLA:

Rohrbacher, Katrin. "de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930)." Journal of Open Humanities Data 11 (2025).
