Rohrbacher K (2025)
Publication Type: Journal article
Publication year: 2025
Book Volume: 11
Article Number: 51
DOI: 10.5334/johd.350
de-Corp is a corpus of approximately 5,000 German-language fiction and non-fiction texts compiled from the German and U.S. Project Gutenberg libraries. The fiction texts were published between 1780 and 1930, and the non-fiction texts between 1780 and 1940. The corpus includes detailed metadata on genre, publication year, and author gender, and comprises over 300 million tokens by more than 1,400 unique authors. The dataset supports large-scale historical and literary analysis and is especially valuable for research in Computational Literary Studies and Computational Linguistics.
APA:
Rohrbacher, K. (2025). de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930). Journal of Open Humanities Data, 11. https://doi.org/10.5334/johd.350
MLA:
Rohrbacher, Katrin. "de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930)." Journal of Open Humanities Data 11 (2025).