Bridging the Gaps: Integrating Bibliographic Metadata Into Wikidata for Literary Corpora

Rohrbacher K, Schrittesser D (2026)


Publication Type: Journal article

Publication year: 2026

Journal: Journal of Open Humanities Data

Book Volume: 12

Article Number: 37

DOI: 10.5334/johd.483

Abstract

This paper presents a case study on enhancing literary-corpus metadata by integrating large-scale bibliographic resources with Wikidata. Digital libraries such as Project Gutenberg or HathiTrust often provide only minimal metadata (e.g., author name and title). For large-scale literary analysis, however, it is crucial to include additional information such as year of publication, author gender, genre, or publisher. Conversely, using Wikidata to enrich existing literary-corpus metadata is challenging, as significant gaps in coverage remain. In this case study, we draw on the metadata of a large literary corpus to address these gaps. We conduct a feasibility analysis to determine how a workflow can be established that integrates metadata from bibliographic catalogues into Wikidata as a step in the digital-humanities pipeline. We explore both procedural approaches and existing software tools and discuss resulting challenges and limitations. Our methods are documented and open-source; the full Python scripts and data processing workflows are publicly available on GitHub. The goal is to develop reproducible methods for sharing and improving metadata availability across open platforms.

How to cite

APA:

Rohrbacher, K., & Schrittesser, D. (2026). Bridging the Gaps: Integrating Bibliographic Metadata Into Wikidata for Literary Corpora. Journal of Open Humanities Data, 12. https://doi.org/10.5334/johd.483

MLA:

Rohrbacher, Katrin, and David Schrittesser. "Bridging the Gaps: Integrating Bibliographic Metadata Into Wikidata for Literary Corpora." Journal of Open Humanities Data 12 (2026).
