Combining Visual and Linguistic Models for a Robust Recipient Line Recognition in Historical Documents

Mayr M, Felker A, Maier A, Christlein V (2022)


Publication Type: Conference contribution

Publication year: 2022

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 13237 LNCS

Page Range: 598-612

Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Event location: La Rochelle, FRA

ISBN: 9783031065545

DOI: 10.1007/978-3-031-06555-2_40

Abstract

Automatically extracting targeted information from historical documents is an important task in document analysis and eases the work of historians dealing with huge corpora. In this work, we investigate retrieving the recipient transcriptions from the Nuremberg letterbooks of the 15th century. This task can be approached in fundamentally different ways. First, recipient lines can be detected solely based on visual features, without any explicit linguistic feedback. Here, we use a vanilla U-Net and an attention-based U-Net as representatives. Second, linguistic feedback can be used to classify each line. On the one hand, this is done with handwritten text recognition (HTR) for predicting the transcriptions and, on top of it, a lightweight natural language processing (NLP) model that decides whether a line is a recipient line. On the other hand, we adapt a named entity recognition transformer model. This system jointly performs the line transcription and the recipient line recognition. To improve performance, we investigated all possible combinations of the different methods. In most cases, the combined output probabilities outperformed the single approaches. On the hard test set, the best combination achieved an F1 score of 80% and a recipient line recognition accuracy of about 96%, while the best single approach reached only about 74% and 94%, respectively.
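
The idea of combining output probabilities from several recipient-line classifiers can be illustrated with a minimal Python sketch. The abstract does not state which combination rule the authors used, so plain averaging of per-line probabilities is assumed here purely for illustration. The function names, the 0.5 threshold, the model labels, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def combine_probabilities(model_probs):
    """Average per-line recipient probabilities across models (assumed rule)."""
    n_lines = len(model_probs[0])
    return [
        sum(probs[i] for probs in model_probs) / len(model_probs)
        for i in range(n_lines)
    ]


def f1_score(y_true, y_pred):
    """F1 score for the positive class (1 = recipient line)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate_combinations(probs_by_model, y_true, threshold=0.5):
    """Score every non-empty subset of models after averaging their outputs."""
    results = {}
    names = sorted(probs_by_model)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            combined = combine_probabilities([probs_by_model[n] for n in subset])
            y_pred = [1 if p >= threshold else 0 for p in combined]
            results[subset] = f1_score(y_true, y_pred)
    return results


# Toy, made-up numbers for five text lines; not results from the paper.
toy_labels = [1, 0, 0, 1, 0]
toy_probs = {
    "unet": [0.9, 0.6, 0.2, 0.4, 0.1],     # hypothetical visual model
    "htr_nlp": [0.8, 0.2, 0.3, 0.7, 0.6],  # hypothetical linguistic model
}

for subset, score in evaluate_combinations(toy_probs, toy_labels).items():
    print(subset, round(score, 3))
```

In this toy setup each model makes different mistakes, so the averaged probabilities classify lines that the single models get wrong. Other fusion rules, such as weighted averaging or taking the product of probabilities, would slot into `combine_probabilities` in the same way.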

How to cite

APA:

Mayr, M., Felker, A., Maier, A., & Christlein, V. (2022). Combining Visual and Linguistic Models for a Robust Recipient Line Recognition in Historical Documents. In Seiichi Uchida, Elisa Barney, Véronique Eglin (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 598-612). La Rochelle, FRA: Springer Science and Business Media Deutschland GmbH.

MLA:

Mayr, Martin, et al. "Combining Visual and Linguistic Models for a Robust Recipient Line Recognition in Historical Documents." Proceedings of the 15th IAPR International Workshop on Document Analysis Systems, DAS 2022, La Rochelle, FRA. Ed. Seiichi Uchida, Elisa Barney, Véronique Eglin. Springer Science and Business Media Deutschland GmbH, 2022. 598-612.