Mayr M, Neumeier K, Krenz J, Bürcky S, Kordon F, Seuret M, Zöllner J, Wu F, Maier A, Christlein V (2025)
Publication Type: Journal article
Publication year: 2025
URI: https://link.springer.com/article/10.1007/s11042-024-20545-9
DOI: 10.1007/s11042-024-20545-9
Traditional methods in handwritten text recognition primarily focus on generating basic transcriptions, which often fall short for in-depth humanities research. Our study enhances this by providing diplomatic transcriptions for German studies, meticulously reproducing the original manuscripts, including layout and expanded abbreviations. State-of-the-art sequence-to-sequence approaches for handwritten text recognition predominantly use Connectionist Temporal Classification (CTC) as an auxiliary loss of the encoder output to improve robustness and accuracy. This is not possible in this task due to the great differences in the length of diplomatic transcriptions. We propose using the basic transcription instead of the diplomatic one as an additional target for the CTC feedback. Additionally, we introduce positional encoding at the intersection between the encoder and decoder to resolve the conflict of competing encoder objectives, balancing CTC loss reduction with the maintenance of implicit positional encoding for the decoder. Our empirical tests on the newly created dataset “Nuremberg Letterbooks” demonstrate significant data efficiency improvements. With only 4000 training lines (about 130 transcribed pages), we achieve a Character Error Rate (CER) of 9.39% without expanded abbreviations and 12.07% with expanded abbreviations, outperforming the baseline errors of 14.26% and 68.21%, respectively.
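The Character Error Rate reported above is commonly computed as the character-level edit distance between the predicted and reference transcriptions, divided by the number of reference characters. The following is a minimal sketch of that common definition, not the authors' evaluation code: the function names `levenshtein` and `cer` are hypothetical, and pooling edits and characters over all lines (rather than averaging per-line rates) is an assumption, since the abstract does not state how the rate is aggregated.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character Error Rate pooled over all lines (assumed aggregation)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical strings for illustration only, not taken from the dataset.
    refs = ["abcd", "efgh"]
    hyps = ["abxd", "efgh"]
    print(f"CER: {cer(refs, hyps):.2%}")
```

Under this definition, a CER of 9.39% corresponds to roughly nine to ten character edits per 100 reference characters.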
APA:
Mayr, M., Neumeier, K., Krenz, J., Bürcky, S., Kordon, F., Seuret, M., ... Christlein, V. (2025). Data-efficient handwritten text recognition of diplomatic historical text. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-024-20545-9
MLA:
Mayr, Martin, et al. "Data-efficient handwritten text recognition of diplomatic historical text." Multimedia Tools and Applications (2025).