Labeling, cutting, grouping: An efficient text line segmentation method for medieval manuscripts

Alberti M, Vogtlin L, Pondenkandath V, Seuret M, Ingold R, Liwicki M (2019)


Publication Type: Conference contribution

Publication year: 2019

Publisher: IEEE Computer Society

Page range: 1200-1206

Conference Proceedings Title: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

Event location: Sydney, NSW, AU

ISBN: 9781728128610

DOI: 10.1109/ICDAR.2019.00194

Abstract

This paper introduces a new method for text-line extraction that integrates deep-learning-based pre-classification with state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents, as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborate scripts. In this work, we propose a novel method that uses pixel-level semantic segmentation as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results, reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text-line extraction. Second, we introduce a novel, simple, and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.

How to cite

APA:

Alberti, M., Vogtlin, L., Pondenkandath, V., Seuret, M., Ingold, R., & Liwicki, M. (2019). Labeling, cutting, grouping: An efficient text line segmentation method for medieval manuscripts. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (pp. 1200-1206). Sydney, NSW, AU: IEEE Computer Society.

MLA:

Alberti, Michele, et al. "Labeling, cutting, grouping: An efficient text line segmentation method for medieval manuscripts." Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, Sydney, NSW, AU: IEEE Computer Society, 2019. 1200-1206.
