Dataset of pages from early printed books with multiple font groups

Seuret M, Limbach S, Weichselbaumer N, Maier A, Christlein V (2019)


Publication Type: Conference contribution

Publication year: 2019

Publisher: Association for Computing Machinery

Page Range: 1-6

Conference Proceedings Title: ACM International Conference Proceeding Series

Event location: Sydney, NSW, AU

ISBN: 9781450376686

DOI: 10.1145/3352631.3352640

Open Access Link: http://www.weichselbaumer.info/s/HIP2019_type_groups_dataset.pdf

Abstract

Based on contemporary scripts, early printers developed a large variety of different fonts. While fonts may slightly differ from one printer to another, they can be divided into font groups, such as Textura, Antiqua, or Fraktur. The recognition of font groups is important for computer scientists to select adequate OCR models, and of high interest to humanities scholars studying early printed books and the history of fonts. In this paper, we introduce a new, public dataset for the recognition of font groups in early printed books, and evaluate several state-of-the-art CNNs for the font group recognition task. The dataset consists of more than 35 600 page images, each page showing up to five different font groups, of which ten are considered in this dataset.

How to cite

APA:

Seuret, M., Limbach, S., Weichselbaumer, N., Maier, A., & Christlein, V. (2019). Dataset of pages from early printed books with multiple font groups. In ACM International Conference Proceeding Series (pp. 1-6). Sydney, NSW, AU: Association for Computing Machinery.

MLA:

Seuret, Mathias, et al. "Dataset of pages from early printed books with multiple font groups." Proceedings of the 5th International Workshop on Historical Document Imaging and Processing, HIP 2019, held in conjunction with ICDAR 2019, Sydney, NSW: Association for Computing Machinery, 2019. 1-6.