Definition and Evaluation of the NEOCR Dataset for Natural-Image Text Recognition

Nagy R, Meyer-Wegener K, Dicker A (2011)

Publication Language: English

Publication Type: Other publication type

Publication year: 2011

Series: Department Informatik Technical Reports

Page Range: 1-31

Journal Issue: CS-2011-07

URI: http://www.opus.ub.uni-erlangen.de/opus/volltexte/2011/2859/

Abstract

Recently, growing attention has been paid to recognizing text in natural images. OCR of natural-image text is far more complex than OCR of scanned documents. Text in real-world environments appears in arbitrary colors, font sizes and typefaces, and is often affected by perspective distortion, lighting effects, textures or occlusion. Currently, no publicly available dataset covers all aspects of natural-image OCR. A comprehensive, well-annotated, configurable dataset for optical character recognition in natural images is defined and created for the evaluation and comparison of approaches to natural-image text OCR. Furthermore, current open-source and commercial OCR tools are analyzed in various test scenarios using the proposed NEOCR dataset. Based on the results, further steps toward all-embracing natural-image text recognition are identified for the OCR community to address.

Authors with CRIS profile

Robert Nagy, Chair for Computer Science 6 (Data Management)

Klaus Meyer-Wegener, Chair for Computer Science 6 (Data Management)

How to cite

APA:

Nagy, R., Meyer-Wegener, K., & Dicker, A. (2011). Definition and Evaluation of the NEOCR Dataset for Natural-Image Text Recognition.

MLA:

Nagy, Robert, Klaus Meyer-Wegener, and Anders Dicker. Definition and Evaluation of the NEOCR Dataset for Natural-Image Text Recognition. 2011.
