Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus

Giesbrecht E, Evert S (2009)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2009

City/Town: San Sebastian, Spain

Pages Range: 27-35

Conference Proceedings Title: Proceedings of the 5th Web as Corpus Workshop (WAC5)

URI: http://purl.org/stefan.evert/PUB/GiesbrechtEvert2009_Tagging.pdf

Open Access Link: https://www.sigwac.org.uk/attachment/wiki/WAC5/WAC5_proceedings.pdf

Abstract

Part-of-speech (POS) tagging is an important preprocessing step in natural language processing. It is often considered to be a “solved task”, with published tagging accuracies around 97%. Our evaluation of five state-of-the-art POS taggers on German Web texts shows that such high accuracies can only be achieved under artificial cross-validation conditions. In a real-life scenario, accuracy drops below 93% with enormous variation between different text genres, making the taggers unsuitable for fully automatic processing. We find that HMM taggers are more robust and much faster than advanced machine-learning approaches such as MaxEnt. Promising directions for future research are unsupervised learning of a tagger lexicon from large unannotated corpora, as well as developing adaptive tagging models.
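
Tagging accuracies like those quoted above are conventionally measured per token: the share of tokens whose predicted tag matches the gold-standard tag. A drop from about 97% to below 93% therefore more than doubles the error rate, from roughly 3 to over 7 mistagged tokens per 100. The minimal Python sketch below is not the authors' evaluation code. It only illustrates how token-level accuracy could be computed overall and broken down by text genre. The function names, genre labels, and toy tokens and tags are hypothetical and are not data from the paper.

```python
from collections import defaultdict


def tagging_accuracy(gold, predicted):
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty token list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def accuracy_by_genre(tokens):
    """Per-genre accuracy from (genre, gold_tag, predicted_tag) triples."""
    counts = defaultdict(lambda: [0, 0])  # genre -> [correct, total]
    for genre, gold, pred in tokens:
        counts[genre][0] += gold == pred  # bool counts as 0 or 1
        counts[genre][1] += 1
    return {genre: correct / total for genre, (correct, total) in counts.items()}


# Hypothetical toy tokens with illustrative tags (not from the paper's evaluation).
toy_tokens = [
    ("news", "ART", "ART"), ("news", "NN", "NN"),
    ("news", "VVFIN", "VVFIN"), ("news", "$.", "$."),
    ("forum", "ADV", "ADV"), ("forum", "NE", "NN"),
    ("forum", "VVFIN", "VVINF"), ("forum", "$.", "$."),
]

print(accuracy_by_genre(toy_tokens))  # {'news': 1.0, 'forum': 0.5}
```

Reporting accuracy per genre, rather than only as a single pooled figure, shows the kind of variation between text types that the abstract describes.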

How to cite

APA:

Giesbrecht, E., & Evert, S. (2009). Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus. In I. Alegria, I. Leturia, & S. Sharoff (Eds.), Proceedings of the 5th Web as Corpus Workshop (WAC5) (pp. 27-35). San Sebastian, Spain.

MLA:

Giesbrecht, Eugenie, and Stephanie Evert. "Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus." Proceedings of the 5th Web as Corpus Workshop (WAC5). Ed. Alegria I, Leturia I, Sharoff S. San Sebastian, Spain, 2009. 27-35.
