Giesbrecht E, Evert S (2009)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2009
City/Town: San Sebastian, Spain
Page Range: 27-35
Conference Proceedings Title: Proceedings of the 5th Web as Corpus Workshop (WAC5)
URI: http://purl.org/stefan.evert/PUB/GiesbrechtEvert2009_Tagging.pdf
Open Access Link: https://www.sigwac.org.uk/attachment/wiki/WAC5/WAC5_proceedings.pdf
Part-of-speech (POS) tagging is an important preprocessing step in natural language processing. It is often considered to be a “solved task”, with published tagging accuracies around 97%. Our evaluation of five state-of-the-art POS taggers on German Web texts shows that such high accuracies can only be achieved under artificial cross-validation conditions. In a real-life scenario, accuracy drops below 93% with enormous variation between different text genres, making the taggers unsuitable for fully automatic processing. We find that HMM taggers are more robust and much faster than advanced machine-learning approaches such as MaxEnt. Promising directions for future research are unsupervised learning of a tagger lexicon from large unannotated corpora, as well as developing adaptive tagging models.
APA:
Giesbrecht, E., & Evert, S. (2009). Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus. In Alegria I, Leturia I, Sharoff S (Eds.), Proceedings of the 5th Web as Corpus Workshop (WAC5) (pp. 27-35). San Sebastian, Spain.
MLA:
Giesbrecht, Eugenie, and Stephanie Evert. "Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus." Proceedings of the 5th Web as Corpus Workshop (WAC5). Ed. Alegria I, Leturia I, Sharoff S. San Sebastian, Spain, 2009. 27-35.