The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet

Proisl T, Uhrig P, Heinrich P, Blombach A, Mammarella S, Dykes N, Kabashi B (2019)

Publication Language: English

Publication Type: Conference contribution, Original article

Publication year: 2019

Publisher: Association for Computational Linguistics

Page Range: 73-79

Conference Proceedings Title: Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019)

Event location: Trento

URI: https://www.aclweb.org/anthology/2019.nsurl-1.11

Open Access Link: https://www.aclweb.org/anthology/2019.nsurl-1.11

Abstract

In this paper, we describe the part-of-speech-tagging experiments for Magahi and Bhojpuri that we conducted for our participation in the NSURL 2019 shared tasks 9 and 10 (Low-level NLP Tools for (Magahi|Bhojpuri) Language). We experiment with three different part-of-speech taggers and evaluate the impact of additional resources such as Brown clusters, word embeddings and transfer learning from additional tagged corpora in related languages. In a 10-fold cross-validation on the training data, our best-performing models achieve accuracies of 90.70% for Magahi and 94.08% for Bhojpuri. Accuracy increased to 94.79% for Magahi and dropped to 78.68% for Bhojpuri on the test data.

Authors with CRIS profile

Thomas Proisl (Lehrstuhl für Korpus- und Computerlinguistik)
Peter Uhrig (Chair of English Linguistics)
Philipp Heinrich (Lehrstuhl für Korpus- und Computerlinguistik)
Andreas Blombach (Lehrstuhl für Korpus- und Computerlinguistik)
Nathan Dykes (Lehrstuhl für Korpus- und Computerlinguistik)
Besim Kabashi (Lehrstuhl für Korpus- und Computerlinguistik)

How to cite

APA:

Proisl, T., Uhrig, P., Heinrich, P., Blombach, A., Mammarella, S., Dykes, N., & Kabashi, B. (2019). The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet. In Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) (pp. 73-79). Trento, IT: Association for Computational Linguistics.

MLA:

Proisl, Thomas, et al. "The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet." Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019), Trento: Association for Computational Linguistics, 2019. 73-79.
