Proisl T, Uhrig P, Heinrich P, Blombach A, Mammarella S, Dykes N, Kabashi B (2019)
Publication Language: English
Publication Type: Conference contribution, Original article
Publication year: 2019
Publisher: Association for Computational Linguistics
Page Range: 73-79
Conference Proceedings Title: Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019)
URI: https://www.aclweb.org/anthology/2019.nsurl-1.11
Open Access Link: https://www.aclweb.org/anthology/2019.nsurl-1.11
In this paper, we describe the part-of-speech-tagging experiments for Magahi and Bhojpuri that we conducted for our participation in the NSURL 2019 shared tasks 9 and 10 (Low-level NLP Tools for (Magahi|Bhojpuri) Language). We experiment with three different part-of-speech taggers and evaluate the impact of additional resources such as Brown clusters, word embeddings and transfer learning from additional tagged corpora in related languages. In a 10-fold cross-validation on the training data, our best-performing models achieve accuracies of 90.70% for Magahi and 94.08% for Bhojpuri. Accuracy increased to 94.79% for Magahi and dropped to 78.68% for Bhojpuri on the test data.
APA:
Proisl, T., Uhrig, P., Heinrich, P., Blombach, A., Mammarella, S., Dykes, N., & Kabashi, B. (2019). The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet. In Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) (pp. 73-79). Trento, IT: Association for Computational Linguistics.
MLA:
Proisl, Thomas, et al. "The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet." Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019), Trento: Association for Computational Linguistics, 2019. 73-79.