Piperski A (2020)
Publication Status: Published
Publication Type: Conference contribution
Publication year: 2020
Publisher: ABBYY PRODUCTION LLC
Pages Range: 615-627
DOI: 10.28995/2075-7182-2020-19-615-627
This paper discusses the use of the most widely known Russian corpora, namely the Russian National Corpus, ruTenTen, the General Internet Corpus of Russian, and Araneum Russicum Maximum, for the theoretical study of the Russian language. Based on a sample of papers from 2019, I demonstrate that scholars, especially theoretical linguists, tend to ignore the opportunities provided by a wide range of Web corpora, even though these resources are well known to the NLP community. I present a selection of case studies to show that data from “non-classical” corpora can be used for studying various linguistic phenomena, such as: 1) variation in morphology and syntax; 2) word formation and lexical change; 3) construction grammar. I also claim that the underuse of non-classical corpora is partly due to the fact that they are (perceived as) not quite user-friendly.
APA:
Piperski, A. (2020). Russian language and corpus diversity РУССКИЙ ЯЗЫК И КОРПУСНОЕ РАЗНООБРАЗИЕ. In Proceedings of the 2020 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2020 (pp. 615-627). ABBYY PRODUCTION LLC.
MLA:
Piperski, Aleksandr. "Russian language and corpus diversity РУССКИЙ ЯЗЫК И КОРПУСНОЕ РАЗНООБРАЗИЕ." Proceedings of the 2020 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2020. ABBYY PRODUCTION LLC, 2020. 615-627.