Collecting collocations for the Albanian language

Kabashi B (2019)


Publication Type: Conference contribution

Publication year: 2019

Publisher: Lexical Computing CZ s.r.o.

Book Volume: 2019-October

Pages Range: 478-489

Conference Proceedings Title: Proceedings of Electronic Lexicography in the 21st Century Conference

Event location: Sintra PT

Abstract

This paper describes the collection of data from different sources to build a collocation data set, with the aim of compiling the first contemporary collocation dictionary for the Albanian language. The work is based (1) on the analysis of empirical data, i.e. linguistic corpora, using computational methods and tools, and (2) on traditional dictionaries. As empirical data we use AlCo (the Albanian Text Corpus), AlCoPress 2017-2019, and n-grams extracted from both, together with association measures such as log-likelihood and the Dice coefficient, using the IMS Open Corpus Workbench (CWB) and the web version of the Corpus Query Processor (CQPweb). Despite the considerable support these tools provide, an unsupervised, automated compilation of a high-quality collocation dictionary, comparable to those created by lexicographers, seems to be impossible without manual intervention. To complete the data collection, we additionally use lexical information extracted from traditional dictionaries. The primary goal is to create a language resource that can also be used, among other things, for Natural Language Processing. The work is still in progress and is expected to change before its final version.
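
The abstract names log-likelihood and the Dice coefficient as association measures but does not spell them out. The following is a minimal sketch of how both are commonly computed for adjacent word pairs, not the paper's own implementation (which relies on CWB and CQPweb). The function name `bigram_scores` and the restriction to adjacent bigrams are assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (dice, log_likelihood)} for adjacent word pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_freq = Counter(bigrams)
    first_freq = Counter(w1 for w1, _ in bigrams)    # w1 in first position
    second_freq = Counter(w2 for _, w2 in bigrams)   # w2 in second position

    scores = {}
    for (w1, w2), o11 in pair_freq.items():
        f1, f2 = first_freq[w1], second_freq[w2]

        # Dice coefficient: 2 * f(w1 w2) / (f(w1) + f(w2))
        dice = 2 * o11 / (f1 + f2)

        # Observed and expected counts of the 2x2 contingency table
        observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
        expected = [
            f1 * f2 / n,
            f1 * (n - f2) / n,
            (n - f1) * f2 / n,
            (n - f1) * (n - f2) / n,
        ]

        # Log-likelihood ratio G2 = 2 * sum(O * ln(O / E)); empty cells add 0
        g2 = 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
        scores[(w1, w2)] = (dice, g2)
    return scores
```

Called on a tokenized corpus, the function returns one score pair per bigram type. Sorting by either score gives a ranked candidate list, which, as the abstract notes, would still need review by a lexicographer.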

How to cite

APA:

Kabashi, B. (2019). Collecting collocations for the Albanian language. In Iztok Kosem, Tanara Zingano Kuhn, Margarita Correia, Jose Pedro Ferreira, Maarten Jansen, Isabel Pereira, Jelena Kallas, Milos Jakubicek, Simon Krek, Carole Tiberius (Eds.), Proceedings of Electronic Lexicography in the 21st Century Conference (pp. 478-489). Sintra, PT: Lexical Computing CZ s.r.o.

MLA:

Kabashi, Besim. "Collecting collocations for the Albanian language." Proceedings of the 6th Biennial Conference on Electronic Lexicography in the 21st Century: Smart Lexicography, eLex 2019, Sintra. Ed. Iztok Kosem, Tanara Zingano Kuhn, Margarita Correia, Jose Pedro Ferreira, Maarten Jansen, Isabel Pereira, Jelena Kallas, Milos Jakubicek, Simon Krek, Carole Tiberius, Lexical Computing CZ s.r.o., 2019. 478-489.
