E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification

Conference contribution
(Original article)


Publication details

Authors: Evert S, Uhrig P, Bartsch S, Proisl T
Editors: Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B
Publisher: Lexical Computing
Place of publication: Brno
Year of publication: 2017
Proceedings: Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference
Page range: 531–549
ISSN: 2533-5626
Language: English


Abstract


Statistical association measures (AM) play an important role in the automatic extraction of collocations and multiword expressions from corpora, but many parameters governing their performance are still poorly understood. Systematic evaluation studies have produced conflicting recommendations for an optimal AM, and little attention has been paid to other parameters such as the underlying corpus, the size of the co-occurrence context, or the application of a frequency threshold.



Our paper presents the results of a large-scale evaluation study covering 13 corpora, eight context sizes, four frequency thresholds, and 20 AMs against two different gold standards of lexical collocations. While the optimal choice of an AM depends strongly on the particular gold standard used, other parameters prove much more robust: (i) small co-occurrence contexts are better than larger spans, and the best results are usually obtained from syntactic dependencies; (ii) corpus quality is more important than sheer size, but large Web corpora prove to be a valid substitute for the British National Corpus; (iii) frequency thresholds seem to be unnecessary in most situations, as the statistical AMs successfully weed out rare and unreliable candidates; (iv) there is little interaction between the choice of AM and the other parameters.



In order to provide complete evidence for our observations to readers, we created an interactive Web-based application that allows users to manipulate all evaluation parameters and dynamically updates evaluation graphs and summaries.



FAU authors / FAU editors

Evert, Stefan Prof. Dr.
Lehrstuhl für Korpus- und Computerlinguistik
Proisl, Thomas
Lehrstuhl für Korpus- und Computerlinguistik
Uhrig, Peter Dr.
Lehrstuhl für Anglistik, insbesondere Linguistik


How to cite

APA:
Evert, S., Uhrig, P., Bartsch, S., & Proisl, T. (2017). E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification. In Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B (Eds.), Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference (pp. 531–549). Brno: Lexical Computing.

MLA:
Evert, Stefan, et al. "E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification." Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference, Leiden. Ed. Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B. Brno: Lexical Computing, 2017. 531–549.

Last updated 2018-11-08 at 00:08