Conference contribution
(Original article)


E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification


Publication Details
Author(s): Evert S, Uhrig P, Bartsch S, Proisl T
Editor(s): Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B
Publisher: Lexical Computing
Publishing place: Brno
Publication year: 2017
Conference Proceedings Title: Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference
Pages range: 531–549
ISSN: 2533-5626

Event details
Event: eLex 2017
Event location: Leiden
Start date of the event: 19/09/2017
End date of the event: 21/09/2017
Language: English

Abstract

Statistical association measures (AMs) play an important role in the automatic extraction of collocations and multiword expressions from corpora, but many parameters governing their performance are still poorly understood. Systematic evaluation studies have produced conflicting recommendations for an optimal AM, and little attention has been paid to other parameters such as the underlying corpus, the size of the co-occurrence context, or the application of a frequency threshold.

Our paper presents the results of a large-scale evaluation study covering 13 corpora, eight context sizes, four frequency thresholds, and 20 AMs against two different gold standards of lexical collocations. While the optimal choice of an AM depends strongly on the particular gold standard used, other parameters prove much more robust: (i) small co-occurrence contexts are better than larger spans, and the best results are usually obtained from syntactic dependencies; (ii) corpus quality is more important than sheer size, but large Web corpora prove to be a valid substitute for the British National Corpus; (iii) frequency thresholds seem to be unnecessary in most situations, as the statistical AMs successfully weed out rare and unreliable candidates; (iv) there is little interaction between the choice of AM and the other parameters.

In order to provide complete evidence for our observations to readers, we created an interactive Web-based application that allows users to manipulate all evaluation parameters and dynamically updates evaluation graphs and summaries.
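
As a rough illustration of the kind of evaluation the abstract describes, the sketch below scores candidate word pairs with two widely used association measures, ranks them, and computes precision against a gold standard. It is not the authors' implementation: the function names, the toy frequencies, the gold-standard set, and the choice of pointwise mutual information and the plain (two-sided) log-likelihood ratio are all assumptions made for the example.

```python
import math

def pmi(o11, f1, f2, n):
    """Pointwise mutual information: log2 of observed over expected co-occurrence."""
    expected = f1 * f2 / n
    return math.log2(o11 / expected)

def log_likelihood(o11, f1, f2, n):
    """Plain two-sided log-likelihood ratio (G2) over the 2x2 contingency table."""
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def rank_candidates(candidates, measure, n, min_freq=1):
    """Rank pairs by descending score, after an optional frequency threshold."""
    kept = {pair: c for pair, c in candidates.items() if c[0] >= min_freq}
    return sorted(kept, key=lambda pair: measure(*kept[pair], n), reverse=True)

def precision_at(ranked, gold, k):
    """Share of the top-k ranked candidates that appear in the gold standard."""
    return sum(1 for pair in ranked[:k] if pair in gold) / k

if __name__ == "__main__":
    # Hypothetical data: pair -> (co-occurrence frequency, freq. of word 1, freq. of word 2)
    sample_size = 100_000
    candidates = {
        ("heavy", "rain"): (30, 200, 150),
        ("the", "rain"): (40, 50_000, 150),
        ("strong", "tea"): (12, 300, 90),
    }
    gold = {("heavy", "rain"), ("strong", "tea")}  # hypothetical gold standard
    for measure in (pmi, log_likelihood):
        ranking = rank_candidates(candidates, measure, sample_size)
        print(measure.__name__, ranking, precision_at(ranking, gold, 2))
```

Varying `min_freq`, the measure, or the source of the co-occurrence counts (for example a different span size or corpus) corresponds, in miniature, to the parameters the study compares.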



How to cite
APA: Evert, S., Uhrig, P., Bartsch, S., & Proisl, T. (2017). E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification. In Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B (Eds.), Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference (pp. 531–549). Brno: Lexical Computing.

MLA: Evert, Stefan, et al. "E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification." Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference. Ed. Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B. Brno: Lexical Computing, 2017. 531–549.

Last updated on 2018-04-19 at 04:33