Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes

Article in Edited Volumes
(Essay)


Publication Details

Author(s): Uhrig P, Evert S, Proisl T
Editor(s): Cantos-Gómez P, Almela-Sánchez M
Title of edited volume: Lexical Collocation Analysis: Advances and Applications
Publisher: Springer International Publishing
Place of publication: Cham
Publication year: 2018
Page range: 111–140
ISBN: 978-3-319-92582-0
Language: English


Abstract

Collocation candidate extraction from dependency-annotated corpora has become more and more mainstream in collocation research over the past years. In most studies, however, the results of one parser are compared to those of relatively “dumb” window-based approaches only. To date, the impact of the parser used and its parsing scheme has not been studied systematically to the best of our knowledge. This chapter evaluates a total of 8 parsers on 2 corpora with 20 different association measures plus several frequency thresholds for 6 different types of collocations against the Oxford Collocations Dictionary for Students of English (2nd edition; 2009). We find that the parser and parsing scheme both play a role in the quality of the collocation candidate extraction. The performance of different parsers can differ substantially across different collocation types. The filters used to extract different types of collocations from the corpora also play an important role in the trade-off between precision and recall we can observe. Furthermore, we find that carefully sampled and balanced corpora (such as the BNC) seem to have considerable advantages in precision, but of course for total coverage, larger, less balanced corpora (such as the web corpus used in this study) take the lead. Overall, log-likelihood is the best association measure, but for some specific types of collocation (such as adjective-noun or verb-adverb), other measures perform even better.


FAU Authors / FAU Editors

Evert, Stefan Prof. Dr.
Lehrstuhl für Korpus- und Computerlinguistik
Proisl, Thomas
Lehrstuhl für Korpus- und Computerlinguistik
Uhrig, Peter Dr.
Lehrstuhl für Anglistik, insbesondere Linguistik


How to cite

APA:
Uhrig, P., Evert, S., & Proisl, T. (2018). Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes. In P. Cantos-Gómez & M. Almela-Sánchez (Eds.), Lexical Collocation Analysis: Advances and Applications (pp. 111–140). Cham: Springer International Publishing.

MLA:
Uhrig, Peter, Stefan Evert, and Thomas Proisl. "Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes." Lexical Collocation Analysis: Advances and Applications. Ed. P. Cantos-Gómez and M. Almela-Sánchez. Cham: Springer International Publishing, 2018. 111–140.

Last updated on 2018-08-27 at 16:38