Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes

Contribution to an edited volume
(Article)


Publication details

Authors: Uhrig P, Evert S, Proisl T
Editors: Cantos-Gómez P, Almela-Sánchez M
Title of edited volume: Lexical Collocation Analysis: Advances and Applications
Publisher: Springer International Publishing
Place of publication: Cham
Year of publication: 2018
Page range: 111–140
ISBN: 978-3-319-92582-0
Language: English


Abstract

Collocation candidate extraction from dependency-annotated corpora has become more and more mainstream in collocation research over the past years. In most studies, however, the results of one parser are compared to those of relatively “dumb” window-based approaches only. To date, the impact of the parser used and its parsing scheme has not been studied systematically to the best of our knowledge. This chapter evaluates a total of 8 parsers on 2 corpora with 20 different association measures plus several frequency thresholds for 6 different types of collocations against the Oxford Collocations Dictionary for Students of English (2nd edition; 2009). We find that the parser and parsing scheme both play a role in the quality of the collocation candidate extraction. The performance of different parsers can differ substantially across different collocation types. The filters used to extract different types of collocations from the corpora also play an important role in the trade-off between precision and recall we can observe. Furthermore, we find that carefully sampled and balanced corpora (such as the BNC) seem to have considerable advantages in precision, but of course for total coverage, larger, less balanced corpora (such as the web corpus used in this study) take the lead. Overall, log-likelihood is the best association measure, but for some specific types of collocation (such as adjective-noun or verb-adverb), other measures perform even better.
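
The abstract describes ranking collocation candidates by an association measure such as log-likelihood and scoring the ranking against a dictionary in terms of precision and recall. The following is a minimal sketch of that general idea, not the chapter's actual implementation. It assumes the standard log-likelihood ratio statistic computed from a 2x2 contingency table of co-occurrence counts. The function names, the toy counts, and the tiny gold standard are all hypothetical and only serve as illustration.

```python
import math


def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    o11: pair occurs together; o12: word 1 without word 2;
    o21: word 2 without word 1; o22: neither word.
    Assumes the table is not empty (total count > 0).
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    observed = [o11, o12, o21, o22]
    # Expected counts under independence of the two words.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def precision_recall(ranked, gold, n_best):
    """Precision and recall of the top n_best candidates against a gold set."""
    top = ranked[:n_best]
    true_positives = sum(1 for pair in top if pair in gold)
    precision = true_positives / len(top) if top else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy counts (o11, o12, o21, o22) for three candidate pairs.
counts = {
    ("strong", "tea"): (30, 70, 20, 9880),
    ("big", "tea"): (2, 498, 48, 9452),
    ("heavy", "rain"): (25, 75, 15, 9885),
}
# Invented stand-in for dictionary entries used as the gold standard.
gold = {("strong", "tea"), ("heavy", "rain")}

# Rank candidates from highest to lowest association score.
ranked = sorted(counts, key=lambda pair: log_likelihood(*counts[pair]),
                reverse=True)
precision, recall = precision_recall(ranked, gold, n_best=2)
print(ranked, precision, recall)
```

Varying `n_best` (or adding a frequency threshold before ranking) moves the result along the precision/recall trade-off that the abstract mentions.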


FAU authors / FAU editors

Evert, Stefan Prof. Dr.
Lehrstuhl für Korpus- und Computerlinguistik
Proisl, Thomas
Lehrstuhl für Korpus- und Computerlinguistik
Uhrig, Peter Dr.
Lehrstuhl für Anglistik, insbesondere Linguistik


How to cite

APA:
Uhrig, P., Evert, S., & Proisl, T. (2018). Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes. In P. Cantos-Gómez & M. Almela-Sánchez (Eds.), Lexical Collocation Analysis: Advances and Applications (pp. 111–140). Cham: Springer International Publishing.

MLA:
Uhrig, Peter, Stefan Evert, and Thomas Proisl. "Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes." Lexical Collocation Analysis: Advances and Applications. Ed. P. Cantos-Gómez and M. Almela-Sánchez. Cham: Springer International Publishing, 2018. 111–140.

Last updated 2019-01-01 at 11:10