Proisl T, Dykes N, Heinrich P, Kabashi B, Blombach A, Evert S (2020)
Publication Type: Conference contribution, Original article
Publication year: 2020
Publisher: European Language Resources Association (ELRA)
Page Range: 6142-6148
Conference Proceedings Title: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
ISBN: 9791095546344
URI: https://www.aclweb.org/anthology/2020.lrec-1.754
Open Access Link: https://www.aclweb.org/anthology/2020.lrec-1.754
The EmpiriST corpus (Beißwenger et al., 2016) is a manually tokenized and part-of-speech tagged corpus of approximately 23,000 tokens of German Web and CMC (computer-mediated communication) data. We extend the corpus with manually created annotation layers for word form normalization, lemmatization and lexical semantics. All annotations have been independently performed by multiple human annotators. We report inter-annotator agreements and results of baseline systems and state-of-the-art off-the-shelf tools.
APA:
Proisl, T., Dykes, N., Heinrich, P., Kabashi, B., Blombach, A., & Evert, S. (2020). EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus. In Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis (Eds.), LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 6142-6148). Marseille, FR: European Language Resources Association (ELRA).
MLA:
Proisl, Thomas, et al. "EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille. Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis. European Language Resources Association (ELRA), 2020. 6142-6148.