Corpus tools and language technology


Organisation:
Lehrstuhl für Korpus- und Computerlinguistik

Description:


We develop algorithms and software tools for the automatic linguistic annotation, efficient indexing, flexible querying and quantitative analysis of large text corpora. These tools form the basis of innovative research in the digital humanities as well as practical and commercial applications in language technology.


Related Project(s)


RANT: Reconstructing Arguments from Noisy Text (DFG Priority Programme 1999: RATIO)
Prof. Dr. Stefan Evert
(01/01/2018 - 31/12/2020)
RogTCS: Text clustering for the analysis of open questions in market research
Prof. Dr. Stefan Evert
(03/06/2013)



Assigned publications


Evert, S., Greiner, P., Baigger, F., & Lang, B. (2016). A Distributional Approach to Open Questions in Market Research. Computers in Industry, 78, 16-28. https://dx.doi.org/10.1016/j.compind.2015.10.008
Kabashi, B., & Proisl, T. (2016). A Proposal for a Part-of-Speech Tagset for the Albanian Language. In Calzolari Nicoletta, Choukri Khalid, Declerck Thierry, Grobelnik Marko, Maegaard Bente, Mariani Joseph, Moreno Asuncion, Odijk Jan, Piperidis Stelios (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (pp. 4305–4310). Portorož, SI: Paris: European Language Resources Association (ELRA).
Evert, S., Beißwenger, M., Bartsch, S., & Würzner, K.-M. (2016). EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora. In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task (pp. 44-56). Berlin, Germany.
Proisl, T., & Uhrig, P. (2016). SoMaJo: State-of-the-art tokenization for German web and social media texts. In Cook P, Evert S, Schäfer R, Stemle E (Eds.), Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task (pp. 57-62). Berlin, DE: Berlin: Association for Computational Linguistics (ACL).
Kabashi, B. (2015). Automatische Verarbeitung der Morphologie des Albanischen. Erlangen: FAU University Press.
Plotnikova, N., Kohl, M., Volkert, K., Lerner, A., Dykes, N., Ermer, H., & Evert, S. (2015). KLUEless: Polarity Classification and Association. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 619–625). Denver, Colorado.
Plotnikova, N., Lapesa, G., Proisl, T., & Evert, S. (2015). SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 111–116). Denver, Colorado.
Evert, S., & Hardie, A. (2015). Ziggurat: A new data model and indexing format for large annotated text corpora. In Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3) (pp. 21–27). Lancaster, UK.
Evert, S., Proisl, T., Greiner, P., & Kabashi, B. (2014). SentiKLUE: Updating a polarity classifier in 48 hours. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014) (pp. 551–555). Dublin, Ireland.
Proisl, T., & Uhrig, P. (2012). Efficient Dependency Graph Matching with the IMS Open Corpus Workbench. In Calzolari Nicoletta, Choukri Khalid, Declerck Thierry, Doğan Mehmet Uğur, Maegaard Bente, Mariani Joseph, Moreno Asuncion, Odijk Jan, Piperidis Stelios (Eds.), Proceedings (pp. 2750–2756). Istanbul, TR: Istanbul: European Language Resources Association (ELRA).

Last updated on 2018-10-24 at 15:24