Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models

Dykes N, Evert S, Heinrich P, Humml M, Schröder L (2024)


Publication Type: Conference contribution

Publication year: 2024

Publisher: ELRA and ICCL

City/Town: Torino, Italy

Page range: 52--57

Conference Proceedings Title: Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

Event location: Torino, Italy

URI: https://aclanthology.org/2024.delite-1.7

Abstract

We use the results of manually designed corpus queries to fine-tune an LLM that identifies argumentative fragments, framed as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries for generating training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples).

How to cite

APA:

Dykes, N., Evert, S., Heinrich, P., Humml, M., & Schröder, L. (2024). Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models. In Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C (Eds.), Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024 (pp. 52--57). Torino, Italy: ELRA and ICCL.

MLA:

Dykes, Nathan, et al. "Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models." Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024, Torino, Italy. Ed. Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C. Torino, Italy: ELRA and ICCL, 2024. 52--57.
