Dykes N, Evert S, Heinrich P, Humml M, Schröder L (2024)
Publication Type: Conference contribution
Publication year: 2024
Publisher: ELRA and ICCL
City/Town: Torino, Italy
Page Range: 52--57
Conference Proceedings Title: Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
Event location: Torino, Italy
URI: https://aclanthology.org/2024.delite-1.7
We use query results from manually designed corpus queries for fine-tuning an LLM to identify argumentative fragments as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries for generating training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples).
APA:
Dykes, N., Evert, S., Heinrich, P., Humml, M., & Schröder, L. (2024). Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models. In Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C (Eds.), Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024 (pp. 52--57). Torino, Italy: ELRA and ICCL.
MLA:
Dykes, Nathan, et al. "Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models." Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024, Torino, Italy. Ed. Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C. Torino, Italy: ELRA and ICCL, 2024. 52--57.