Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval

Weber F, Toddenroth D (2024)


Publication Type: Conference contribution

Publication year: 2024

Publisher: IOS Press BV

Book Volume: 316

Pages Range: 827-831

Conference Proceedings Title: Studies in Health Technology and Informatics

Event location: Athens GR

ISBN: 9781643685335

DOI: 10.3233/SHTI240539

Abstract

Finding relevant information in the biomedical literature increasingly depends on efficient information retrieval (IR) algorithms. Cross-Encoders, Sentence-BERT, and ColBERT are algorithms based on pre-trained language models that use nuanced but computable vector representations of search queries and documents for IR applications. Here we investigate how well these vectorization algorithms estimate relevance labels of biomedical documents for search queries using the OHSUMED dataset. For our evaluation, we compared computed scores to the provided labels using boxplots and Spearman's rank correlations. According to these metrics, we found that Sentence-BERT moderately outperformed the alternative vectorization algorithms and that additional fine-tuning on a subset of OHSUMED labels yielded little additional benefit. Future research might aim to develop a larger dedicated dataset in order to optimize such methods more systematically, and to evaluate the corresponding functions in IR tools with end-users.
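
The evaluation step named in the abstract, comparing computed scores to provided relevance labels with Spearman's rank correlation, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the numeric encoding of the labels (0, 1, 2), and the example scores are all assumptions made up for illustration. Ties are handled with average ranks, since ordinal relevance labels repeat often.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(scores, labels):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rs, rl = average_ranks(scores), average_ranks(labels)
    ms, ml = mean(rs), mean(rl)
    cov = sum((a - ms) * (b - ml) for a, b in zip(rs, rl))
    var_s = sum((a - ms) ** 2 for a in rs)
    var_l = sum((b - ml) ** 2 for b in rl)
    if var_s == 0 or var_l == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_s * var_l)


# Hypothetical data: model scores for six documents of one query, and
# made-up ordinal relevance labels (assumed encoding: 0 = not relevant,
# 1 = partially relevant, 2 = relevant).
example_scores = [0.91, 0.15, 0.62, 0.40, 0.77, 0.08]
example_labels = [2, 0, 1, 1, 2, 0]

if __name__ == "__main__":
    print(f"Spearman's rho: {spearman_rho(example_scores, example_labels):.3f}")
```

A rho near 1 would indicate that a model ranks documents in nearly the same order as the labels; in practice a library routine such as `scipy.stats.spearmanr` would likely be used instead of a hand-written function.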

How to cite

APA:

Weber, F., & Toddenroth, D. (2024). Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval. In John Mantas, Arie Hasman, George Demiris, Kaija Saranto, Michael Marschollek, Theodoros N. Arvanitis, Ivana Ognjanovic, Arriel Benis, Parisis Gallos, Emmanouil Zoulias, Elisavet Andrikopoulou (Eds.), Studies in Health Technology and Informatics (pp. 827-831). Athens, GR: IOS Press BV.

MLA:

Weber, Franziska, and Dennis Toddenroth. "Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval." Proceedings of the 34th Medical Informatics Europe Conference, MIE 2024, Athens. Ed. John Mantas, Arie Hasman, George Demiris, Kaija Saranto, Michael Marschollek, Theodoros N. Arvanitis, Ivana Ognjanovic, Arriel Benis, Parisis Gallos, Emmanouil Zoulias, Elisavet Andrikopoulou, IOS Press BV, 2024. 827-831.
