Evaluation of Domain-Specific Word Vectors for Biomedical Word Sense Disambiguation

Toddenroth D (2022)


Publication Type: Book chapter / Article in edited volumes

Publication year: 2022

Publisher: IOS Press

Edited Volumes: Healthcare of the Future 2022

Series: Studies in Health Technology and Informatics

Book Volume: 292

Pages Range: 23-27

ISBN: 978-1-64368-281-5

DOI: 10.3233/SHTI220314

Abstract

Among medical applications of natural language processing (NLP), word sense disambiguation (WSD) estimates alternative meanings from text around homonyms. Recently developed NLP methods include word vectors that combine easy computability with nuanced semantic representations. Here we explore the utility of simple linear WSD classifiers based on aggregating word vectors from a modern biomedical NLP library in homonym contexts. We evaluated eight WSD tasks that consider literature abstracts as textual contexts. Discriminative performance was measured in held-out annotations as the median area under sensitivity-specificity curves (AUC) across tasks and 200 bootstrap repetitions. We find that classifiers trained on domain-specific vectors outperformed those from a general language model by 4.0 percentage points, and that a preprocessing step of filtering stopwords and punctuation marks enhanced discrimination by another 0.7 points. The best models achieved a median AUC of 0.992 (interquartile range 0.975 - 0.998). These improvements suggest that more advanced WSD methods might also benefit from leveraging domain-specific vectors derived from large biomedical corpora.
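The pipeline outlined in the abstract (filter stopwords and punctuation, aggregate the word vectors of a homonym's context, score with a linear model, and summarise held-out AUC over bootstrap repetitions) can be illustrated with a minimal sketch. This is not the author's implementation. The toy two-dimensional vectors, the stopword list, the example sentences, the difference-of-class-means scorer, and the resampling of held-out items are all assumptions made for illustration; the abstract does not state which linear classifier or bootstrap scheme was used, and a real setup would load vectors from a biomedical NLP library.

```python
import random
import string
from statistics import median

# Hypothetical toy vectors standing in for vectors from an NLP library.
TOY_VECTORS = {
    "cold": [0.5, 0.5],
    "virus": [1.0, 0.1],
    "fever": [0.9, 0.2],
    "cough": [0.8, 0.0],
    "winter": [0.1, 1.0],
    "temperature": [0.2, 0.9],
    "ice": [0.0, 0.8],
}
# Hypothetical stopword list; real libraries ship their own.
STOPWORDS = {"the", "a", "of", "and", "in", "with", "was"}


def context_vector(text):
    """Average the vectors of context tokens after dropping stopwords and punctuation."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def fit_linear(X, y):
    """Assumed linear scorer: weight vector = mean(positive) - mean(negative)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mean_pos = [sum(d) / len(pos) for d in zip(*pos)]
    mean_neg = [sum(d) / len(neg) for d in zip(*neg)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def score(w, x):
    """Dot product of the weight vector and a context vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(scores, labels, repetitions=200, seed=0):
    """Median AUC over bootstrap resamples of the held-out items (assumed scheme)."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < repetitions:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # redraw when a resample lacks one of the two senses
        aucs.append(auc([scores[i] for i in idx], sample_labels))
    return median(aucs)


# Invented contexts for the homonym "cold": 1 = illness sense, 0 = temperature sense.
train = [("cold virus with fever", 1), ("the cold of winter", 0),
         ("a cold and cough", 1), ("cold temperature and ice", 0)]
held_out = [("cold with cough and fever", 1), ("ice in the cold winter", 0),
            ("virus and cold", 1), ("cold temperature", 0)]

weights = fit_linear([context_vector(t) for t, _ in train], [y for _, y in train])
held_out_scores = [score(weights, context_vector(t)) for t, _ in held_out]
held_out_labels = [y for _, y in held_out]
print(bootstrap_median_auc(held_out_scores, held_out_labels))
```

Swapping `TOY_VECTORS` for general-language or domain-specific vectors while keeping everything else fixed mirrors the kind of comparison the abstract reports.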

How to cite

APA:

Toddenroth, D. (2022). Evaluation of Domain-Specific Word Vectors for Biomedical Word Sense Disambiguation. In Thomas Bürkle, Kerstin Denecke, Jürgen Holm, Murat Sariyar, Michael Lehmann (Eds.), Healthcare of the Future 2022 (pp. 23-27). IOS Press.

MLA:

Toddenroth, Dennis. "Evaluation of Domain-Specific Word Vectors for Biomedical Word Sense Disambiguation." Healthcare of the Future 2022. Ed. Thomas Bürkle, Kerstin Denecke, Jürgen Holm, Murat Sariyar, Michael Lehmann. IOS Press, 2022. 23-27.
