Harmonizing organ-at-risk structure names using open-source large language models

Thummerer A, Maspero M, van der Bijl E, Corradini S, Belka C, Landry G, Kurz C (2025)

Publication Type: Journal article

Publication year: 2025

Journal

Physics and Imaging in Radiation Oncology (Elsevier)

Volume: 35

Article Number: 100813

DOI: 10.1016/j.phro.2025.100813

Abstract

Background and purpose: Standardized radiotherapy structure nomenclature is crucial for automation, inter-institutional collaborations, and large-scale deep learning studies in radiation oncology. Despite the availability of nomenclature guidelines (AAPM-TG-263), their implementation is lacking and still faces challenges. This study evaluated open-source large language models (LLMs) for automated organ-at-risk (OAR) renaming on a multi-institutional and multilingual dataset.

Materials and methods: Four open-source LLMs (Llama 3.3, Llama 3.3 R1, DeepSeek V3, DeepSeek R1) were evaluated using a dataset of 34,177 OAR structures from 1684 patients collected at three university medical centers with manual TG-263 ground-truth labels. LLM renaming was performed using a few-shot prompting technique, including detailed instructions and generic examples. Performance was assessed by calculating renaming accuracy on the entire dataset and a unique dataset (duplicates removed). In addition, we performed a failure analysis, prompt-based confidence correlation, and Monte Carlo sampling-based uncertainty estimation.

Results: High renaming accuracy was achieved, with the reasoning-enhanced DeepSeek R1 model performing best (98.6 % unique accuracy, 99.9 % overall accuracy). Overall, reasoning models outperformed their non-reasoning counterparts. Monte Carlo sampling showed a stronger correlation with prediction errors (correlation coefficient of 0.70 for DeepSeek R1) and better error detection (Sensitivity 0.73, Specificity 1.0 for DeepSeek R1) compared to prompt-based confidence estimation (correlation coefficient < 0.42).

Conclusions: Open-source LLMs, particularly those with reasoning capabilities, can accurately harmonize OAR nomenclature according to TG-263 across diverse multilingual and multi-institutional datasets. They can also facilitate TG-263 nomenclature adoption and the creation of large, standardized datasets for research and AI development.
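The evaluation quantities named in the abstract (overall versus unique accuracy, sampling-based uncertainty, and sensitivity/specificity of error detection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "unique" means one row per (raw name, ground-truth label) pair, that uncertainty is the disagreement among repeated model outputs for the same structure, and that a prediction is flagged when this disagreement exceeds a threshold. All function names and the toy data are hypothetical.

```python
from collections import Counter


def renaming_accuracy(raw_names, predictions, labels):
    """Return (overall, unique) accuracy.

    Assumption: the unique set keeps the first row per (raw name, label) pair.
    """
    rows = list(zip(raw_names, predictions, labels))
    overall = sum(pred == label for _, pred, label in rows) / len(rows)
    first_seen = {}
    for raw, pred, label in rows:
        first_seen.setdefault((raw, label), pred == label)
    unique = sum(first_seen.values()) / len(first_seen)
    return overall, unique


def sampling_uncertainty(samples):
    """Majority-vote name and disagreement score (0.0 means all samples agree)."""
    name, votes = Counter(samples).most_common(1)[0]
    return name, 1.0 - votes / len(samples)


def error_detection(uncertainties, is_error, threshold=0.0):
    """Sensitivity and specificity of flagging predictions with uncertainty > threshold."""
    flagged = [u > threshold for u in uncertainties]
    tp = sum(f and e for f, e in zip(flagged, is_error))
    fn = sum((not f) and e for f, e in zip(flagged, is_error))
    tn = sum((not f) and (not e) for f, e in zip(flagged, is_error))
    fp = sum(f and (not e) for f, e in zip(flagged, is_error))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): a frequent name renamed correctly, a rare one not.
raw = ["Herz", "Herz", "Herz", "lung_L", "hart"]
pred = ["Heart", "Heart", "Heart", "Lung_L", "Hart"]
truth = ["Heart", "Heart", "Heart", "Lung_L", "Heart"]
overall_acc, unique_acc = renaming_accuracy(raw, pred, truth)

# Four repeated model outputs for one structure (hypothetical).
voted_name, disagreement = sampling_uncertainty(["Heart", "Heart", "Hart", "Heart"])
```

In the toy data the frequent, correctly renamed name dominates the full set, so overall accuracy (4 of 5) is higher than unique accuracy (2 of 3). This is the same direction as the gap between the reported overall and unique figures, although the sketch does not establish that the gap in the study has the same cause.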

Involved external institutions

Klinikum der Universität München (LMU Klinikum), Germany (DE)

University Medical Centre Utrecht (UMC Utrecht), Netherlands (NL)

Radboud University Nijmegen Medical Centre / Radboudumc, or in full Radboud Universitair Medisch Centrum (UMC), Netherlands (NL)

How to cite

APA:

Thummerer, A., Maspero, M., van der Bijl, E., Corradini, S., Belka, C., Landry, G., & Kurz, C. (2025). Harmonizing organ-at-risk structure names using open-source large language models. Physics and Imaging in Radiation Oncology, 35. https://doi.org/10.1016/j.phro.2025.100813

MLA:

Thummerer, Adrian, et al. "Harmonizing organ-at-risk structure names using open-source large language models." Physics and Imaging in Radiation Oncology 35 (2025).
