Wind S, Sopa J, Truhn D, Lotfinia M, Nguyen TT, Bressem K, Adams L, Rusu M, Köstler H, Wellein G, Maier A, Tayebi Arasteh S (2025)
Publication Type: Journal article
Publication year: 2025
Volume: 8
Article Number: 790
Journal Issue: 1
URI: https://www.nature.com/articles/s41746-025-02250-5
DOI: 10.1038/s41746-025-02250-5
Open Access Link: https://www.nature.com/articles/s41746-025-02250-5
Large language models (LLMs) show promise for radiology decision support, yet conventional retrieval-augmented generation (RAG) relies on single-step retrieval and struggles with complex reasoning. We introduce radiology Retrieval and Reasoning (RaR), a multi-step retrieval framework that iteratively summarizes clinical questions, retrieves evidence, and synthesizes answers. We evaluated 25 LLMs spanning general-purpose, reasoning-optimized, and clinically fine-tuned models (0.5B to 670B parameters) on 104 expert-curated radiology questions and an independent set of 65 real radiology board-exam questions. RaR significantly improved mean diagnostic accuracy versus zero-shot prompting (75% vs. 67%; P = 1.1 × 10⁻⁷) and conventional online RAG (75% vs. 69%; P = 1.9 × 10⁻⁶). Gains were largest in mid-sized and small models (e.g., Mistral Large: 72% → 81%), while very large models showed minimal change. RaR reduced hallucinations and provided clinically relevant evidence in 46% of cases, improving factual grounding. These results show that multi-step retrieval enhances diagnostic reliability, especially in deployable mid-sized LLMs. Code, datasets, and RaR are publicly available.
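The three-stage loop named in the abstract (summarize the question, retrieve evidence, synthesize an answer) can be illustrated with a toy sketch. This is not the authors' RaR implementation. The in-memory `CORPUS`, the sentence-splitting `summarize` step, the keyword-overlap `retrieve` step, and the option-scoring `synthesize` step are all hypothetical stand-ins for the LLM calls and real evidence source a system like RaR would use. The sketch only shows the structural difference from single-step RAG: evidence is gathered per focused sub-query rather than from one query built from the whole case.

```python
# Hypothetical toy evidence store; a real system would query an external source.
CORPUS = {
    "ggo": "Ground-glass opacity in the lungs can indicate viral pneumonia.",
    "halo": "A halo sign around a nodule suggests invasive aspergillosis in neutropenic patients.",
    "cavity": "Thick-walled cavitation is common in squamous cell carcinoma.",
}

STOPWORDS = {"a", "an", "the", "in", "of", "is", "with", "and", "what", "which", "can", "around"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation and stopwords removed."""
    words = (w.strip(".,?;:()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def summarize(question: str) -> list[str]:
    """Step 1 (hypothetical stand-in for an LLM call): split the case into sub-queries."""
    parts = [p.strip() for p in question.replace("?", ".").split(".")]
    return [p for p in parts if p]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Step 2: rank documents by keyword overlap with one sub-query; drop zero-overlap hits."""
    q = tokens(query)
    ranked = sorted(CORPUS.values(), key=lambda doc: len(q & tokens(doc)), reverse=True)
    return [doc for doc in ranked[:k] if q & tokens(doc)]


def synthesize(evidence: list[str], options: dict[str, str]) -> str:
    """Step 3 (hypothetical stand-in for an LLM call): pick the best-supported option."""
    ev = tokens(" ".join(evidence))
    return max(options, key=lambda key: len(ev & tokens(options[key])))


def multi_step_answer(question: str, options: dict[str, str]) -> tuple[str, list[str]]:
    """Retrieve evidence for each sub-query, deduplicate it, then answer from the pool."""
    evidence: list[str] = []
    for sub_query in summarize(question):
        for doc in retrieve(sub_query):
            if doc not in evidence:
                evidence.append(doc)
    return synthesize(evidence, options), evidence


QUESTION = ("Neutropenic patient after chemotherapy. "
            "CT shows a nodule with a halo sign. Most likely diagnosis?")
OPTIONS = {"A": "Viral pneumonia", "B": "Invasive aspergillosis", "C": "Squamous cell carcinoma"}

answer, evidence = multi_step_answer(QUESTION, OPTIONS)
print(answer, evidence)
```

Returning the collected `evidence` alongside the answer mirrors the abstract's point that multi-step retrieval surfaces supporting material a reader can check. In a real pipeline, `summarize` and `synthesize` would be model calls and `retrieve` would query a curated or online source.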
APA:
Wind, S., Sopa, J., Truhn, D., Lotfinia, M., Nguyen, T.-T., Bressem, K., . . . Tayebi Arasteh, S. (2025). Multi-step retrieval and reasoning improves radiology question answering with large language models. npj Digital Medicine, 8(1), Article 790. https://doi.org/10.1038/s41746-025-02250-5
MLA:
Wind, Sebastian, et al. "Multi-step retrieval and reasoning improves radiology question answering with large language models." npj Digital Medicine 8.1 (2025).