Benchmarking vision-language models for diagnostics in emergency and critical care settings

Kurz CF, Merzhevich T, Eskofier B, Kather JN, Gmeiner B (2025)


Publication Type: Journal article

Publication year: 2025

Journal: npj Digital Medicine

Book Volume: 8

Article Number: 423

Journal Issue: 1

DOI: 10.1038/s41746-025-01837-2

Abstract

The applicability of vision-language models (VLMs) for acute care in emergency and intensive care units remains underexplored. Using a multimodal dataset of diagnostic questions involving medical images and clinical context, we benchmarked several small open-source VLMs against GPT-4o. While open models demonstrated limited diagnostic accuracy (up to 40.4%), GPT-4o significantly outperformed them (68.1%). Findings highlight the need for specialized training and optimization to improve open-source VLMs for acute care applications.

How to cite

APA:

Kurz, C.F., Merzhevich, T., Eskofier, B., Kather, J.N., & Gmeiner, B. (2025). Benchmarking vision-language models for diagnostics in emergency and critical care settings. npj Digital Medicine, 8(1), Article 423. https://doi.org/10.1038/s41746-025-01837-2

MLA:

Kurz, Christoph F., et al. "Benchmarking vision-language models for diagnostics in emergency and critical care settings." npj Digital Medicine 8.1 (2025).
