Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

Zaian A, Bhat S, Abdalkader M, Maier A (2026)

Publication Language: English

Publication Status: Submitted

Publication Type: Unpublished / Preprint

Future Publication Type: Conference contribution

Publication year: 2026

URI: https://arxiv.org/abs/2605.06173

Open Access Link: https://arxiv.org/abs/2605.06173

Abstract

Diabetic retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture decouples a high-performance retinal classifier from a parameter-efficient vision-language model (Qwen2.5-VL-7B-Instruct) adapted via Low-Rank Adaptation (LoRA), enabling flexible component integration. A retrieval-augmented generation (RAG) module injects curated ophthalmic knowledge together with structured classifier outputs at inference time to improve diagnostic consistency and reduce hallucinations. Retina-RAG achieves an F1-score of 0.731 for DR grading and 0.948 for ME detection, substantially outperforming zero-shot Qwen (0.096 and 0.732, respectively) and MMed-RAG (0.541 and 0.641, respectively) on a retinal disease detection dataset with captions. For report generation, Retina-RAG attains a ROUGE-L score of 0.438 and an SBERT similarity of 0.884, exceeding all baselines. The full framework runs on a single consumer-grade GPU, demonstrating that clinically structured retinal AI can be achieved with modest computational resources.
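The inference-time injection described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the `ClassifierOutput` fields, the three placeholder knowledge snippets, the keyword-overlap retriever, and the prompt wording are all hypothetical. A real system would most likely use embedding-based retrieval and pass the prompt, together with the fundus image, to the LoRA-adapted vision-language model.

```python
from dataclasses import dataclass


@dataclass
class ClassifierOutput:
    """Hypothetical structured output of the retinal classifier."""
    dr_grade: str      # e.g. "mild", "moderate", "proliferative"
    me_present: bool   # macular edema detected or not


# Placeholder knowledge base; stands in for the curated ophthalmic corpus.
KNOWLEDGE = [
    "Microaneurysms are an early sign of diabetic retinopathy.",
    "Neovascularization is characteristic of proliferative diabetic retinopathy.",
    "Hard exudates near the macula can indicate macular edema.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the words and strip basic punctuation."""
    return {w.strip(".,;:").lower() for w in text.split()}


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank snippets by word overlap with the query (a toy retriever)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(out: ClassifierOutput, corpus: list[str], k: int = 2) -> str:
    """Combine classifier findings and retrieved snippets into one prompt."""
    me = "present" if out.me_present else "absent"
    findings = f"DR grade: {out.dr_grade}; macular edema: {me}"
    context = retrieve(findings, corpus, k)
    lines = ["Classifier findings:", findings, "Reference knowledge:"]
    lines += [f"- {c}" for c in context]
    lines.append("Write a structured retinal report consistent with the findings above.")
    return "\n".join(lines)


prompt = build_prompt(ClassifierOutput("proliferative", True), KNOWLEDGE)
print(prompt)
```

Keeping the classifier output as a separate structured object mirrors the decoupling the abstract describes: the classifier or the knowledge base could be swapped without touching the prompt assembly.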

Authors with CRIS profile

Sheethal Bhat, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)
Andreas Maier, Lehrstuhl für Informatik 5 (Mustererkennung)

How to cite

APA:

Zaian, A., Bhat, S., Abdalkader, M., & Maier, A. (2026). Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation. (Unpublished, Submitted).

MLA:

Zaian, Abdelrehman, et al. Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation. Unpublished, Submitted. 2026.
