MammoBLIP: End-to-End Mammography Report Generation with Vision-Language Models and Public Multi-Institutional Datasets

Bhandary Panambur A, Wind S, Bayer S, Maier A (2025)

Publication Type: Conference contribution

Publication year: 2025

Publisher: IEEE

Pages Range: 1-1

Conference Proceedings Title: 2025 IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD)

Event location: Yokohama

DOI: 10.1109/NSS/MIC/RTSD57106.2025.11286311

Abstract

Breast cancer is the most frequently diagnosed malignancy among women globally and presents a growing challenge for healthcare systems. While early detection and accurate lesion assessment are vital, traditional radiology workflows are labor-intensive and prone to variability. Existing AI solutions in breast imaging are task-specific and lack the ability to generate comprehensive radiology reports. Vision-language models (VLMs), such as CLIP and BLIP, offer a promising direction for unifying visual understanding with natural language generation. We propose MammoBLIP, an end-to-end framework for automated mammography report generation based on the MedBLIP architecture. We curated a dataset of 81,076 images from five public sources (VinDR-Mammo, RSNA, CMMD, InBreast, and KAU) with standardized clinical annotations. Images are processed through a frozen EVA-CLIP Vision Transformer, followed by a lightweight transformer and projection layers to produce vision embeddings. These are aligned with text embeddings using a contrastive loss. A GPT-2-based BioMedLM model generates reports conditioned on these visual features. Only the transformer and projection heads are trained, keeping backbone models frozen to ensure computational efficiency. MammoBLIP achieves strong results across datasets, with an overall BLEU score of 65.36, ROUGE-1 of 0.75, BERT-F1 of 0.88, and SBERT similarity of 0.91. The generated reports are clinically coherent, with an average Flesch-Kincaid grade of approximately 7. These findings provide initial insights into automated report generation and lay the groundwork for developing robust vision-language models tailored to clinical imaging tasks. Our results highlight MammoBLIP's potential to streamline workflows, enhance diagnostic consistency, and aid in radiologist training in future applications.
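
The abstract states that vision embeddings are aligned with text embeddings using a contrastive loss but does not give its exact form. As a rough illustration only, the sketch below assumes a symmetric, CLIP-style InfoNCE objective over a batch of paired image and report embeddings. The function names, the temperature of 0.07, and the toy two-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Assumed form: symmetric InfoNCE. The k-th image and k-th text are a
    # matching pair; every other combination in the batch is a negative.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: the same computation on the transposed matrix.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy batch: correctly paired embeddings versus deliberately swapped ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Under this assumed objective, correctly paired embeddings give a loss near zero while swapped pairs give a large loss. In the setup the abstract describes, only the lightweight transformer and projection heads would receive gradients from such a term, since the backbones stay frozen.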

Authors with CRIS profile

Adarsh Bhandary Panambur, Lehrstuhl für Informatik 5 (Mustererkennung)
Sebastian Wind, Lehrstuhl für Informatik 5 (Mustererkennung)
Siming Bayer, Lehrstuhl für Informatik 5 (Mustererkennung)
Andreas Maier, Lehrstuhl für Informatik 5 (Mustererkennung)

How to cite

APA:

Bhandary Panambur, A., Wind, S., Bayer, S., & Maier, A. (2025). MammoBLIP: End-to-End Mammography Report Generation with Vision-Language Models and Public Multi-Institutional Datasets. In 2025 IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD) (pp. 1-1). Yokohama, JP: IEEE.

MLA:

Bhandary Panambur, Adarsh, et al. "MammoBLIP: End-to-End Mammography Report Generation with Vision-Language Models and Public Multi-Institutional Datasets." Proceedings of the IEEE Symposium on Nuclear Science (NSS/MIC), Yokohama: IEEE, 2025. 1-1.
