MM-DETR: Emulating the Diagnostic Clinical Workflow in Multi-view Multi-modal Mammography Mass Detection

Elbarbary K, Bhandary Panambur A, Bhat S, Bayer S, Maier A (2025)

Publication Language: English

Publication Type: Conference contribution

Publication year: 2025

Publisher: Springer, Cham

Series: Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care

Book Volume: 16142

Pages Range: 258–267

Conference Proceedings Title: Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care

Event location: Daejeon, South Korea

ISBN: 978-3-032-05558-3

URI: https://link.springer.com/chapter/10.1007/978-3-032-05559-0_26

DOI: 10.1007/978-3-032-05559-0_26

Open Access Link: https://link.springer.com/chapter/10.1007/978-3-032-05559-0_26

Abstract

Accurate mass detection in dense breast tissue remains a critical challenge in mammographic screening, as overlapping structures often obscure subtle lesions. We observe that a multi-modal vision-language detection framework with strong pretraining over natural images demonstrates remarkable state-of-the-art (SOTA) performance in lesion detection tasks via transfer learning. However, current SOTA does not take advantage of the entire context available to radiologists. Typical mammogram examinations contain two views of each breast for a given patient, and certain lesions are visible in one view while hidden by dense breast tissue in the other. We propose a novel Multi-view Multi-modal DETR object detection framework that trains a network to detect lesions after considering both views, thus emulating the workflow of a radiologist. Specifically, our method incorporates a bidirectional cross-attention fusion module to integrate relevant information from craniocaudal (CC) and mediolateral oblique (MLO) views simultaneously, reinforcing lesion-specific signals and aiding detection of masses that may be obscured by dense tissue. We evaluate our proposal on the public VinDr-Mammo dataset, achieving significant improvements in mass detection with reduced false negatives. Our method reaches a mass detection mAP of 0.654, outperforming Mammo-CLIP (0.580) by a relative margin of 12.8% (0.074 absolute). It also reduces the false negative rate in DENSITY C cases by 5.9% compared to the single-view baseline, highlighting the clinical value of our method. The code is available at https://github.com/MMDETR/MM-DETR.
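
The bidirectional cross-attention fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cross_attend`, `bidirectional_fuse`), the use of identity projections instead of learned query/key/value weights, and the residual addition are all assumptions made for illustration. Each view's tokens attend over the other view's tokens, so CC features are enriched with MLO context and vice versa.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query token attends over all context tokens.

    Assumption: identity projections (no learned weights), single head.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        attended.append(mixed)
    return attended


def bidirectional_fuse(cc_tokens, mlo_tokens):
    """Hypothetical fusion step: attend in both directions, then add a residual."""
    cc_from_mlo = cross_attend(cc_tokens, mlo_tokens)  # CC queries, MLO context
    mlo_from_cc = cross_attend(mlo_tokens, cc_tokens)  # MLO queries, CC context
    cc_fused = [[x + y for x, y in zip(t, a)] for t, a in zip(cc_tokens, cc_from_mlo)]
    mlo_fused = [[x + y for x, y in zip(t, a)] for t, a in zip(mlo_tokens, mlo_from_cc)]
    return cc_fused, mlo_fused


# Toy feature tokens (made-up values): 3 CC tokens and 2 MLO tokens of dimension 2.
cc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mlo = [[1.0, 0.0], [0.0, 1.0]]
cc_fused, mlo_fused = bidirectional_fuse(cc, mlo)
print(len(cc_fused), len(mlo_fused))
```

Each view keeps its own token count after fusion, so a downstream detection head could still predict boxes per view. A real model would typically use learned projections and multiple heads.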

Authors with CRIS profile

Adarsh Bhandary Panambur, Lehrstuhl für Informatik 5 (Mustererkennung)
Sheethal Bhat, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)
Siming Bayer, Lehrstuhl für Informatik 5 (Mustererkennung)
Andreas Maier, Lehrstuhl für Informatik 5 (Mustererkennung)

How to cite

APA:

Elbarbary, K., Bhandary Panambur, A., Bhat, S., Bayer, S., & Maier, A. (2025). MM-DETR: Emulating the Diagnostic Clinical Workflow in Multi-view Multi-modal Mammography Mass Detection. In Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care (pp. 258–267). Daejeon, South Korea: Springer, Cham.

MLA:

Elbarbary, Karim, et al. "MM-DETR: Emulating the Diagnostic Clinical Workflow in Multi-view Multi-modal Mammography Mass Detection." Proceedings of the MICCAI Deep Breath Workshop, Daejeon, South Korea. Springer, Cham, 2025. 258–267.
