Toward Foundation Detection Models via Example-Based Grounding in EM-DETR

Bhat S, Georgescu B, Mansoor A, Ghesu FC, Grbic S, Maier A (2025)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2025

Publisher: Springer

Conference Proceedings Title: 10th International Conference on Computer Vision & Image Processing

Event location: Punjab, India

Abstract

Deep Learning (DL)-based Computer-Aided Diagnostic (CAD) systems have shown great promise in supporting clinicians, but their performance depends heavily on the availability of large, well-annotated datasets. In medical lesion detection, the scarcity of comprehensive, standardized annotations and the heterogeneity of existing datasets limit the development of robust foundation models. In this work, we introduce a novel approach to building a multi-domain foundation model for lesion detection by integrating diverse medical image datasets annotated for both anatomical structures and abnormalities. We first propose a student-teacher training strategy to effectively combine datasets with heterogeneous label spaces, mitigating catastrophic forgetting and improving feature learning. In addition, building on our previously proposed example-based EM-DETR framework, we adapt it to jointly learn from multiple domain-specific datasets, enabling fine-grained, example-driven detection across modalities. Our model achieves state-of-the-art (SOTA) results, with image-level AUC scores exceeding the previous SOTA by 1 percentage point on four key findings in chest X-rays (CXRs). We achieve a leading mAP50 in CXRs and mammography, surpassing EM-DETR trained on a single dataset by 2-3 percentage points. This approach paves the way for scalable foundation models in medical imaging that leverage heterogeneous data while maintaining robust and generalizable performance.

How to cite

APA:

Bhat, S., Georgescu, B., Mansoor, A., Ghesu, F.-C., Grbic, S., & Maier, A. (2025). Toward Foundation Detection Models via Example-Based Grounding in EM-DETR. In 10th International Conference on Computer Vision & Image Processing. Punjab, India: Springer.

MLA:

Bhat, Sheethal, et al. "Toward Foundation Detection Models via Example-Based Grounding in EM-DETR." Proceedings of the 10th International Conference on Computer Vision & Image Processing, Punjab, India, Springer, 2025.
