Salin E, Ayache S, Favre B (2023)
Publication Type: Conference contribution
Publication year: 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages Range: 339-352
Conference Proceedings Title: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
ISBN: 9798350307443
DOI: 10.1109/ICCVW60793.2023.00041
Vision-language foundation models have seen a considerable increase in performance in the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would benefit their use in real-world applications. In particular, we think that these models should be evaluated on a broad range of precise capabilities, in order to raise awareness of the breadth of their scope and of their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.
APA:
Salin, E., Ayache, S., & Favre, B. (2023). Towards an Exhaustive Evaluation of Vision-Language Foundation Models. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 (pp. 339-352). Paris, FR: Institute of Electrical and Electronics Engineers Inc.
MLA:
Salin, Emmanuelle, Stéphane Ayache, and Benoit Favre. "Towards an Exhaustive Evaluation of Vision-Language Foundation Models." Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, Paris: Institute of Electrical and Electronics Engineers Inc., 2023. 339-352.