Mapping sounds onto images using binaural spectrograms

Conference contribution

Publication Details

Author(s): Deleforge A, Drouard V, Girin L, Horaud R
Publication year: 2014
Page range: 2470-2474
ISBN: 978-0-9928-6261-9
ISSN: 2076-1465
Language: English


We propose a novel method for mapping sound spectrograms onto images, thus enabling alignment between auditory and visual features for subsequent multimodal processing. We suggest a supervised learning approach to this audio-visual fusion problem, built on two steps. First, we use a Gaussian mixture of locally-linear regressions to learn a mapping from image locations to binaural spectrograms. Second, we derive a closed-form expression for the conditional posterior probability of an image location, given both an observed spectrogram, emitted from an unknown source direction, and the previously learnt mapping parameters. Notably, the proposed method can handle completely different spectrograms for training and for alignment: fixed-length wide-spectrum sounds are used for learning, so that the regression is fully and robustly estimated, while variable-length sparse-spectrum sounds, e.g., speech, are used for alignment. The proposed method successfully extracts the image location of speech utterances in realistic reverberant-room scenarios.
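
The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic mixture of K locally-linear (affine) Gaussian models, x ~ N(c_k, Gamma_k) and y | x, k ~ N(A_k x + b_k, diag(sigma_k)), with x an image location and y a single spectral feature vector. The function name `location_posterior`, every parameter name, and the hand-set toy parameters (standing in for learnt ones) are hypothetical. Sparse-spectrum input is modelled simply by keeping only the observed frequency bins, and the posterior is obtained by standard Gaussian conditioning.

```python
import numpy as np

def location_posterior(y, observed, pis, cs, Gammas, As, bs, sigmas):
    """Posterior p(x | observed entries of y) as a Gaussian mixture.

    Assumed model (hypothetical notation): component k has weight pi_k,
    x ~ N(c_k, Gamma_k), y | x ~ N(A_k x + b_k, diag(sigma_k)).
    Returns mixture weights, component means and covariances, and the
    overall posterior mean of x.
    """
    yo = y[observed]
    log_w, means, covs = [], [], []
    for pi, c, Gam, A, b, sig in zip(pis, cs, Gammas, As, bs, sigmas):
        # Keep only the rows that correspond to observed frequency bins.
        Ao, bo, so = A[observed], b[observed], sig[observed]
        Gam_inv = np.linalg.inv(Gam)
        # Gaussian conditioning: posterior covariance and mean of x.
        S = np.linalg.inv(Gam_inv + Ao.T @ (Ao / so[:, None]))
        mu = S @ (Ao.T @ ((yo - bo) / so) + Gam_inv @ c)
        # Marginal log-likelihood of the observed bins under component k.
        r = yo - (Ao @ c + bo)
        C = np.diag(so) + Ao @ Gam @ Ao.T
        _, logdet = np.linalg.slogdet(C)
        quad = r @ np.linalg.solve(C, r)
        log_w.append(np.log(pi)
                     - 0.5 * (logdet + quad + len(yo) * np.log(2 * np.pi)))
        means.append(mu)
        covs.append(S)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())   # subtract the max for stability
    w /= w.sum()
    means = np.array(means)
    return w, means, np.array(covs), w @ means

# Toy setup: 2-D image location, 6 frequency bins, 2 components.
A0 = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.], [2., 1.], [1., 2.]])
pis = [0.5, 0.5]
cs = [np.zeros(2), np.array([10., 10.])]
Gammas = [np.eye(2), np.eye(2)]
As = [A0, 0.5 * A0]
bs = [np.zeros(6), np.ones(6)]
sigmas = [np.full(6, 0.01), np.full(6, 0.01)]

x_true = np.array([0.5, -0.3])
y = As[0] @ x_true + bs[0]            # noise-free observation, component 0
observed = [0, 2, 3, 5]               # only 4 of the 6 bins carry energy

w, means, covs, x_hat = location_posterior(
    y, observed, pis, cs, Gammas, As, bs, sigmas)
print(w, x_hat)
```

In this toy setup the parameters are fixed by hand; in a supervised setting they would instead be estimated from training pairs of locations and wide-spectrum observations, and the same posterior would then be evaluated on whichever bins a sparse test sound happens to excite.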

FAU Authors / FAU Editors

Deleforge, Antoine
Professur für Nachrichtentechnik

How to cite

Deleforge, A., Drouard, V., Girin, L., & Horaud, R. (2014). Mapping sounds onto images using binaural spectrograms. In Proceedings of the 22nd European Signal Processing Conference (EUSIPCO) (pp. 2470-2474). Lisbon, PT.

Deleforge, Antoine, et al. "Mapping sounds onto images using binaural spectrograms." Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), Lisbon 2014. 2470-2474.


Last updated on 2019-04-20 at 18:38