Hümmer C, Delcroix M, Ogawa A, Kinoshita K, Nakatani T, Kellermann W (2017)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2017
Pages Range: 4875-4879
ISBN: 978-1-5090-4117-6
DOI: 10.1109/ICASSP.2017.7953083
We propose a new concept for adapting CNN-based acoustic models using spatial diffuseness features as auxiliary information about the acoustic environment: the spatial diffuseness features serve simultaneously as acoustic-model input features and as the basis for estimating environmental cues for context adaptation, where one convolutional layer is factorized into several sub-layers representing different acoustic conditions. This context-adaptive CNN-based acoustic model enables online environmental adaptation and is experimentally verified on the real-world recordings of the CHiME-3 task. The best-performing setup reduces the average word error rate of the baseline system (without spatial diffuseness features) from 19.4% to 15.9% and from 12.2% to 10.7% for two experimental setups with and without front-end signal enhancement, respectively.
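As an illustrative aside, the sketch below shows one plausible way to realize the factorized convolutional layer described in the abstract: K parallel sub-layers, each standing for one acoustic condition, whose outputs are combined with context weights predicted from auxiliary (diffuseness-derived) features. This is a minimal sketch in PyTorch under assumed dimensions and a hypothetical context network; it is not the authors' implementation and the paper does not prescribe a framework.

import torch
import torch.nn as nn


class ContextAdaptiveConv(nn.Module):
    """Illustrative sketch: one conv layer factorized into K sub-layers,
    each representing a different acoustic condition. Outputs are mixed
    with context weights estimated from auxiliary features."""

    def __init__(self, in_ch, out_ch, kernel_size, num_contexts, aux_dim):
        super().__init__()
        # One convolution per acoustic-condition sub-layer.
        self.sub_layers = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
            for _ in range(num_contexts)
        )
        # Hypothetical small network mapping auxiliary (diffuseness)
        # features to context weights; softmax gives a convex combination.
        self.context_net = nn.Sequential(
            nn.Linear(aux_dim, num_contexts), nn.Softmax(dim=-1)
        )

    def forward(self, x, aux):
        # x:   (batch, in_ch, freq, time) acoustic feature maps
        # aux: (batch, aux_dim) auxiliary context features per utterance
        weights = self.context_net(aux)                       # (batch, K)
        outputs = torch.stack(
            [layer(x) for layer in self.sub_layers], dim=1
        )                                                     # (batch, K, out_ch, F, T)
        # Weighted sum over the K condition-specific sub-layers.
        return (weights[:, :, None, None, None] * outputs).sum(dim=1)


# Example usage with hypothetical dimensions:
layer = ContextAdaptiveConv(in_ch=3, out_ch=16, kernel_size=3,
                            num_contexts=4, aux_dim=10)
feats = torch.randn(2, 3, 40, 100)   # e.g., multi-channel log-mel plus diffuseness maps
aux = torch.randn(2, 10)             # e.g., summarized diffuseness statistics
out = layer(feats, aux)              # shape (2, 16, 40, 100)

Because the context weights are computed from the auxiliary features of the current input, such a layer can adapt to the acoustic environment online, without retraining the sub-layer parameters.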
APA:
Hümmer, C., Delcroix, M., Ogawa, A., Kinoshita, K., Nakatani, T., & Kellermann, W. (2017). Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4875-4879). New Orleans, LA, USA.
MLA:
Hümmer, Christian, et al. "Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features." Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans 2017. 4875-4879.