Barfuß H, Hümmer C, Schwarz A, Kellermann W (2017)
Publication Language: English
Publication Type: Journal article, Original article
Publication year: 2017
Book Volume: 46
Pages Range: 388 - 400
Journal Issue: 2017
URI: https://arxiv.org/abs/1604.03393
DOI: 10.1016/j.csl.2017.02.005
Speech recognition in adverse real-world environments is highly affected by reverberation and non-stationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech enhancement system is extended by a coherence-based postfilter and the postfilter’s impact on the Word Error Rates (WERs) of a state-of-the-art automatic speech recognition system is investigated for the realistic noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use Direction-of-Arrival (DOA)-dependent and DOA-independent estimators of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech enhancement system leads to a significant reduction of the WERs, with relative improvements of up to 11.31%.
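The abstract's core idea, using a coherent-to-diffuse power ratio (CDR) estimate as a stand-in for the short-time signal-to-noise ratio when computing postfilter gains, can be illustrated with a minimal sketch. This is not the article's method: the record does not give the gain rule or the CDR estimators, so the sketch assumes a generic Wiener-type gain, CDR / (CDR + 1), with a lower gain floor. The function names, the `gain_floor` parameter, and its default value are hypothetical.

```python
# Illustrative sketch only: a Wiener-type gain driven by a CDR estimate.
# The gain rule and gain_floor default are assumptions, not taken from the article.

def wiener_gain_from_cdr(cdr, gain_floor=0.1):
    """Map a linear-scale CDR estimate to a postfilter gain.

    The CDR is treated as an approximation of the short-time SNR,
    so the gain follows the Wiener form SNR / (SNR + 1).
    """
    cdr = max(cdr, 0.0)               # negative estimates are clipped to zero
    gain = cdr / (cdr + 1.0)          # Wiener-type gain, always below 1
    return max(gain, gain_floor)      # the floor limits over-suppression


def apply_postfilter(beamformer_stft, cdr_estimates, gain_floor=0.1):
    """Scale each time-frequency bin of the beamformer output by its gain.

    beamformer_stft: list of frames, each a list of complex STFT bins.
    cdr_estimates:   same shape, one linear CDR estimate per bin.
    """
    return [
        [wiener_gain_from_cdr(cdr, gain_floor) * value
         for value, cdr in zip(frame, cdr_frame)]
        for frame, cdr_frame in zip(beamformer_stft, cdr_estimates)
    ]
```

A bin dominated by the coherent (desired) component has a large CDR and passes almost unchanged, while a bin dominated by diffuse interference has a CDR near zero and is attenuated down to the floor.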
APA:
Barfuß, H., Hümmer, C., Schwarz, A., & Kellermann, W. (2017). Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments. Computer Speech and Language, 46(2017), 388-400. https://doi.org/10.1016/j.csl.2017.02.005
MLA:
Barfuß, Hendrik, et al. "Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments." Computer Speech and Language 46.2017 (2017): 388-400.