Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments

Journal article
(Original article)


Publication Details

Author(s): Barfuß H, Hümmer C, Schwarz A, Kellermann W
Journal: Computer Speech and Language
Publication year: 2017
Journal issue: 46
Pages range: 388-400
ISSN: 0885-2308
Language: English


Abstract


Speech recognition in adverse real-world environments is highly affected by reverberation and non-stationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech enhancement system is extended by a coherence-based postfilter and the postfilter’s impact on the Word Error Rates (WERs) of a state-of-the-art automatic speech recognition system is investigated for the realistic noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use Direction-of-Arrival (DOA)-dependent and DOA-independent estimators of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech enhancement system leads to a significant reduction of the WERs, with relative improvements of up to 11.31%.
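
To make the postfiltering step described in the abstract more concrete, the following is a minimal sketch of how a time- and frequency-dependent gain could be derived from a coherent-to-diffuse power ratio (CDR) estimate and applied to the beamformer output. It assumes a Wiener-like gain rule, CDR / (CDR + 1), with the CDR standing in for the short-time signal-to-noise ratio. The gain rule, the gain floor value, and the function names `cdr_postfilter_gain` and `apply_postfilter` are illustrative assumptions and are not taken from the article, which may use a different gain function. The CDR estimator itself (DOA-dependent or DOA-independent) is not shown.

```python
import numpy as np


def cdr_postfilter_gain(cdr, gain_floor=0.1):
    """Map a CDR estimate to a spectral gain per time-frequency bin.

    Assumption: a Wiener-like rule G = CDR / (CDR + 1), where the CDR is
    used as an approximation of the short-time SNR. The gain floor limits
    the maximum attenuation; 0.1 is an illustrative value only.
    """
    # Clip negative estimates to zero so the gain stays within [0, 1).
    cdr = np.maximum(np.asarray(cdr, dtype=float), 0.0)
    gain = cdr / (cdr + 1.0)
    return np.maximum(gain, gain_floor)


def apply_postfilter(beamformer_stft, cdr, gain_floor=0.1):
    """Scale the beamformer output STFT (frequency x time) bin by bin."""
    return cdr_postfilter_gain(cdr, gain_floor) * np.asarray(beamformer_stft)
```

Under these assumptions, bins dominated by diffuse components (low CDR) are attenuated down to the gain floor, while bins dominated by the coherent target signal (high CDR) pass nearly unchanged.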



FAU Authors / FAU Editors

Barfuß, Hendrik
Professur für Nachrichtentechnik
Hümmer, Christian
Professur für Nachrichtentechnik
Kellermann, Walter Prof. Dr.-Ing.
Professur für Nachrichtentechnik
Schwarz, Andreas
Professur für Nachrichtentechnik


How to cite

APA:
Barfuß, H., Hümmer, C., Schwarz, A., & Kellermann, W. (2017). Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments. Computer Speech and Language, 46, 388-400. https://dx.doi.org/10.1016/j.csl.2017.02.005

MLA:
Barfuß, Hendrik, et al. "Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments." Computer Speech and Language 46 (2017): 388-400.

Last updated on 2018-10-08 at 23:02