Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments

Barfuß H, Hümmer C, Schwarz A, Kellermann W (2017)

Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2017

Journal

Computer Speech and Language (Elsevier)

Book Volume: 46

Page Range: 388-400

Journal Issue: 2017

URI: https://arxiv.org/abs/1604.03393

DOI: 10.1016/j.csl.2017.02.005

Abstract

Speech recognition in adverse real-world environments is highly affected by reverberation and non-stationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech enhancement system is extended by a coherence-based postfilter and the postfilter’s impact on the Word Error Rates (WERs) of a state-of-the-art automatic speech recognition system is investigated for the realistic noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use Direction-of-Arrival (DOA)-dependent and DOA-independent estimators of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech enhancement system leads to a significant reduction of the WERs, with relative improvements of up to 11.31%.
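As a rough illustration of the idea in the abstract, the sketch below maps a coherent-to-diffuse power ratio (CDR) estimate to a time- and frequency-dependent postfilter gain. It is not the estimators evaluated in the article: it assumes a simple two-microphone model in which the observed coherence is a CDR-weighted mix of the target coherence and a spherically isotropic (sinc-shaped) diffuse-noise coherence, and it uses a Wiener-like gain with a floor. All function names, the gain floor, the microphone spacing, and the time difference of arrival are hypothetical values chosen for illustration.

```python
import numpy as np

def diffuse_coherence(freqs_hz, mic_distance_m, speed_of_sound=343.0):
    """Coherence of an ideal spherically isotropic noise field (assumed model)."""
    # np.sinc(x) computes sin(pi*x)/(pi*x), hence the argument 2*f*d/c.
    return np.sinc(2.0 * freqs_hz * mic_distance_m / speed_of_sound)

def cdr_from_coherence(coh_x, coh_s, coh_n, eps=1e-10):
    """DOA-dependent CDR estimate under the assumed mixing model
    coh_x = (cdr * coh_s + coh_n) / (cdr + 1), solved for cdr."""
    num = coh_n - coh_x
    den = coh_x - coh_s
    den = np.where(np.abs(den) < eps, eps, den)  # guard against division by zero
    # Estimation noise can yield complex or negative values; keep the
    # real part and clip at zero.
    return np.maximum(np.real(num / den), 0.0)

def postfilter_gain(cdr, gain_floor=0.1):
    """Wiener-like gain with the CDR used as a stand-in for the short-time SNR."""
    return np.maximum(cdr / (cdr + 1.0), gain_floor)

# Synthetic check: build the observed coherence from a known CDR and recover it.
freqs = np.array([500.0, 1000.0, 2000.0])     # Hz (illustrative)
tdoa = 1e-4                                   # s, assumed known from the DOA
coh_s = np.exp(1j * 2.0 * np.pi * freqs * tdoa)
coh_n = diffuse_coherence(freqs, mic_distance_m=0.1)
true_cdr = 3.0
coh_x = (true_cdr * coh_s + coh_n) / (true_cdr + 1.0)

est_cdr = cdr_from_coherence(coh_x, coh_s, coh_n)
gain = postfilter_gain(est_cdr)
```

In a real system the coherence would be estimated per time-frequency bin from recursively averaged auto- and cross-power spectra, and the resulting gain would be multiplied onto the beamformer output spectrum.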

Authors with CRIS profile

Hendrik Barfuß, Professur für Signalverarbeitung
Christian Hümmer, Professur für Signalverarbeitung
Andreas Schwarz, Professur für Signalverarbeitung
Walter Kellermann, Professur für Signalverarbeitung

How to cite

APA:

Barfuß, H., Hümmer, C., Schwarz, A., & Kellermann, W. (2017). Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments. Computer Speech and Language, 46(2017), 388 - 400. https://doi.org/10.1016/j.csl.2017.02.005

MLA:

Barfuß, Hendrik, et al. "Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments." Computer Speech and Language 46.2017 (2017): 388 - 400.
