A stereophonic acoustic signal extraction scheme for noisy and reverberant environments

Journal article

Publication Details

Author(s): Reindl K, Zheng Y, Schwarz A, Meier S, Maas R, Sehr A, Kellermann W
Journal: Computer Speech and Language
Publisher: Elsevier
Publication year: 2012
Volume: 27
Journal issue: 3
Page range: 726-745
ISSN: 0885-2308
Language: English


In this contribution, a novel two-channel acoustic front-end for robust automatic speech recognition in adverse acoustic environments with nonstationary interference and reverberation is proposed. From a MISO system perspective, a statistically optimum source signal extraction scheme based on the multichannel Wiener filter (MWF) is discussed for application in noisy and underdetermined scenarios. For free-field and diffuse noise conditions, this optimum scheme reduces to a Delay & Sum beamformer followed by a single-channel Wiener postfilter. Scenarios with multiple simultaneously interfering sources and background noise are usually modeled by a diffuse noise field. However, in reality, the free-field assumption is very weak because of the reverberant nature of acoustic environments. Therefore, we propose to estimate this simplified MWF solution in each frequency bin separately to cope with reverberation. We show that this approach can be realized very efficiently by combining a blocking matrix based on semi-blind source separation ('directional BSS'), which provides a continuously updated reference of all undesired noise and interference components separated from the desired source and its reflections, with a single-channel Wiener postfilter. Moreover, it is shown how the obtained reference signal of all undesired components can efficiently be used to realize the Wiener postfilter in a way that also generalizes well-known postfilter realizations. The proposed front-end and its integration into an automatic speech recognition (ASR) system are analyzed and evaluated in noisy living-room-like environments according to the PASCAL CHiME challenge. A comparison to a simplified front-end based on a free-field assumption shows that the introduced system substantially improves the speech quality and the recognition performance under the considered adverse conditions. © 2012 Elsevier Ltd. All rights reserved.
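
The abstract gives no equations, so the following is only a generic textbook sketch of the structure it names: a two-channel delay-and-sum beamformer followed by a single-channel Wiener postfilter, applied per frequency bin. It is not the authors' implementation. The function names, the `delay_s` parameter, the gain floor, and the choice to pass the noise power spectral density in as a ready-made array are all assumptions made for illustration. In the paper the noise reference is obtained from a directional-BSS blocking matrix, which is not modeled here.

```python
import numpy as np


def delay_and_sum(x1_spec, x2_spec, freqs_hz, delay_s):
    """Align channel 2 to channel 1 in each frequency bin, then average.

    Assumption: for the desired source, channel 2 lags channel 1 by
    delay_s seconds, i.e. X2(f) = X1(f) * exp(-j 2 pi f delay_s).
    """
    steering = np.exp(2j * np.pi * freqs_hz * delay_s)
    return 0.5 * (x1_spec + steering * x2_spec)


def wiener_gain(output_psd, noise_psd, floor=0.1, eps=1e-12):
    """Real-valued postfilter gain 1 - noise/output, limited to [floor, 1].

    output_psd: PSD estimate of the beamformer output per bin.
    noise_psd:  PSD estimate of the undesired components per bin
                (hypothetical input; how it is estimated is not shown).
    """
    gain = 1.0 - noise_psd / np.maximum(output_psd, eps)
    return np.clip(gain, floor, 1.0)


def extract(x1_spec, x2_spec, freqs_hz, delay_s, noise_psd):
    """Beamform, then scale each bin by the Wiener postfilter gain."""
    y = delay_and_sum(x1_spec, x2_spec, freqs_hz, delay_s)
    return wiener_gain(np.abs(y) ** 2, noise_psd) * y
```

With a noise PSD of zero the gain is 1 in every bin, so `extract` returns the plain beamformer output. The floor is a common practical safeguard against negative gains when the noise estimate exceeds the output power in a bin.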

FAU Authors / FAU Editors

Kellermann, Walter Prof. Dr.-Ing. (Professur für Nachrichtentechnik)
Maas, Roland (Lehrstuhl für Multimediakommunikation und Signalverarbeitung)
Meier, Stefan (Professur für Nachrichtentechnik)
Reindl, Klaus (Professur für Nachrichtentechnik)
Schwarz, Andreas (Professur für Nachrichtentechnik)
Sehr, Armin Dr.-Ing. (Professur für Nachrichtentechnik)
Zheng, Yuanhang (Lehrstuhl für Multimediakommunikation und Signalverarbeitung)

How to cite

Reindl, K., Zheng, Y., Schwarz, A., Meier, S., Maas, R., Sehr, A., & Kellermann, W. (2012). A stereophonic acoustic signal extraction scheme for noisy and reverberant environments. Computer Speech and Language, 27(3), 726-745. https://dx.doi.org/10.1016/j.csl.2012.07.011

Reindl, Klaus, et al. "A stereophonic acoustic signal extraction scheme for noisy and reverberant environments." Computer Speech and Language 27.3 (2012): 726-745.


Last updated on 2018-10-17 at 08:23