A stereophonic acoustic signal extraction scheme for noisy and reverberant environments

Reindl K, Zheng Y, Schwarz A, Meier S, Maas R, Sehr A, Kellermann W (2012)

Publication Language: English

Publication Type: Journal article

Publication year: 2012

Journal

Computer Speech and Language

Publisher: Elsevier

Volume: 27

Page Range: 726-745

Journal Issue: 3

DOI: 10.1016/j.csl.2012.07.011

Abstract

In this contribution, a novel two-channel acoustic front-end for robust automatic speech recognition in adverse acoustic environments with nonstationary interference and reverberation is proposed. From a MISO system perspective, a statistically optimum source signal extraction scheme based on the multichannel Wiener filter (MWF) is discussed for application in noisy and underdetermined scenarios. For free-field and diffuse noise conditions, this optimum scheme reduces to a Delay & Sum beamformer followed by a single-channel Wiener postfilter. Scenarios with multiple simultaneously interfering sources and background noise are usually modeled by a diffuse noise field. However, in reality, the free-field assumption is very weak because of the reverberant nature of acoustic environments. Therefore, we propose to estimate this simplified MWF solution in each frequency bin separately to cope with reverberation. We show that this approach can very efficiently be realized by the combination of a blocking matrix based on semi-blind source separation ('directional BSS'), which provides a continuously updated reference of all undesired noise and interference components separated from the desired source and its reflections, and a single-channel Wiener postfilter. Moreover, it is shown how the obtained reference signal of all undesired components can efficiently be used to realize the Wiener postfilter in a way that also generalizes well-known postfilter realizations. The proposed front-end and its integration into an automatic speech recognition (ASR) system are analyzed and evaluated in noisy living-room-like environments according to the PASCAL CHiME challenge. A comparison to a simplified front-end based on a free-field assumption shows that the introduced system substantially improves the speech quality and the recognition performance under the considered adverse conditions. © 2012 Elsevier Ltd. All rights reserved.
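The free-field structure named in the abstract, a delay-and-sum beamformer followed by a single-channel Wiener postfilter, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the two channel spectra are already available per frequency bin, that the inter-channel delay `delay_s` is known, and that a noise power estimate `psd_noise` is supplied from elsewhere (in the paper this reference comes from the directional BSS blocking matrix, which is not reproduced here). All function and parameter names are hypothetical, and the instantaneous power of the beamformer output stands in for a smoothed PSD estimate.

```python
import numpy as np


def delay_and_sum(x_left, x_right, delay_s, freqs_hz):
    """Time-align the right channel to the left one and average both.

    Assumes the right channel lags the left by delay_s seconds, i.e.
    X_right(f) = S(f) * exp(-j*2*pi*f*delay_s) under a free-field model.
    """
    steering = np.exp(2j * np.pi * freqs_hz * delay_s)
    return 0.5 * (x_left + steering * x_right)


def wiener_gain(psd_output, psd_noise, gain_floor=0.1):
    """Per-bin Wiener gain 1 - noise/output, clipped to [gain_floor, 1]."""
    ratio = psd_noise / np.maximum(psd_output, 1e-12)  # avoid division by zero
    return np.clip(1.0 - ratio, gain_floor, 1.0)


def extract(x_left, x_right, psd_noise, delay_s, freqs_hz, gain_floor=0.1):
    """Delay-and-sum beamformer followed by a Wiener postfilter, per bin."""
    y = delay_and_sum(x_left, x_right, delay_s, freqs_hz)
    # Assumption: instantaneous power replaces a recursively smoothed PSD.
    gain = wiener_gain(np.abs(y) ** 2, psd_noise, gain_floor)
    return gain * y
```

The gain floor is a common practical choice to limit musical-noise artifacts; its value here is illustrative only. Estimating the gain separately in each frequency bin mirrors the per-bin estimation the abstract proposes, but the reverberation-robust noise reference itself is outside the scope of this sketch.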

Authors with CRIS profile

Klaus Reindl, Professur für Signalverarbeitung
Yuanhang Zheng, Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS)
Andreas Schwarz, Professur für Signalverarbeitung
Stefan Meier, Professur für Signalverarbeitung
Roland Maas, Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS)
Armin Sehr, Professur für Signalverarbeitung
Walter Kellermann, Professur für Signalverarbeitung

How to cite

APA:

Reindl, K., Zheng, Y., Schwarz, A., Meier, S., Maas, R., Sehr, A., & Kellermann, W. (2012). A stereophonic acoustic signal extraction scheme for noisy and reverberant environments. Computer Speech and Language, 27(3), 726-745. https://doi.org/10.1016/j.csl.2012.07.011

MLA:

Reindl, Klaus, et al. "A stereophonic acoustic signal extraction scheme for noisy and reverberant environments." Computer Speech and Language 27.3 (2012): 726-745.
