Schwarz A (2019)
Publication Language: English
Publication Type: Thesis
Publication year: 2019
One of the challenges for far-field speech communication and recognition applications is that the acquired speech signal is impacted by reverberation and noise. It is therefore often necessary to apply signal processing techniques for dereverberation and noise reduction. Particularly effective are techniques which exploit spatial information about the sound field from multichannel microphone signals. One approach to modeling the spatial characteristics of reverberation and noise is the use of spatial coherence functions. These depend only on acoustic properties which are relatively similar between different rooms, and require a minimum of assumptions about the acoustic scenario. This motivates the focus of this thesis on signal enhancement approaches that exploit spatial coherence models. As a foundation, the applicability of different spatial coherence models to reverberation, and their dependency on acoustic properties of the room, are investigated. Existing methods for signal enhancement are reviewed, with a focus on spectral enhancement methods which use a short-time coherence estimate to estimate the power ratio between desired coherent and undesired diffuse sound field components. Known spectral enhancement methods are expressed in this framework, and novel estimators are proposed which have both theoretical and practical advantages over existing methods. Based on these estimators, an effective dereverberation system is proposed which can operate without knowledge of the position of the desired source, solely by exploiting the characteristic spatial coherence of reverberation. Furthermore, a more experimental dereverberation system is proposed which additionally accounts for the effect of early signal reflections in the room, showing that this approach can provide promising directions for future research.
Finally, the problem of how to effectively use spatial information in an automatic speech recognizer based on a deep neural network acoustic model is investigated. A novel way of exploiting spatial information for reverberation-robust automatic speech recognition is proposed, where a spatial feature vector is extracted from short-time coherence estimates and then supplied as input to the neural network. It is shown that this approach can exceed the improvements that are obtained by the application of signal enhancement methods for dereverberation.
Schwarz, A. (2019). Dereverberation and Robust Speech Recognition Using Spatial Coherence Models (Dissertation).
Schwarz, Andreas. Dereverberation and Robust Speech Recognition Using Spatial Coherence Models. Dissertation, 2019.