Artificial neural network-based feature combination for spatial voice activity detection
Meier S, Kellermann W (2016)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2016
City/Town: San Francisco, USA
Pages Range: 2987-2991
Event location: San Francisco
DOI: 10.21437/Interspeech.2016-1184
Abstract
For many applications in speech communications and speech-based human-machine interaction, a reliable Voice Activity Detection (VAD) is crucial. Conventional methods for VAD typically differentiate between a target speaker and background noise by exploiting characteristic properties of speech signals. If a target speaker should be distinguished from other speech sources, these conventional concepts are no longer applicable, and other methods, typically exploiting the spatial diversity of the individual sources, are required. Often, it is beneficial to combine several features in order to improve the overall decision. Optimum combinations of features, however, depend strongly on the scenario, especially on the position of the target source, the characteristics of noise and interference and the Signal-to-Interference Ratio (SIR). Moreover, choosing detection thresholds which are robust to changing scenarios is often a difficult problem. In this paper, these issues are addressed by introducing Artificial Neural Networks (ANNs) for spatial voice activity detection, which allow to combine several features with background information. The experimental results show that already small ANNs can significantly and robustly improve the detection rates, offering a valuable tool for VAD.
How to cite
APA:
Meier, S., & Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proceedings of the Interspeech (pp. 2987-2991). San Francisco, USA.
MLA:
Meier, Stefan, and Walter Kellermann. "Artificial neural network-based feature combination for spatial voice activity detection." Proceedings of the Interspeech, San Francisco, USA, 2016. 2987-2991.