Artificial neural network-based feature combination for spatial voice activity detection

Conference contribution

Publication Details

Author(s): Meier S, Kellermann W
Publishing place: San Francisco, USA
Publication year: 2016
Page range: 2987-2991
Language: English


For many applications in speech communications and speech-based human-machine interaction, reliable Voice Activity Detection (VAD) is crucial. Conventional methods for VAD typically differentiate between a target speaker and background noise by exploiting characteristic properties of speech signals. If a target speaker must be distinguished from other speech sources, these conventional concepts are no longer applicable, and other methods, typically exploiting the spatial diversity of the individual sources, are required. Often, it is beneficial to combine several features in order to improve the overall decision. Optimum combinations of features, however, depend strongly on the scenario, especially on the position of the target source, the characteristics of noise and interference, and the Signal-to-Interference Ratio (SIR). Moreover, choosing detection thresholds that are robust to changing scenarios is often a difficult problem. In this paper, these issues are addressed by introducing Artificial Neural Networks (ANNs) for spatial voice activity detection, which allow several features to be combined with background information. The experimental results show that even small ANNs can significantly and robustly improve the detection rates, offering a valuable tool for VAD.

FAU Authors / FAU Editors

Kellermann, Walter Prof. Dr.-Ing.
Professur für Nachrichtentechnik
Meier, Stefan
Professur für Nachrichtentechnik

How to cite

Meier, S., & Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proceedings of the Interspeech (pp. 2987-2991). San Francisco, USA.

Meier, Stefan, and Walter Kellermann. "Artificial neural network-based feature combination for spatial voice activity detection." Proceedings of the Interspeech, San Francisco, USA, 2016. 2987-2991.


Last updated on 2018-10-19 at 15:20