Artificial neural network-based feature combination for spatial voice activity detection

Meier S, Kellermann W (2016)

Publication Language: English

Publication Type: Conference contribution

Publication year: 2016

City/Town: San Francisco, USA

Pages Range: 2987-2991

Event location: San Francisco

DOI: 10.21437/Interspeech.2016-1184

Abstract

For many applications in speech communications and speech-based human-machine interaction, reliable Voice Activity Detection (VAD) is crucial. Conventional methods for VAD typically differentiate between a target speaker and background noise by exploiting characteristic properties of speech signals. If a target speaker must be distinguished from other speech sources, these conventional concepts are no longer applicable, and other methods, typically exploiting the spatial diversity of the individual sources, are required. Often, it is beneficial to combine several features in order to improve the overall decision. Optimum combinations of features, however, depend strongly on the scenario, especially on the position of the target source, the characteristics of noise and interference, and the Signal-to-Interference Ratio (SIR). Moreover, choosing detection thresholds which are robust to changing scenarios is often a difficult problem. In this paper, these issues are addressed by introducing Artificial Neural Networks (ANNs) for spatial voice activity detection, which make it possible to combine several features with background information. The experimental results show that even small ANNs can significantly and robustly improve the detection rates, offering a valuable tool for VAD.

Authors with CRIS profile

Stefan Meier, Professur für Signalverarbeitung
Walter Kellermann, Professur für Signalverarbeitung

How to cite

APA:

Meier, S., & Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proceedings of the Interspeech (pp. 2987-2991). San Francisco, USA.

MLA:

Meier, Stefan, and Walter Kellermann. "Artificial neural network-based feature combination for spatial voice activity detection." Proceedings of the Interspeech, San Francisco, USA, 2016. 2987-2991.
