Artificial neural network-based feature combination for spatial voice activity detection
    Meier S, Kellermann W  (2016)
    
    
    Publication Language: English
    Publication Type: Conference contribution
    Publication year: 2016

    City/Town: San Francisco, USA
    Pages Range: 2987-2991
    Event location: San Francisco

    DOI: 10.21437/Interspeech.2016-1184
    
    Abstract
    For many applications in speech communications and speech-based human-machine interaction, reliable Voice Activity Detection (VAD) is crucial. Conventional methods for VAD typically differentiate between a target speaker and background noise by exploiting characteristic properties of speech signals. If a target speaker is to be distinguished from other speech sources, these conventional concepts are no longer applicable, and other methods, typically exploiting the spatial diversity of the individual sources, are required. Often, it is beneficial to combine several features in order to improve the overall decision. Optimum combinations of features, however, depend strongly on the scenario, especially on the position of the target source, the characteristics of noise and interference, and the Signal-to-Interference Ratio (SIR). Moreover, choosing detection thresholds which are robust to changing scenarios is often a difficult problem. In this paper, these issues are addressed by introducing Artificial Neural Networks (ANNs) for spatial voice activity detection, which allow several features to be combined with background information. The experimental results show that even small ANNs can significantly and robustly improve the detection rates, offering a valuable tool for VAD.
    
    
    
        
    How to cite
    
        APA:
        Meier, S., & Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proceedings of the Interspeech (pp. 2987-2991). San Francisco, USA.
    
    
        MLA:
        Meier, Stefan, and Walter Kellermann. "Artificial neural network-based feature combination for spatial voice activity detection." Proceedings of the Interspeech, San Francisco, USA, 2016. 2987-2991.
    