Briegleb A, Kellermann W (2024)
Publication Language: English
Publication Type: Journal article, Original article
Publication year: 2024
Book Volume: 61
URI: https://link.springer.com/article/10.1186/s13636-024-00381-3
DOI: 10.1186/s13636-024-00381-3
Mask-based multichannel speech enhancement methods based on artificial neural networks estimate a mask that is applied to the multichannel input signal or a reference channel to obtain the estimated desired signal. For the estimation, both spectral and spatial cues from the multichannel input can be used. However, the interplay of the two inside the neural network is typically unknown. In this contribution, we propose a framework to analyze neural spatiospectral filters (NSSFs) with respect to their capabilities to extract and represent spatial information. We explicitly take the characteristics of the training target signal into account and analyze its effect on the functionality of the NSSF. Using two conceptually different NSSFs as examples, we show that not all NSSFs use spatial information under all circumstances and that the training target signal has a significant influence on the spatial filtering behavior of an NSSF. These insights help to assess the signal processing capabilities of neural networks and allow informed decisions to be made when configuring, training, and deploying NSSFs.
APA:
Briegleb, A., & Kellermann, W. (2024). Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics. EURASIP Journal on Audio, Speech, and Music Processing, 61. https://doi.org/10.1186/s13636-024-00381-3
MLA:
Briegleb, Annika, and Walter Kellermann. "Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics." EURASIP Journal on Audio, Speech, and Music Processing 61 (2024).