STFT bin selection for localization algorithms based on the sparsity of speech signal spectra

Brendel A, Huang C, Kellermann W (2018)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2018

Page Range: 2561-2568

Event location: Heraklion, Crete, GR

Abstract

Many algorithms for localization, tracking, or Direction of Arrival (DOA) estimation of speech sources rely on the so-called W-disjoint orthogonality, i.e., only one speaker is assumed to be active in any given time-frequency bin. Based on this assumption, bin-wise DOA estimates can be computed from the pairwise phase differences in each time-frequency bin and clustered afterwards. Averaging the estimates of each cluster, i.e., computing the cluster centroids, increases the robustness of the localization estimate. However, clustering can be computationally demanding due to the large number of DOA estimates, and at the same time highly sensitive to errors, as many of the estimates may be unreliable due to noise and reverberation. Therefore, an efficient algorithm for selecting reliable Short-Time Fourier Transform (STFT) bins is desirable, one that increases the accuracy of the estimate while simultaneously reducing the computational complexity. In this contribution, we investigate different STFT bin selection methods suitable for speech source localization algorithms that are based on W-disjoint orthogonality. The methods exploit bin-wise speech signal power, Coherent-to-Diffuse Power Ratio (CDR), and Speech Presence Probability (SPP). The effectiveness of the selection processes is studied for different localization algorithms.
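
The pipeline described in the abstract (select bins, estimate a DOA per bin from a phase difference, average the estimates) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: a single microphone pair, a far-field source in free field, a single active speaker (so the cluster centroid reduces to a plain mean), and power-based selection by a simple quantile threshold. All names (`select_bins_by_power`, `binwise_doa`, `doa_centroid`, `keep_fraction`, `mic_distance`) are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def select_bins_by_power(X1, X2, keep_fraction=0.1):
    """Return a boolean mask marking the most powerful STFT bins.

    X1, X2: complex STFTs of the two microphones, shape (n_freqs, n_frames).
    keep_fraction: hypothetical tuning parameter, fraction of bins to keep.
    """
    power = 0.5 * (np.abs(X1) ** 2 + np.abs(X2) ** 2)
    threshold = np.quantile(power, 1.0 - keep_fraction)
    return power >= threshold


def binwise_doa(X1, X2, freqs, mic_distance):
    """Estimate one DOA (degrees) per bin from the inter-microphone phase.

    Assumes microphone 2 receives the signal later than microphone 1 by
    mic_distance * sin(theta) / c. Bins at 0 Hz and bins whose implied
    |sin(theta)| exceeds 1 are returned as NaN. Phase wrapping (spatial
    aliasing) above c / (2 * mic_distance) is not handled here.
    """
    freqs = np.asarray(freqs, dtype=float)
    phase_diff = np.angle(X1 * np.conj(X2))  # equals 2*pi*f*delay if unwrapped
    valid = freqs > 0
    delay = np.full(phase_diff.shape, np.nan)
    delay[valid] = phase_diff[valid] / (2.0 * np.pi * freqs[valid][:, None])
    sin_theta = SPEED_OF_SOUND * delay / mic_distance
    sin_theta = np.where(np.abs(sin_theta) <= 1.0, sin_theta, np.nan)
    return np.degrees(np.arcsin(sin_theta))


def doa_centroid(doa, mask):
    """Average the selected bin-wise DOAs, ignoring NaN entries."""
    return float(np.nanmean(doa[mask]))
```

With several simultaneous speakers, the final averaging step would be replaced by a clustering of the selected estimates, and the power criterion could be swapped for a CDR- or SPP-based one; neither is shown here.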

How to cite

APA:

Brendel, A., Huang, C., & Kellermann, W. (2018). STFT bin selection for localization algorithms based on the sparsity of speech signal spectra. In Proceedings of the EURONOISE 2018 (pp. 2561-2568). Heraklion, Crete, GR.

MLA:

Brendel, Andreas, Chengyu Huang, and Walter Kellermann. "STFT bin selection for localization algorithms based on the sparsity of speech signal spectra." Proceedings of the EURONOISE 2018, Heraklion, Crete 2018. 2561-2568.
