Strauß M, Torcoli M, Edler B (2023)
Publication Type: Conference contribution
Publication year: 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages Range: 444-450
Conference Proceedings Title: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
ISBN: 9798350396904
DOI: 10.1109/SLT54892.2023.10022898
Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Building on previous work, architectural modifications are proposed, along with an investigation of different conditional input representations. Despite being a common choice in related works, Mel-spectrograms prove to be inadequate for the given scenario. As an alternative, a novel All-Pole Gammatone filterbank (APG) with high temporal resolution is proposed. Although computational evaluation metrics would suggest that state-of-the-art GAN-based methods perform best, a perceptual evaluation via a listening test indicates that the presented NF approach (based on time domain and APG) performs best, especially at lower SNRs. On average, APG outputs are rated as having good quality, which is unmatched by the other methods, including GAN.
APA:
Strauß, M., Torcoli, M., & Edler, B. (2023). Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings (pp. 444-450). Doha, QA: Institute of Electrical and Electronics Engineers Inc.
MLA:
Strauß, Martin, Matteo Torcoli, and Bernd Edler. "Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation." Proceedings of the 2022 IEEE Spoken Language Technology Workshop, SLT 2022, Doha: Institute of Electrical and Electronics Engineers Inc., 2023. 444-450.