Strauß M, Pia N, K. S. Rao N, Edler B (2023)
Publication Type: Conference contribution
Publication year: 2023
Publisher: IEEE
City/Town: New Paltz, NY, USA
Conference Proceedings Title: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
ISBN: 979-8-3503-2373-3
DOI: 10.1109/WASPAA58266.2023.10248144
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain high-quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.
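As a rough illustration of what a hybrid maximum likelihood and adversarial objective can look like, the sketch below uses a single scalar affine flow layer. It is not the SEFGAN architecture or its loss. The function names, the least-squares adversarial term, and the weight `adv_weight` are all assumptions made for this example. In a conditional flow the `shift` and `log_scale` values would be predicted from the noisy speech by a network; here they are plain numbers, and the discriminator score is a hypothetical input.

```python
import math


def affine_flow_forward(x, shift, log_scale):
    """Map a clean sample x to a latent z with one affine layer."""
    z = (x - shift) * math.exp(-log_scale)
    log_det = -log_scale  # log |dz/dx| for this scalar layer
    return z, log_det


def affine_flow_inverse(z, shift, log_scale):
    """Invert the layer: generate a sample x from a latent z."""
    return z * math.exp(log_scale) + shift


def negative_log_likelihood(x, shift, log_scale):
    """-log p(x) under a standard-normal prior, via change of variables."""
    z, log_det = affine_flow_forward(x, shift, log_scale)
    log_prior = -0.5 * (z * z + math.log(2.0 * math.pi))
    return -(log_prior + log_det)


def lsgan_generator_loss(disc_score):
    """Least-squares generator loss (an assumed choice of adversarial term)."""
    return (disc_score - 1.0) ** 2


def hybrid_loss(nll, adv, adv_weight=1.0):
    """Weighted sum of the likelihood term and the adversarial term."""
    return nll + adv_weight * adv


if __name__ == "__main__":
    # Hypothetical values: one clean sample and conditioning-derived parameters.
    x, shift, log_scale = 0.7, 0.2, 0.3
    nll = negative_log_likelihood(x, shift, log_scale)
    # The same layer run in reverse acts as the generator.
    generated = affine_flow_inverse(0.0, shift, log_scale)
    adv = lsgan_generator_loss(disc_score=0.5)  # hypothetical discriminator output
    print(generated, hybrid_loss(nll, adv, adv_weight=1.0))
```

Because the flow is invertible, the same parameters serve both purposes: the forward pass gives an exact log-likelihood, and the inverse pass produces the samples that a discriminator would judge.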
APA:
Strauß, M., Pia, N., K. S. Rao, N., & Edler, B. (2023). SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement. In 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA: IEEE.
MLA:
Strauß, Martin, et al. "SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement." Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA: IEEE, 2023.