Deep Learning-based F0 Synthesis for Speaker Anonymization

Gaznepoglu ÜE, Peters N (2023)


Publication Type: Conference contribution

Publication year: 2023

Publisher: European Signal Processing Conference, EUSIPCO

Page Range: 291-295

Conference Proceedings Title: European Signal Processing Conference

Event location: Helsinki, FIN

ISBN: 9789464593600

DOI: 10.23919/EUSIPCO58844.2023.10290038

Abstract

Voice conversion for speaker anonymization is an emerging concept for privacy protection. In a deep learning setting, this is achieved by extracting multiple features from speech, altering the speaker identity, and synthesizing the waveform. However, many existing systems do not modify fundamental frequency (F0) trajectories, which convey prosody information and can reveal speaker identity. Moreover, mismatch between F0 and other features can degrade speech quality and intelligibility. In this paper, we formally introduce a method that synthesizes F0 trajectories from other speech features and evaluate its reconstruction capabilities. Then we test our approach within a speaker anonymization framework, comparing it to a baseline and a state-of-the-art F0 modification that utilizes speaker information. The results show that our method improves both speaker anonymity, measured by the equal error rate, and utility, measured by the word error rate.

How to cite

APA:

Gaznepoglu, Ü.E., & Peters, N. (2023). Deep Learning-based F0 Synthesis for Speaker Anonymization. In European Signal Processing Conference (pp. 291-295). Helsinki, FIN: European Signal Processing Conference, EUSIPCO.

MLA:

Gaznepoglu, Ünal Ege, and Nils Peters. "Deep Learning-based F0 Synthesis for Speaker Anonymization." Proceedings of the 31st European Signal Processing Conference, EUSIPCO 2023, Helsinki, FIN: European Signal Processing Conference, EUSIPCO, 2023. 291-295.
