Moussa D, Bergmann S, Rieß C (2024)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2024
Pages Range: 2260-2264
Conference Proceedings Title: Proc. Interspeech 2024
URI: https://www.isca-archive.org/interspeech_2024/moussa24_interspeech.html#
DOI: 10.21437/Interspeech.2024-1652
Open Access Link: https://www.isca-archive.org/interspeech_2024/moussa24_interspeech.html#
Compression traces are an important forensic cue for uncovering the processing history and integrity of audio evidence. With continuous advances in the AI domain, efficient generative lossy neural codecs like Lyra-V2, EnCodec, or Improved RVQGAN can compete with traditional speech and audio codecs. Their learning-based approach differs fundamentally from analytical lossy compression methods and poses a new challenge for audio forensics. This calls for a closer examination of such techniques to prepare forensics for audio evidence processed by AI-based codecs. In this work, we thus take a first step towards robustly detecting traces of neural codecs in audio samples. We report that distinctive frequency artefacts make it possible to identify neurally compressed audio and to fingerprint specific AI-based codecs. We further analyse robustness in cross-dataset testing and under post-processing with noise, downsampling, and traditional compression.
APA:
Moussa, D., Bergmann, S., & Rieß, C. (2024). Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech. In ISCA (Eds.), Proc. Interspeech 2024 (pp. 2260-2264). Kos, GR.
MLA:
Moussa, Denise, Sandra Bergmann, and Christian Rieß. "Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech." Proc. Interspeech 2024, Kos, edited by ISCA, 2024, pp. 2260-2264.