Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech

Moussa D, Bergmann S, Rieß C (2024)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2024

Pages Range: 2260-2264

Conference Proceedings Title: Proc. Interspeech 2024

Event location: Kos, GR

URI: https://www.isca-archive.org/interspeech_2024/moussa24_interspeech.html#

DOI: 10.21437/Interspeech.2024-1652

Open Access Link: https://www.isca-archive.org/interspeech_2024/moussa24_interspeech.html#

Abstract

Compression traces are an important forensic cue for uncovering the processing history and integrity of audio evidence. With continuous advances in the AI domain, efficient generative lossy neural codecs such as Lyra-V2, EnCodec or Improved RVQGAN can compete with traditional speech and audio codecs. Their learning-based approach, which differs fundamentally from analytical lossy compression methods, poses a new challenge for audio forensics. This calls for a closer examination of such techniques to prepare forensics for audio evidence processed by AI-based codecs. In this work, we therefore take a first step towards robustly detecting traces of neural codecs in audio samples. We report that distinctive frequency artefacts make it possible to identify neurally compressed audio and to fingerprint specific AI-based codecs. We further analyse robustness under cross-dataset testing and under post-processing with noise, downsampling, and traditional compression.

How to cite

APA:

Moussa, D., Bergmann, S., & Rieß, C. (2024). Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech. In ISCA (Eds.), Proc. Interspeech 2024 (pp. 2260-2264). Kos, GR.

MLA:

Moussa, Denise, Sandra Bergmann, and Christian Rieß. "Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech." Proc. Interspeech 2024, Kos, GR. Ed. ISCA, 2024. 2260-2264.
