Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition

Wagner D, Baumann I, Engert N, Lee S, Nöth E, Riedhammer K, Bocklet T (2025)


Publication Type: Conference contribution

Publication year: 2025

Publisher: International Speech Communication Association

Pages Range: 3294-3298

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Rotterdam, Netherlands

DOI: 10.21437/Interspeech.2025-2155

Abstract

In this work, we present our submission to the Speech Accessibility Project challenge for dysarthric speech recognition. We integrate parameter-efficient fine-tuning with latent audio representations to improve an encoder-decoder ASR system. Synthetic training data is generated by fine-tuning Parler-TTS to mimic dysarthric speech, using LLM-generated prompts for corpus-consistent target transcripts. Personalization with x-vectors consistently reduces word error rates (WERs) over non-personalized fine-tuning. AdaLoRA adapters outperform full fine-tuning and standard low-rank adaptation, achieving relative WER reductions of ∼23% and ∼22%, respectively. Further improvements (∼5% WER reduction) come from incorporating wav2vec 2.0-based audio representations. Training with synthetic dysarthric speech yields up to ∼7% relative WER improvement over personalized fine-tuning alone.
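The abstract reports its gains as relative WER reductions. The sketch below shows how these quantities are usually computed. It assumes the standard definitions: WER is the word-level edit distance divided by the number of reference words, and relative reduction is (baseline WER - new WER) / baseline WER. The function names are hypothetical. This is not the authors' evaluation code, and the challenge's official scoring may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.2 means 20%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("fox" -> "box") and one insertion ("over"): 2 errors / 5 words
    wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps over")
    print(f"WER: {wer:.2f}")  # WER: 0.40
    # Hypothetical WER values for illustration only, not results from the paper
    print(f"relative reduction: {relative_wer_reduction(0.30, 0.24):.1%}")
```

The denominator is always the reference length, so WER can exceed 1.0 when the hypothesis contains many insertions. Relative reductions also depend on the baseline. For example, a ∼23% relative reduction corresponds to different absolute WER drops for different speakers.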

How to cite

APA:

Wagner, D., Baumann, I., Engert, N., Lee, S., Nöth, E., Riedhammer, K., & Bocklet, T. (2025). Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3294-3298). Rotterdam, Netherlands: International Speech Communication Association.

MLA:

Wagner, Dominik, et al. "Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition." Proceedings of the 26th Interspeech Conference 2025, Rotterdam, Netherlands, International Speech Communication Association, 2025, pp. 3294-3298.