Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition

Wagner D, Baumann I, Engert N, Lee S, Nöth E, Riedhammer K, Bocklet T (2025)

Publication Type: Conference contribution

Publication year: 2025

Publisher: International Speech Communication Association

Pages Range: 3294-3298

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Rotterdam, NLD

DOI: 10.21437/Interspeech.2025-2155

Abstract

In this work, we present our submission to the Speech Accessibility Project challenge for dysarthric speech recognition. We integrate parameter-efficient fine-tuning with latent audio representations to improve an encoder-decoder ASR system. Synthetic training data is generated by fine-tuning Parler-TTS to mimic dysarthric speech, using LLM-generated prompts for corpus-consistent target transcripts. Personalization with x-vectors consistently reduces word error rates (WERs) relative to non-personalized fine-tuning. AdaLoRA adapters outperform full fine-tuning and standard low-rank adaptation, achieving relative WER reductions of ∼23% and ∼22%, respectively. Further improvements (∼5% WER reduction) come from incorporating wav2vec 2.0-based audio representations. Training with synthetic dysarthric speech yields up to ∼7% relative WER improvement over personalized fine-tuning alone.
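The abstract reports several results as relative WER reductions, i.e. the drop in WER divided by the baseline WER, not an absolute percentage-point difference. The following minimal Python sketch illustrates that arithmetic, assuming the standard definition of WER as the word-level edit distance (substitutions + deletions + insertions) summed over a corpus and divided by the total number of reference words. The function names (word_errors, wer, relative_reduction) and the example numbers are hypothetical illustrations, not code or values from the paper.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one deleted word out of six reference words.
print(wer(["the cat sat on the mat"], ["the cat sat on mat"]))  # about 0.1667
# Hypothetical example: a baseline WER of 20.0% falling to 15.4%.
print(relative_reduction(0.20, 0.154))  # about 0.23, i.e. a ~23% relative reduction
```

Under this definition, a ∼23% relative reduction from a 20.0% baseline corresponds to an absolute drop of about 4.6 percentage points; the same relative figure implies a different absolute drop for a different baseline.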

Authors with CRIS profile

Elmar Nöth, Technische Fakultät

Involved external institutions

Technische Hochschule Nürnberg "Georg Simon Ohm", Germany (DE)

Korea Advanced Institute of Science and Technology (KAIST), Korea, Republic of (KR)

How to cite

APA:

Wagner, D., Baumann, I., Engert, N., Lee, S., Nöth, E., Riedhammer, K., & Bocklet, T. (2025). Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3294-3298). Rotterdam, NLD: International Speech Communication Association.

MLA:

Wagner, Dominik, et al. "Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition." Proceedings of the 26th Interspeech Conference 2025, Rotterdam, NLD, International Speech Communication Association, 2025, pp. 3294-3298.
