Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages

Li CJ, Yeo E, Choi K, Perez Toro PA, Someki M, Das RK, Yue Z, Orozco Arroyave JR, Nöth E, Mortensen DR (2025)


Publication Type: Conference contribution

Publication year: 2025

Publisher: International Speech Communication Association

Page range: 2128-2132

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Rotterdam, NL

DOI: 10.21437/Interspeech.2025-512

Abstract

Automatic speech recognition (ASR) for dysarthric speech remains challenging due to data scarcity, particularly in non-English languages. To address this, we fine-tune a voice conversion (VC) model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions, then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech. The generated data is then used to fine-tune a multilingual ASR model, Massively Multilingual Speech (MMS), for improved dysarthric speech recognition. Evaluation on PC-GITA (Spanish), EasyCall (Italian), and SSNCE (Tamil) demonstrates that VC with both speaker and prosody conversion significantly outperforms both the off-the-shelf MMS model and conventional augmentation techniques such as speed and tempo perturbation. Objective and subjective analyses further confirm that the generated speech simulates dysarthric characteristics.

How to cite

APA:

Li, C.J., Yeo, E., Choi, K., Perez Toro, P.A., Someki, M., Das, R.K.,... Mortensen, D.R. (2025). Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 2128-2132). Rotterdam, NL: International Speech Communication Association.

MLA:

Li, Chin Jou, et al. "Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages." Proceedings of the 26th Interspeech Conference 2025, Rotterdam: International Speech Communication Association, 2025. 2128-2132.