Impact of Including Pathological Speech in Pre-training on Pathology Detection

Weise T, Maier A, Demir K, Pérez Toro PA, Arias Vergara T, Heismann B, Nöth E, Schuster M, Yang SH (2023)

Publication Type: Conference contribution

Publication year: 2023


Publisher: Springer

Series: Lecture Notes in Computer Science

City/Town: Cham

Book Volume: 14102

Pages Range: 141-153

Conference Proceedings Title: Text, Speech, and Dialogue

Event location: Pilsen CZ

ISBN: 9783031404979

DOI: 10.1007/978-3-031-40498-6_13


Transfer learning has achieved state-of-the-art performance across many different areas, requiring orders of magnitude less labeled data than traditional methods. Pre-trained weights are learned in a self-supervised way on large amounts of unlabeled data and are then fine-tuned for the desired downstream task using labeled data. An example of this in the speech domain is the wav2vec2.0 framework, which was originally designed for automatic speech recognition (ASR) but can also be fine-tuned for general sequence classification tasks.
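As a rough illustration of how a pre-trained speech encoder can be adapted to sequence classification, the sketch below mean-pools frame-level representations over time and applies a linear layer followed by a softmax. This is an assumption about a typical classification head, not the authors' implementation. The frame features, weights, and the function name `classify_utterance` are hypothetical stand-ins for the encoder outputs and the head learned during fine-tuning.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors over the time axis."""
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Compute weights @ x + bias, with one weight row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_utterance(frames, weights, bias) -> List[float]:
    """Hypothetical head: pool encoder frames, project to classes, normalize."""
    return softmax(linear(mean_pool(frames), weights, bias))


# Toy example: 3 frames of 2-dimensional features, 2 classes (0 = healthy, 1 = CLP).
frames = [[0.2, 1.0], [0.4, 0.0], [0.0, 2.0]]
weights = [[1.0, -1.0], [-1.0, 1.0]]
bias = [0.0, 0.0]
probs = classify_utterance(frames, weights, bias)
print(probs)
```

Mean pooling is one simple way to turn a variable-length sequence of frames into a fixed-size utterance vector; attention-based or statistics pooling are common alternatives.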

This paper analyses how including pathological speech in the pre-training of wav2vec2.0, the stage in which quantized speech representations are learned, affects performance on a fine-tuned pathology detection task. We show that this architecture can be successfully fine-tuned for cleft lip and palate (CLP) detection, where the best-performing model yields an F1-score of 

 when pre-trained on healthy speech only. Our experiments show that including pathological speech during pre-training drastically degrades performance in detecting the same pathology the model is fine-tuned for. The worst-performing model was pre-trained exclusively on CLP speech, resulting in an F1-score of 

. Although the experiments focus only on CLP, the magnitude of the results suggests that other pathologies will also follow this trend.
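The F1-score used as the detection metric is the harmonic mean of precision and recall. A minimal sketch for a binary healthy-vs-CLP decision follows, assuming label 1 marks the pathological (positive) class and a plain binary F1; the paper may use a different averaging scheme. The toy labels are invented for illustration and are not data from the study.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are zero or undefined; return 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = CLP, 0 = healthy (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(f1_score(y_true, y_pred))
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and the F1-score is 2/3.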

How to cite


Weise, T., Maier, A., Demir, K., Pérez Toro, P.A., Arias Vergara, T., Heismann, B.,... Yang, S.H. (2023). Impact of Including Pathological Speech in Pre-training on Pathology Detection. In Kamil Ekštein, František Pártl, Miloslav Konopík (Eds.), Text, Speech, and Dialogue (pp. 141-153). Cham: Springer.


Weise, Tobias, et al. "Impact of Including Pathological Speech in Pre-training on Pathology Detection." Proceedings of the TSD 2023: Text, Speech, and Dialogue, Pilsen. Ed. Kamil Ekštein, František Pártl, Miloslav Konopík. Cham: Springer, 2023. 141-153.