Impact of Including Pathological Speech in Pre-training on Pathology Detection

Weise T, Maier A, Demir K, Perez Toro PA, Arias Vergara T, Heismann B, Nöth E, Schuster M, Yang SH (2023)

Publication Type: Conference contribution

Publication year: 2023

Journal

Lecture Notes in Computer Science Springer Verlag

Publisher: Springer

Series: Lecture Notes in Computer Science

City/Town: Cham

Book Volume: 14102

Pages Range: 141-153

Conference Proceedings Title: Text, Speech, and Dialogue

Event location: Pilsen

ISBN: 9783031404979

DOI: 10.1007/978-3-031-40498-6_13

Abstract

Transfer learning has achieved state-of-the-art performance across many different areas, requiring orders of magnitude less labeled data than traditional methods. Pre-trained weights are learned in a self-supervised way on large amounts of unlabeled data and are then fine-tuned for the desired downstream task using labeled data. An example of this in the speech domain is the wav2vec2.0 framework, which was originally designed for automatic speech recognition (ASR) but can also be fine-tuned for general sequence classification tasks.

This paper analyses the effects of including pathological speech during the pre-training of wav2vec2.0, where quantized speech representations are learned, on the performance of a fine-tuned pathology detection task. We show that this architecture can be successfully fine-tuned for cleft lip and palate (CLP) detection, where the best-performing model yields an F1-score of […] when pre-trained on healthy speech only. Our experiments show that including pathological speech during pre-training drastically degrades the performance on detection of the same pathology for which it was fine-tuned. The worst-performing model was pre-trained exclusively on CLP speech, resulting in an F1-score of […]. Whilst the experiments performed focus only on CLP, the magnitude of the results suggests that other pathologies will also follow this trend.

Authors with CRIS profile

Tobias Weise: Professur für Artificial Intelligence in Biomedical Speech Processing (Stiftungsprofessur)

Andreas Maier: Lehrstuhl für Informatik 5 (Mustererkennung)

Kubilay Demir: Lehrstuhl für Informatik 5 (Mustererkennung)

Paula Andrea Perez Toro: Professur für Informatik (Mustererkennung)

Tomás Arias Vergara: Lehrstuhl für Informatik 5 (Mustererkennung)

Björn Heismann: Technische Fakultät

Elmar Nöth: Professur für Informatik (Mustererkennung)

Seung Hee Yang: Professur für Artificial Intelligence in Biomedical Speech Processing (Stiftungsprofessur)

Involved external institutions

Ludwig-Maximilians-Universität (LMU)

Germany (DE)

How to cite

APA:

Weise, T., Maier, A., Demir, K., Perez Toro, P.A., Arias Vergara, T., Heismann, B., ... Yang, S.H. (2023). Impact of Including Pathological Speech in Pre-training on Pathology Detection. In Kamil Ekštein, František Pártl, Miloslav Konopík (Eds.), Text, Speech, and Dialogue (pp. 141-153). Pilsen, CZ: Cham: Springer.

MLA:

Weise, Tobias, et al. "Impact of Including Pathological Speech in Pre-training on Pathology Detection." Proceedings of the TSD 2023: Text, Speech, and Dialogue, Pilsen. Ed. Kamil Ekštein, František Pártl, Miloslav Konopík. Cham: Springer, 2023. 141-153.
