Yang SH, Weise T, Demir K (2023)
Publication Language: English
Publication Type: Conference Contribution
Publication year: 2023
Pages Range: 12
Transfer learning has achieved state-of-the-art performance across many different areas, requiring orders of magnitude less labeled data than traditional methods. Pre-trained weights are learned in a self-supervised way on large amounts of unlabeled data and are then fine-tuned for the desired downstream task using labeled data. An example of this in the speech domain is the wav2vec2.0 framework, which was originally designed for automatic speech recognition (ASR) but can also be fine-tuned for general sequence classification tasks. This paper analyses the effect of including pathological speech during the pre-training of wav2vec2.0, where quantized speech representations are learned, on the performance of a fine-tuned pathology detection task. We show that this architecture can be successfully fine-tuned for cleft lip and palate (CLP) detection: the best-performing model, pre-trained on healthy speech only, yields an F1-score of 82.3%. Our experiments show that including pathological speech during pre-training drastically degrades performance on the detection of the same pathology for which the model was fine-tuned. The worst-performing model was pre-trained exclusively on CLP speech, resulting in an F1-score of 33.9%. While the experiments focus only on CLP, the magnitude of the results suggests that other pathologies will follow the same trend.
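The abstract describes fine-tuning a pre-trained wav2vec2.0 model for binary pathology detection as a sequence classification task. The following is a minimal sketch of that setup using the Hugging Face transformers API; it is not the authors' code, and the checkpoint name, label mapping, and dummy input are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# fine-tuning-style inference with a wav2vec2.0 sequence classification head
# for binary healthy-vs-pathological speech detection.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

model_name = "facebook/wav2vec2-base"  # stand-in for a self-supervised pre-trained checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # assumed labels: 0 = healthy, 1 = CLP speech
)

# Dummy 1-second waveform at 16 kHz; a real input would be a speech recording.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = logits.argmax(dim=-1).item()
print(predicted_label)
```

In practice the classification head would be trained on labeled healthy and CLP recordings, and detection performance would be reported with the F1-score, as in the paper.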
APA:
Yang, S.H., Weise, T., & Demir, K. (2023). Impact of Including Pathological Speech in Pre-Training on Pathology Detection. In Proceedings of the Text, Speech, and Dialogue. Satellite event of Interspeech 2023 (pp. 12). Pilsen, CZ.
MLA:
Yang, Seung Hee, Tobias Weise, and Kubilay Demir. "Impact of Including Pathological Speech in Pre-Training on Pathology Detection." Proceedings of the Text, Speech, and Dialogue. Satellite event of Interspeech 2023, Pilsen 2023. 12.