Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

Wagner D, Baumann I, Braun F, Bayerl SP, Nöth E, Riedhammer K, Bocklet T (2023)


Publication Type: Conference contribution

Publication year: 2023

Publisher: International Speech Communication Association

Book Volume: 2023-August

Pages Range: 2318-2322

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Dublin, IRL

DOI: 10.21437/Interspeech.2023-464

Abstract

The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying them to unseen speech corpora. Our approach achieves unweighted average F1-scores between 74.1% and 97.0%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data of healthy speakers sampled from a variety of sources.
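The results above are reported as unweighted average F1-scores. A minimal sketch of that metric follows, assuming it denotes the usual macro average: an F1-score is computed one-vs-rest for each of the four pathology classes, and the per-class scores are averaged with equal weight. The class abbreviations (`PD`, `LC`, `CLP`, `OSCC`), the toy labels, and the function names are hypothetical illustrations, not the authors' evaluation code; for real experiments, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value.

```python
from typing import Sequence

# Hypothetical short labels for the four pathologies named in the abstract.
CLASSES = ["PD", "LC", "CLP", "OSCC"]


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """One-vs-rest F1-score for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Without true positives, precision or recall is 0 (or undefined); F1 is taken as 0.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def unweighted_average_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> float:
    """Macro F1: mean of the per-class F1-scores, every class weighted equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = [f1_for_class(y_true, y_pred, label) for label in labels]
    return sum(scores) / len(scores)


# Toy example with invented labels (not data from the paper).
y_true = ["PD", "PD", "LC", "CLP", "OSCC", "OSCC"]
y_pred = ["PD", "LC", "LC", "CLP", "OSCC", "PD"]
score = unweighted_average_f1(y_true, y_pred, CLASSES)
# Per-class F1: PD 0.5, LC 0.667, CLP 1.0, OSCC 0.667
print(f"unweighted average F1: {score:.3f}")  # prints 0.708
```

Because every class carries the same weight, a class with few recordings influences the score as much as a large one, which is the usual motivation for macro averaging when the pathology corpora differ in size.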

How to cite

APA:

Wagner, D., Baumann, I., Braun, F., Bayerl, S. P., Nöth, E., Riedhammer, K., & Bocklet, T. (2023). Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data? In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 2318-2322). Dublin, IRL: International Speech Communication Association.

MLA:

Wagner, Dominik, et al. "Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?" Proceedings of the 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, IRL, International Speech Communication Association, 2023, pp. 2318-2322.
