Automatic Classification of Parkinson’s Disease Using Wav2vec Embeddings at Phoneme, Syllable, and Word Levels

Gallo-Aristizábal JD, Escobar-Grisales D, Ríos-Urrego CD, Nöth E, Orozco-Arroyave JR (2024)


Publication Type: Conference contribution

Publication year: 2024

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 15049 LNAI

Page Range: 313-323

Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Event location: Brno, CZE

ISBN: 9783031705656

DOI: 10.1007/978-3-031-70566-3_27

Abstract

Parkinson’s disease (PD) is a neurological condition that produces several speech deficits, typically known as hypokinetic dysarthria. PD involves motor impairments and muscle dysfunction in the phonatory apparatus, producing anomalies in oral communication. Speech signals have been used as a biomarker for diagnosis and monitoring of PD. In this work, we discriminate between PD patients and healthy controls based on patterns extracted from speech signals collected from Colombian Spanish speakers, considering three different granularity levels: phoneme, syllable, and word. The Wav2vec 2.0 model is used to obtain frame-level representations of each utterance. These representations are grouped according to each granularity level using different statistical functionals. Each granularity level was evaluated independently, obtaining accuracies of 86%, 80%, and 83% for phonemes, syllables, and words, respectively. In addition, we identified the phonological classes with the best discrimination capability: the nasal, approximant, and plosive classes yielded the three highest accuracies. We believe that this work constitutes a step forward in the development of automatic systems that support speech and language therapy of PD patients. For future work, we plan to model co-articulation information in words and syllables.
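
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical functionals, so mean and standard deviation are assumed here, and the function name `pool_segment`, the toy 2-dimensional frames, and the segment boundaries are all hypothetical. The sketch only relies on the fact that Wav2vec 2.0 produces roughly one frame-level vector every 20 ms for 16 kHz audio.

```python
from statistics import fmean, pstdev

FRAME_STRIDE_MS = 20  # Wav2vec 2.0 emits about one frame every 20 ms at 16 kHz


def pool_segment(frames, start_ms, end_ms, stride_ms=FRAME_STRIDE_MS):
    """Summarize the frames of one unit (phoneme, syllable, or word).

    frames: list of frame-level vectors (one list of floats per frame).
    start_ms, end_ms: unit boundaries in milliseconds (hypothetical input,
    e.g. from a forced aligner).
    Returns per-dimension means followed by per-dimension standard deviations.
    """
    first = start_ms // stride_ms
    last = max(first + 1, end_ms // stride_ms)  # keep at least one frame
    segment = frames[first:last]
    if not segment:
        raise ValueError("segment lies outside the utterance")
    columns = list(zip(*segment))  # transpose: one tuple per dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std; assumed choice
    return means + stds


if __name__ == "__main__":
    # Placeholder frames standing in for real Wav2vec 2.0 outputs.
    toy_frames = [[float(i), float(2 * i)] for i in range(10)]
    # Hypothetical (label, start_ms, end_ms) alignments for one utterance.
    units = [("p", 0, 40), ("a", 40, 100), ("pa", 0, 100)]
    for label, start, end in units:
        print(label, pool_segment(toy_frames, start, end))
```

The same function serves all three granularity levels, since only the boundaries change; the resulting fixed-length vectors could then be passed to any classifier.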

How to cite

APA:

Gallo-Aristizábal, J.D., Escobar-Grisales, D., Ríos-Urrego, C.D., Nöth, E., & Orozco-Arroyave, J.R. (2024). Automatic Classification of Parkinson’s Disease Using Wav2vec Embeddings at Phoneme, Syllable, and Word Levels. In Elmar Nöth, Aleš Horák, Petr Sojka (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 313-323). Brno, CZE: Springer Science and Business Media Deutschland GmbH.

MLA:

Gallo-Aristizábal, Jeferson David, et al. "Automatic Classification of Parkinson’s Disease Using Wav2vec Embeddings at Phoneme, Syllable, and Word Levels." Proceedings of the 27th International Conference on Text, Speech, and Dialogue, TSD 2024, Brno, CZE. Ed. Elmar Nöth, Aleš Horák, Petr Sojka, Springer Science and Business Media Deutschland GmbH, 2024. 313-323.