APPLYING X-VECTORS ON PATHOLOGICAL SPEECH AFTER LARYNX REMOVAL

Scheuerer R, Haderlein T, Nöth E, Bocklet T (2021)


Publication Type: Conference contribution

Publication year: 2021

Publisher: IEEE

City/Town: NEW YORK

Pages Range: 1079-1086

Conference Proceedings Title: 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU)

Event location: Online

DOI: 10.1109/ASRU51503.2021.9688278

Abstract

Speaker embeddings extracted from time delay neural networks (TDNNs) contributed to major recent advancements in speaker recognition and verification. We use an X-Vector system trained on augmented VoxCeleb1 and VoxCeleb2 data to obtain embeddings for pathological speech after total or partial larynx removal. We show that our model is able to effectively distinguish and visualize patient groups when generating embeddings. We further compare various regression models on the task of automatically predicting different perceptual ratings by speech therapists (intelligibility, vocal effort, and overall quality) based on the extracted speaker embeddings. For both patient groups we show Pearson correlations in the range of +0.8; we find that Random Forest and Support Vector Regression produce scores that best resemble the experts' assessments.
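
The agreement measure named in the abstract, the Pearson correlation between predicted scores and expert ratings, can be illustrated with a minimal sketch. The function name `pearson` and all numbers below are hypothetical values chosen for illustration; they are not data or code from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expert ratings and regression outputs (illustrative only).
expert_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
model_predictions = [1.5, 1.8, 3.2, 3.9, 4.6]

r = pearson(expert_ratings, model_predictions)
print(f"Pearson r = {r:.3f}")
```

A value near +1 means the predicted scores rise and fall with the expert ratings; a value near 0 means no linear relationship.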


How to cite

APA:

Scheuerer, R., Haderlein, T., Nöth, E., & Bocklet, T. (2021). APPLYING X-VECTORS ON PATHOLOGICAL SPEECH AFTER LARYNX REMOVAL. In 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) (pp. 1079-1086). Online: NEW YORK: IEEE.

MLA:

Scheuerer, Ralph, et al. "APPLYING X-VECTORS ON PATHOLOGICAL SPEECH AFTER LARYNX REMOVAL." Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Online NEW YORK: IEEE, 2021. 1079-1086.