Scheuerer R, Haderlein T, Nöth E, Bocklet T (2021)
Publication Type: Conference contribution
Publication year: 2021
Publisher: IEEE
City/Town: NEW YORK
Pages Range: 1079-1086
Conference Proceedings Title: 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU)
Event location: Online
DOI: 10.1109/ASRU51503.2021.9688278
Speaker embeddings extracted from time delay neural networks (TDNNs) have contributed to major recent advances in speaker recognition and verification. We use an X-Vector system trained on augmented VoxCeleb1 and VoxCeleb2 data to obtain embeddings for pathological speech after total or partial larynx removal. We show that the embeddings generated by our model allow the patient groups to be effectively distinguished and visualized. We further compare various regression models on the task of automatically predicting different perceptual ratings by speech therapists (intelligibility, vocal effort, and overall quality) based on the extracted speaker embeddings. For both patient groups we show Pearson correlations of around +0.8; we find that Random Forest and Support Vector Regression produce scores that best resemble the experts' assessments.
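The evaluation metric named in the abstract, the Pearson correlation between predicted and expert ratings, can be illustrated with a minimal sketch. The function name `pearson_r` and the rating values below are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, NOT taken from the paper:
# expert ratings for five speakers and a regressor's predictions for them.
expert_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
model_predictions = [1.5, 1.8, 3.2, 3.9, 4.6]

r = pearson_r(expert_ratings, model_predictions)
print(f"Pearson r = {r:.3f}")
```

A value near +1 means the predicted scores rise and fall with the experts' scores; the coefficient measures linear agreement only and does not reflect a constant offset between the two scales.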
APA:
Scheuerer, R., Haderlein, T., Nöth, E., & Bocklet, T. (2021). APPLYING X-VECTORS ON PATHOLOGICAL SPEECH AFTER LARYNX REMOVAL. In 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) (pp. 1079-1086). Online: NEW YORK: IEEE.
MLA:
Scheuerer, Ralph, et al. "APPLYING X-VECTORS ON PATHOLOGICAL SPEECH AFTER LARYNX REMOVAL." Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Online NEW YORK: IEEE, 2021. 1079-1086.