Tackling Data Scarcity in Automatic Pathological Speech Analysis

Tayebi Arasteh S (2024)


Publication Language: English

Publication Type: Thesis

Publication year: 2024

URI: https://open.fau.de/handle/openfau/33327

DOI: 10.25593/open-fau-1401

Abstract

The emergence of speech as a crucial element in digital technology has raised significant privacy concerns because of its biometric nature, a concern that is particularly evident in its use as a biomarker in healthcare. Speech is used extensively in applications such as detecting Parkinson’s and Alzheimer’s diseases and in speech therapy because it is cost-effective and non-invasive. However, the effectiveness of deep learning (DL) algorithms depends on access to substantial amounts of speech data for training. Despite the daily influx of patients with speech or voice disorders at various institutions, using these data for research is challenging because of strict privacy regulations. As a result, the amount of data available for public research remains severely restricted. Since speech data can reveal a wealth of personal information, there is a pressing need for privacy-preserving technologies in voice data usage. This thesis aims to increase the availability of pathological speech datasets for healthcare research while ensuring data privacy and security. It involves determining whether pathological speech is more vulnerable to speaker verification than healthy speech, which guides the development of anonymization techniques that protect personal information without altering diagnostic utility. Additionally, the thesis explores privacy-preserving collaborative learning, particularly federated learning (FL), which enables collective analysis without sharing raw data, and assesses its applicability to pathological speech.
In the first part of the research, a DL-driven automatic speaker verification (ASV) approach was employed to analyze pathological speech data from various age groups and disorders. The findings indicated that pathological speech generally faces higher privacy breach risks than healthy speech. Specifically, adults with Dysphonia are at greater risk of re-verification, while conditions such as Dysarthria showed risks similar to those of healthy speakers. Speech intelligibility did not affect the ASV system’s performance. For pediatric cases, particularly those with Cleft Lip and Palate (CLP), the recording environment significantly affected re-verification risks. Combining data across different pathology types improved ASV performance, suggesting that pathological diversity is beneficial.
The second part of the thesis examines the impact of speaker anonymization on pathological speech. To address privacy concerns, this part focuses on anonymizing pathological speech from over 2,700 speakers across multiple German institutions. Both DL-based and signal-processing-based anonymization methods were explored. They showed significant privacy improvements across various disorders with minimal impact on diagnostic utility. Dysarthria, Dysphonia, and CLP showed minimal changes in diagnostic utility, while Dysglossia showed a slight improvement. These findings show that the impact of anonymization varies across disorders, which calls for disorder-specific strategies to balance privacy with diagnostic utility. Additionally, the fairness analysis revealed consistent anonymization effects across most demographic groups. This indicates that these methods enhance privacy while maintaining the utility of pathological speech for diagnostic purposes.
The final part of the research explores the use of FL for detecting Parkinson's disease from speech signals. Stringent patient data privacy regulations hinder data sharing between institutions, and FL addresses this by enabling collaborative model training without sharing raw data. This study used speech data from three real-world corpora in German, Spanish, and Czech, each from a separate institution. The FL model outperformed all local models in diagnostic accuracy and performed comparably to a model trained on the centrally combined datasets, without requiring any data sharing.
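The abstract describes FL only as collaborative model training without sharing raw data. The Python below is a minimal sketch of that idea using federated averaging (FedAvg), which is one common FL aggregation scheme. The abstract does not state which aggregation rule, model, or speech features the thesis used. The logistic-regression model, the synthetic two-feature data, and the site names (`site_de`, `site_es`, `site_cz`) are therefore hypothetical placeholders, not the thesis's setup or data.

```python
import math
import random

# Weights are [bias, w1, ..., wd] for a logistic-regression classifier.
# The aggregation step (weighted_average) only ever sees model weights and
# sample counts, never the raw (features, label) pairs held by each site.


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], x: list[float]) -> float:
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def local_update(w, data, lr=0.5, epochs=5):
    """Full-batch gradient descent on one site's private data; only weights are returned."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def weighted_average(updates):
    """FedAvg aggregation: average site weights, weighted by each site's sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def fed_avg(global_w, sites, rounds=20):
    for _ in range(rounds):
        # Each site trains locally, starting from the current global model ...
        updates = [(local_update(global_w, data), len(data)) for data in sites.values()]
        # ... and the server aggregates only the returned weights.
        global_w = weighted_average(updates)
    return global_w


def make_site(n, seed):
    """Hypothetical synthetic stand-in for one institution's features (NOT speech data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def accuracy(w, data):
    return sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


if __name__ == "__main__":
    sites = {"site_de": make_site(200, 1), "site_es": make_site(150, 2), "site_cz": make_site(100, 3)}
    w = fed_avg([0.0, 0.0, 0.0], sites)
    pooled = [pair for data in sites.values() for pair in data]
    print(f"FL model accuracy on pooled (synthetic) data: {accuracy(w, pooled):.2f}")
```

Weighting each site by its sample count is the usual FedAvg choice. On identically distributed synthetic sites like these, FedAvg behaves much like centralized gradient descent. Real multi-institution, multilingual corpora are typically not identically distributed, and in that setting federated and centrally pooled training may diverge.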
In conclusion, this thesis helps address data scarcity in pathological speech processing by developing and evaluating methods that enhance privacy and diagnostic accuracy. The study demonstrates that anonymization and FL can enhance privacy while maintaining diagnostic utility, and it highlights the importance of disorder-specific approaches. This research paves the way for more effective and privacy-conscious healthcare applications of speech, with the ultimate aim of improving patient outcomes.

How to cite

APA:

Tayebi Arasteh, S. (2024). Tackling Data Scarcity in Automatic Pathological Speech Analysis (Dissertation).

MLA:

Tayebi Arasteh, Soroosh. Tackling Data Scarcity in Automatic Pathological Speech Analysis. Dissertation, 2024.