Klumpp P, Arias Vergara T, Vásquez-Correa JC, Pérez-Toro PA, Hönig FT, Nöth E, Orozco-Arroyave JR (2020)
Publication Language: English
Publication Type: Conference Contribution
Publication year: 2020
Publisher: International Speech Communication Association (ISCA)
Conference Proceedings Title: Interspeech 2020
To solve the task of surgical mask detection from audio recordings within the scope of Interspeech’s ComParE challenge, we introduce a phonetic recognizer that is able to differentiate between clear and mask samples.
A deep recurrent phoneme recognition model is first trained on spectrograms from a German corpus to learn the spectral properties of different speech sounds. Under the assumption that each phoneme sounds different in clear and mask speech, the model is then used to compute frame-wise phonetic labels for the challenge data, including information about the presence of a surgical mask. These labels serve to train a second phoneme recognition model, which is finally able to differentiate between mask and clear phoneme productions. For a single utterance, we can compute a functional representation and train a random forest classifier to detect whether a speech sample was produced with or without a mask.
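As an illustration of the last step described in the abstract, the following Python sketch shows how frame-wise posteriors from a phoneme recognizer could be collapsed into a per-utterance functional representation and passed to a random forest. This is not the authors' implementation: the chosen statistics, array shapes, and dummy data are assumptions made purely for illustration.

```python
# Minimal sketch, assuming frame-wise posteriors are available per utterance.
# The functionals (mean, std, quartiles) are illustrative, not the paper's exact features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def utterance_functionals(frame_posteriors: np.ndarray) -> np.ndarray:
    """Collapse a (frames x classes) posterior matrix into one fixed-length feature vector."""
    return np.concatenate([
        frame_posteriors.mean(axis=0),
        frame_posteriors.std(axis=0),
        np.percentile(frame_posteriors, 25, axis=0),
        np.percentile(frame_posteriors, 75, axis=0),
    ])

# Hypothetical training data: one posterior matrix per utterance,
# with a binary label (1 = mask, 0 = clear).
rng = np.random.default_rng(0)
posteriors = [rng.random((200, 54)) for _ in range(32)]  # dummy frame-wise outputs
labels = rng.integers(0, 2, size=32)

X = np.stack([utterance_functionals(p) for p in posteriors])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)
print(clf.predict(X[:4]))  # mask / clear decisions for the first utterances
```

The key design point is that the variable-length sequence of frame-wise outputs is reduced to a fixed-length vector before classification, so a standard random forest can operate on whole utterances.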
Our method performed better than the baseline methods on both the validation and the test set. Furthermore, we were able to show how wearing a mask influences the speech signal. Certain phoneme groups were clearly affected by the obstruction in front of the vocal tract, while others remained almost unaffected.
APA:
Klumpp, P., Arias Vergara, T., Vásquez-Correa, J.C., Pérez-Toro, P.A., Hönig, F.T., Nöth, E., & Orozco-Arroyave, J.R. (2020). Surgical mask detection with deep recurrent phonetic models. In Interspeech 2020. International Speech Communication Association (ISCA).
MLA:
Klumpp, Philipp, et al. "Surgical mask detection with deep recurrent phonetic models." Interspeech 2020, International Speech Communication Association (ISCA), 2020.