On the Impact of Children's Emotional Speech on Acoustic and Language Models

Steidl S, Batliner A, Seppi D, Schuller B (2010)

Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2010

Journal

EURASIP Journal on Audio, Speech, and Music Processing

Original Authors: Steidl Stefan, Batliner Anton, Seppi Dino, Schuller Björn

Publisher: Hindawi Publishing Corporation / Springer Verlag (Germany) / SpringerOpen

Volume: 2010

Article Number: 783954

URI: http://downloads.hindawi.com/journals/asmp/2010/783954.pdf

DOI: 10.1155/2010/783954

Abstract

The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for continuous speech recognition with a 1k-word vocabulary on spontaneous motherese, emphatic, and angry children's speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, "emotional" speech recognisers are compared to a speech recogniser trained on neutral speech only. For this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a "neutral" speech recogniser trained on large amounts of neutral speech is adapted by adding only small amounts of emotionally coloured data to the training process. The results show that emphatic and angry speech is recognised best, even better than neutral speech, and that performance can be improved further by adapting the acoustic and linguistic models. To show the variability of emotional speech, we visualise the distribution of the four emotion-related states in the MFCC space by applying a Sammon transformation.

Authors with CRIS profile

Stefan Steidl, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)

Anton Batliner, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)

Involved external institutions

Katholieke Universiteit Leuven (KUL) / Catholic University of Leuven, Belgium (BE)

Technische Universität München (TUM) / Technical University of Munich, Germany (DE)

How to cite

APA:

Steidl, S., Batliner, A., Seppi, D., & Schuller, B. (2010). On the impact of children's emotional speech on acoustic and language models. EURASIP Journal on Audio, Speech, and Music Processing, 2010, Article 783954. https://doi.org/10.1155/2010/783954

MLA:

Steidl, Stefan, et al. "On the Impact of Children's Emotional Speech on Acoustic and Language Models." EURASIP Journal on Audio, Speech, and Music Processing 2010 (2010): 783954.
