Emotion Recognition from Speech: Putting ASR in the Loop

Schuller B, Batliner A, Steidl S, Seppi D (2009)

Publication Language: English

Publication Type: Conference contribution

Publication year: 2009

Original Authors: Schuller Björn, Batliner Anton, Seppi Dino, Steidl Stefan

Page Range: 4585-4588

Conference Proceedings Title: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing ICASSP 2009

Event location: Taipei

URI: http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2009/Schuller09-ERF.pdf

Abstract

This paper investigates the automatic recognition of emotion from spoken words using vector space modeling versus string kernels, which have not yet been investigated in this respect. Apart from the spoken content itself, we integrate Part-of-Speech and higher semantic tagging in our analyses. As opposed to most works in the field, we evaluate performance with an ASR engine in the loop. Extensive experiments are run on the FAU Aibo Emotion Corpus of 4k spontaneous emotional child-robot interactions and show surprisingly low performance degradation with real ASR compared to transcription-based emotion recognition. Overall, bag-of-words modeling dominates all other modeling forms based directly on the spoken content.

Authors with CRIS profile

Anton Batliner, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)

Stefan Steidl, Lehrstuhl für Informatik 14 (Bild- und Sprachverarbeitung) (LME)

Involved external institutions

Technische Universität München (TUM)

Germany (DE)

How to cite

APA:

Schuller, B., Batliner, A., Steidl, S., & Seppi, D. (2009). Emotion Recognition from Speech: Putting ASR in the Loop. In ICASSP (Eds.), Proc. Int. Conf. on Acoustics, Speech, and Signal Processing ICASSP 2009 (pp. 4585-4588). Taipei, TW.

MLA:

Schuller, Björn, et al. "Emotion Recognition from Speech: Putting ASR in the Loop." Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing ICASSP 2009, Taipei. Ed. ICASSP, 2009. 4585-4588.
