Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario

Wöllmer M, Schuller B, Batliner A, Steidl S, Seppi D (2011)


Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2011

Journal: ACM Transactions on Speech and Language Processing

Original Authors: Wöllmer Martin, Schuller Björn, Batliner Anton, Steidl Stefan, Seppi Dino

Publisher: Association for Computing Machinery, Inc.

Book Volume: 7

Article Number: 12

Journal Issue: 4

URI: http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2011/Woellmer11-TDO.pdf

DOI: 10.1145/1998384.1998386

Abstract

In this article, we focus on keyword detection in children's speech as it is needed in voice command systems. We use the FAU Aibo Emotion Corpus, which contains emotionally colored spontaneous children's speech recorded in a child-robot interaction scenario, and investigate various recent keyword spotting techniques. As the principle of bidirectional Long Short-Term Memory (BLSTM) is known to be well-suited for context-sensitive phoneme prediction, we incorporate a BLSTM network into a Tandem model for flexible coarticulation modeling in children's speech. Our experiments reveal that the Tandem model prevails over a triphone-based Hidden Markov Model approach.

How to cite

APA:

Wöllmer, M., Schuller, B., Batliner, A., Steidl, S., & Seppi, D. (2011). Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario. ACM Transactions on Speech and Language Processing, 7(4). https://doi.org/10.1145/1998384.1998386

MLA:

Wöllmer, Martin, et al. "Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario." ACM Transactions on Speech and Language Processing 7.4 (2011).
