Efficient training of acoustic models for reverberation-robust medium-vocabulary automatic speech recognition

Sehr A, Barfuß H, Hofmann C, Maas R, Kellermann W (2014)

Publication Language: English

Publication Status: Published

Publication Type: Conference contribution

Publication year: 2014

Publisher: IEEE Computer Society

Page Range: 177-181

Article Number: 6843275

Event location: Nancy

ISBN: 978-1-4799-3109-5

DOI: 10.1109/HSCMA.2014.6843275

Abstract

A recently proposed concept for training reverberation-robust acoustic models for automatic speech recognition using pairs of clean and reverberant data is extended from word models to tied-state triphone models in this paper. The key idea of the concept, termed ICEWIND, is to use the clean data for the temporal alignment and the reverberant data for the estimation of the emission densities. Experiments with the 5000-word Wall Street Journal corpus confirm the benefits of ICEWIND with tied-state triphones: While the training time is reduced by more than 90%, the word accuracy is improved at the same time, both for room-specific and multi-style hidden Markov models. Since the acoustic models trained with ICEWIND need fewer Gaussian components for the emission densities to achieve recognition rates comparable to those of Baum-Welch acoustic models, ICEWIND also allows for a reduced decoding complexity. © 2014 IEEE.
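
The key idea stated in the abstract (alignment from the clean data, emission statistics from the reverberant data) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a frame-level state alignment has already been obtained from the clean recordings (how it is obtained is not shown), that clean and reverberant feature sequences are frame-synchronous, and that each state is modelled by a single diagonal-covariance Gaussian rather than the Gaussian mixtures of tied-state triphone models. The function name `estimate_emissions`, the `var_floor` parameter, and the toy data are hypothetical.

```python
from collections import defaultdict


def estimate_emissions(alignment, reverberant_feats, var_floor=1e-3):
    """Estimate one diagonal Gaussian per state from reverberant frames.

    alignment         -- state index per frame, assumed to come from the clean data
    reverberant_feats -- feature vector per frame of the parallel reverberant data
    var_floor         -- hypothetical lower bound on each variance
    Returns a dict mapping state -> (mean vector, variance vector).
    """
    if len(alignment) != len(reverberant_feats):
        raise ValueError("alignment and features must cover the same frames")

    # Group the reverberant frames by the state the clean alignment assigns.
    frames_by_state = defaultdict(list)
    for state, frame in zip(alignment, reverberant_feats):
        frames_by_state[state].append(frame)

    params = {}
    for state, frames in frames_by_state.items():
        n = len(frames)
        dim = len(frames[0])
        # Maximum-likelihood mean and (floored) variance per dimension.
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [
            max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]
        params[state] = (mean, var)
    return params


# Toy data: six frames, two states, two-dimensional features.
clean_alignment = [0, 0, 0, 1, 1, 1]  # hypothetical alignment from clean data
reverberant = [
    [1.0, 2.0], [1.0, 2.0], [4.0, 5.0],
    [10.0, 0.0], [12.0, 0.0], [14.0, 0.0],
]
params = estimate_emissions(clean_alignment, reverberant)
```

Because the alignment is fixed, each state's statistics are computed in a single pass over the reverberant frames, with no iterative re-estimation inside this sketch.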

Authors with CRIS profile

Hendrik Barfuß, Professur für Signalverarbeitung

Christian Hofmann, Professur für Signalverarbeitung

Roland Maas, Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS)

Walter Kellermann, Professur für Signalverarbeitung

Involved external institutions

Hochschule für Technik und Wirtschaft Berlin (HTW)

Germany (DE)

How to cite

APA:

Sehr, A., Barfuß, H., Hofmann, C., Maas, R., & Kellermann, W. (2014). Efficient training of acoustic models for reverberation-robust medium-vocabulary automatic speech recognition. In Proceedings of the Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA) (pp. 177-181). Nancy, FR: IEEE Computer Society.

MLA:

Sehr, Armin, et al. "Efficient training of acoustic models for reverberation-robust medium-vocabulary automatic speech recognition." Proceedings of the Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Nancy, IEEE Computer Society, 2014. 177-181.
