A novel approach for matched reverberant training of HMMs using data pairs

Conference contribution


Publication Details

Author(s): Sehr A, Hofmann C, Maas R, Kellermann W
Publication year: 2010
Page range: 566-569


Abstract


For robust distant-talking speech recognition, a novel HMM training approach using data pairs is proposed. The data pairs of clean and reverberant feature vectors, also called stereo data, are used for deriving the HMM parameters of a matched-condition reverberant HMM from a well-trained clean-speech HMM in two steps. In the first step, the alignment of the frames to the states is determined from the clean data and the clean-speech HMM. This state-frame alignment (SFA) is then used in the second step to estimate the Gaussian mixture densities for each state of the reverberant HMM by applying the Expectation Maximization (EM) algorithm to the reverberant data. Thus, a more accurate temporal alignment is achieved than with standard matched-condition training, and the discrimination capability of the HMMs is increased. Connected digit recognition experiments show that the proposed approach decreases the word error rate (WER) by up to 44% while substantially reducing the training complexity. These improvements will make reverberant training attractive for a wider range of applications. © 2010 ISCA.
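
The two steps described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation. It assumes scalar (1-D) features, a left-to-right clean-speech HMM with one Gaussian per state, and Viterbi decoding as the way to obtain the state-frame alignment; the abstract does not specify any of these details. All function names (`viterbi_align`, `em_gmm`, `train_reverberant_states`) and the toy data are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_align(frames, means, variances, log_trans):
    """Step 1 (assumed Viterbi): most likely clean-HMM state for each clean frame."""
    n_states = len(means)
    # Assumption: every path starts in state 0 (left-to-right model).
    delta = [log_gauss(frames[0], means[0], variances[0])] + [-math.inf] * (n_states - 1)
    backptr = []
    for x in frames[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_gauss(x, means[j], variances[j]))
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def em_gmm(data, n_comp, n_iter=20, var_floor=1e-3):
    """Step 2: EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    data = sorted(data)
    n = len(data)
    # Deterministic initialisation: means spread over the sorted data.
    means = [data[(2 * k + 1) * n // (2 * n_comp)] for k in range(n_comp)]
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [var0] * n_comp
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        resp = []
        for x in data:
            logp = [math.log(weights[k]) + log_gauss(x, means[k], variances[k])
                    for k in range(n_comp)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate weights, means, and (floored) variances.
        for k in range(n_comp):
            nk = max(sum(r[k] for r in resp), 1e-10)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                var_floor)
    return weights, means, variances

def train_reverberant_states(clean, reverb, means, variances, log_trans, n_comp=1):
    """Align on the clean frames, then fit each state's GMM on the paired reverberant frames."""
    alignment = viterbi_align(clean, means, variances, log_trans)
    gmms = {}
    for state in range(len(means)):
        frames = [y for y, s in zip(reverb, alignment) if s == state]
        if len(frames) >= n_comp:  # skip states with too few aligned frames
            gmms[state] = em_gmm(frames, n_comp)
    return alignment, gmms

# Toy stereo data (made up): clean frames and their reverberant counterparts.
clean_frames = [0.1, -0.2, 0.0, 0.3, 5.1, 4.8, 5.2, 4.9]
reverb_frames = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
clean_means, clean_vars = [0.0, 5.0], [1.0, 1.0]
log_a = [[math.log(0.9), math.log(0.1)], [-math.inf, 0.0]]
sfa, reverb_gmms = train_reverberant_states(
    clean_frames, reverb_frames, clean_means, clean_vars, log_a)
```

In this sketch the reverberant frames never influence the alignment; they are only pooled per state according to the alignment found on the clean frames, which is the point the abstract makes about obtaining a more accurate temporal alignment than training on reverberant data alone.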



FAU Authors / FAU Editors

Hofmann, Christian
Professur für Nachrichtentechnik
Kellermann, Walter Prof. Dr.-Ing.
Professur für Nachrichtentechnik
Maas, Roland
Lehrstuhl für Multimediakommunikation und Signalverarbeitung
Sehr, Armin Dr.-Ing.
Professur für Nachrichtentechnik


How to cite

APA:
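Sehr, A., Hofmann, C., Maas, R., & Kellermann, W. (2010). A novel approach for matched reverberant training of HMMs using data pairs. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 566-569). Makuhari, Chiba, JP.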

MLA:
Sehr, Armin, et al. "A novel approach for matched reverberant training of HMMs using data pairs." Proceedings of the 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba 2010. 566-569.


Last updated on 2018-08-08 at 15:08