Conditional emission densities for combining speech enhancement and recognition systems

Conference contribution


Publication Details

Author(s): Sehr A, Yoshioka T, Delcroix M, Kinoshita K, Nakatani T, Maas R, Kellermann W
Publication year: 2013
Page range: 3502-3506
ISBN: 978-1-6299-3443-3
Language: English


Abstract


A novel framework based on conditional emission densities for hidden Markov models (HMMs) is proposed in this contribution to integrate speech enhancement systems with automatic speech recognition systems. In the training phase, the observed feature vectors, corrupted by background noise and reverberation, together with estimates for the interference as provided by the speech enhancement system are used for training joint densities of the observations and the interference estimates. In the decoding phase, the joint densities are transformed to conditional densities of the observed features given the interference estimates. Thus, front end processing can be exploited for obtaining interference estimates, and the estimation errors can be modeled very effectively in a data-driven way. Connected digit recognition experiments in a simulated reverberant environment show the potential of the proposed approach: HMMs with the proposed conditional densities outperform various configurations of conventional HMMs in the logarithmic melspectral domain. This is a first step towards using conditional densities for creating synergies between front end and back end. Copyright © 2013 ISCA.
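The decoding step described in the abstract, turning a trained joint density of observation and interference estimate into a conditional density, can be illustrated with a minimal sketch. It assumes a single Gaussian per HMM state and a single feature dimension; the abstract does not state the form of the densities used in the paper, so this is only an illustration of Gaussian conditioning, not the authors' implementation. All function and parameter names (condition_gaussian, conditional_log_emission, b_hat, and so on) are hypothetical.

```python
import math

def condition_gaussian(mean_x, mean_b, var_x, var_b, cov_xb, b_hat):
    """Mean and variance of p(x | b = b_hat) for a joint 2-D Gaussian over (x, b).

    x is the observed (corrupted) feature, b the interference estimate.
    Assumption: one Gaussian per state, one feature dimension.
    """
    gain = cov_xb / var_b                    # regression coefficient of x on b
    cond_mean = mean_x + gain * (b_hat - mean_b)
    cond_var = var_x - gain * cov_xb         # variance left after observing b
    return cond_mean, cond_var

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def conditional_log_emission(x, b_hat, joint_params):
    """Log of p(x | b_hat), usable as an emission score during decoding.

    joint_params = (mean_x, mean_b, var_x, var_b, cov_xb), as estimated in training.
    """
    cond_mean, cond_var = condition_gaussian(*joint_params, b_hat)
    return log_gaussian(x, cond_mean, cond_var)

if __name__ == "__main__":
    # Hypothetical joint parameters for one state and one feature dimension.
    params = (0.0, 0.0, 1.0, 1.0, 0.5)
    print(conditional_log_emission(0.3, 2.0, params))
```

In this sketch, a stronger correlation between the observation and the interference estimate gives a smaller conditional variance, which is one way to read the abstract's remark that estimation errors are modeled in a data-driven way.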


FAU Authors / FAU Editors

Kellermann, Walter Prof. Dr.-Ing.
Professur für Nachrichtentechnik
Maas, Roland
Lehrstuhl für Multimediakommunikation und Signalverarbeitung


External institutions
Hochschule für Technik und Wirtschaft Berlin (HTW)
Nippon Telegraph and Telephone (NTT) / 日本電信電話株式会社


How to cite

APA:
Sehr, A., Yoshioka, T., Delcroix, M., Kinoshita, K., Nakatani, T., Maas, R., & Kellermann, W. (2013). Conditional emission densities for combining speech enhancement and recognition systems. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech) (pp. 3502-3506). Lyon, FR.

MLA:
Sehr, Armin, et al. "Conditional emission densities for combining speech enhancement and recognition systems." Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Lyon 2013. 3502-3506.


Last updated on 2019-03-06 at 07:14