Conditional emission densities for combining speech enhancement and recognition systems

Sehr A, Yoshioka T, Delcroix M, Kinoshita K, Nakatani T, Maas R, Kellermann W (2013)

Publication Language: English

Publication Status: Published

Publication Type: Conference contribution

Publication year: 2013

Page Range: 3502-3506

Event location: Lyon

ISBN: 978-1-6299-3443-3

URI: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84906212744&origin=inward

Abstract

A novel framework based on conditional emission densities for hidden Markov models (HMMs) is proposed in this contribution to integrate speech enhancement systems with automatic speech recognition systems. In the training phase, the observed feature vectors, corrupted by background noise and reverberation, together with estimates for the interference as provided by the speech enhancement system are used for training joint densities of the observations and the interference estimates. In the decoding phase, the joint densities are transformed to conditional densities of the observed features given the interference estimates. Thus, front end processing can be exploited for obtaining interference estimates, and the estimation errors can be modeled very effectively in a data-driven way. Connected digit recognition experiments in a simulated reverberant environment show the potential of the proposed approach: HMMs with the proposed conditional densities outperform various configurations of conventional HMMs in the logarithmic melspectral domain. This is a first step towards using conditional densities for creating synergies between front end and back end. Copyright © 2013 ISCA.
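The step from joint to conditional densities described in the abstract can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: a single Gaussian per HMM state (rather than a mixture), and a scalar observed feature x paired with a scalar interference estimate r. All function names are hypothetical and chosen only for illustration.

```python
import math

# Illustrative sketch only: assumes one Gaussian per HMM state and scalar
# features. The names below are hypothetical, not from the paper.

def fit_joint_gaussian(xs, rs):
    """Training phase: estimate the joint Gaussian of (x, r) from paired
    samples of observed features xs and interference estimates rs."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_r = sum(rs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    s_rr = sum((r - mu_r) ** 2 for r in rs) / n
    s_xr = sum((x - mu_x) * (r - mu_r) for x, r in zip(xs, rs)) / n
    return mu_x, mu_r, s_xx, s_xr, s_rr

def conditional_params(joint, r):
    """Decoding phase: turn the joint Gaussian into the conditional
    Gaussian of x given the current interference estimate r."""
    mu_x, mu_r, s_xx, s_xr, s_rr = joint
    mean = mu_x + (s_xr / s_rr) * (r - mu_r)  # mean shifts with r
    var = s_xx - s_xr ** 2 / s_rr             # variance never exceeds s_xx
    return mean, var

def conditional_log_density(joint, x, r):
    """Log emission score log p(x | r) for one state."""
    mean, var = conditional_params(joint, r)
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Usage with a hand-picked joint density (mu_x, mu_r, s_xx, s_xr, s_rr).
joint = (0.0, 0.0, 2.0, 1.0, 1.0)
mean, var = conditional_params(joint, r=1.0)
score = conditional_log_density(joint, x=1.0, r=1.0)
```

In this toy setting, a stronger correlation between x and r gives a smaller conditional variance, which is one way to read the abstract's remark that estimation errors of the front end are modeled in a data-driven way.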

Authors with CRIS profile

Roland Maas, Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS)

Walter Kellermann, Professur für Signalverarbeitung

Involved external institutions

Nippon Telegraph and Telephone (NTT) / 日本電信電話株式会社, Japan (JP)

Hochschule für Technik und Wirtschaft Berlin (HTW), Germany (DE)

How to cite

APA:

Sehr, A., Yoshioka, T., Delcroix, M., Kinoshita, K., Nakatani, T., Maas, R., & Kellermann, W. (2013). Conditional emission densities for combining speech enhancement and recognition systems. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech) (pp. 3502-3506). Lyon, FR.

MLA:

Sehr, Armin, et al. "Conditional emission densities for combining speech enhancement and recognition systems." Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Lyon 2013. 3502-3506.
