Parra-Gallego LF, Arias-Vergara T, Arroyave JRO (2021)
Publication Type: Conference contribution
Publication year: 2021
Publisher: Springer Science and Business Media Deutschland GmbH
Book Volume: 1431 CCIS
Pages Range: 72-83
Conference Proceedings Title: Communications in Computer and Information Science
Event location: Virtual, Online
ISBN: 9783030867010
DOI: 10.1007/978-3-030-86702-7_7
This paper focuses on developing an Automatic Speech Recognition (ASR) system that is robust against different noisy scenarios. ASR systems are widely used in call centers to convert telephone recordings into text transcriptions, which are then used as input to automatically evaluate the Quality of the Service (QoS). Since the QoS and customer satisfaction are evaluated by analyzing the text produced by the ASR system, this process depends heavily on the accuracy of the transcription. Because calls are usually recorded in non-controlled acoustic conditions, the accuracy of the ASR typically decreases. To address this problem, we first evaluated four different hybrid architectures: (1) Gaussian Mixture Models (GMM) (baseline), (2) Time Delay Neural Network (TDNN), (3) Long Short-Term Memory (LSTM), and (4) Gated Recurrent Unit (GRU). The evaluation considers a total of 478.6 h of recordings collected in a real call center. Each recording has its respective transcription and one of three perceptual labels describing the level of noise present during the phone call: low level of noise (LN), medium level of noise (MN), and high level of noise (HN). The LSTM-based model achieved the best performance in the MN and HN scenarios, with word error rates (WER) of 22.55% and 27.99%, respectively. Additionally, we implemented a denoiser based on GRUs to enhance the speech signals, and the results improved by 1.16% in the HN scenario.
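For readers unfamiliar with the metric, the word error rate (WER) reported in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcription and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization), which real scoring pipelines usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("the call was dropped", "the call dropped"))
```

Under this definition, a WER of 22.55% means that roughly 23 word-level edits are needed per 100 reference words to turn the ASR output into the reference transcription.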
APA:
Parra-Gallego, L.F., Arias-Vergara, T., & Arroyave, J.R.O. (2021). Robust Automatic Speech Recognition for Call Center Applications. In Juan Carlos Figueroa-García, Yesid Díaz-Gutierrez, Elvis Eduardo Gaona-García, Alvaro David Orjuela-Cañón (Eds.), Communications in Computer and Information Science (pp. 72-83). Virtual, Online: Springer Science and Business Media Deutschland GmbH.
MLA:
Parra-Gallego, Luis Felipe, Tomas Arias-Vergara, and Juan Rafael Orozco Arroyave. "Robust Automatic Speech Recognition for Call Center Applications." Proceedings of the 8th Workshop on Engineering Applications, WEA 2021, Virtual, Online. Ed. Juan Carlos Figueroa-García, Yesid Díaz-Gutierrez, Elvis Eduardo Gaona-García, Alvaro David Orjuela-Cañón, Springer Science and Business Media Deutschland GmbH, 2021. 72-83.