Robust Automatic Speech Recognition for Call Center Applications

Parra-Gallego LF, Arias-Vergara T, Arroyave JRO (2021)

Publication Type: Conference contribution

Publication year: 2021

Journal

Communications in Computer and Information Science Springer Verlag

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 1431 CCIS

Pages Range: 72-83

Conference Proceedings Title: Communications in Computer and Information Science

Event location: Virtual, Online

ISBN: 9783030867010

DOI: 10.1007/978-3-030-86702-7_7

Abstract

This paper focuses on developing an Automatic Speech Recognition (ASR) system that is robust against different noisy scenarios. ASR systems are widely used in call centers to convert telephone recordings into text transcriptions, which are then used as input to automatically evaluate the Quality of the Service (QoS). Since the QoS and customer satisfaction are evaluated by analyzing the text produced by the ASR system, this process depends heavily on the accuracy of the transcription. Because calls are usually recorded in non-controlled acoustic conditions, the accuracy of the ASR typically decreases. To address this problem, we first evaluated four different hybrid architectures: (1) Gaussian Mixture Models (GMM) (baseline), (2) Time Delay Neural Network (TDNN), (3) Long Short-Term Memory (LSTM), and (4) Gated Recurrent Unit (GRU). The evaluation is performed on a total of 478.6 h of recordings collected in a real call center. Each recording has its respective transcription and one of three perceptual labels describing the level of noise present during the phone call: Low level of noise (LN), Medium level of noise (MN), and High level of noise (HN). The LSTM-based model achieved the best performance in the MN and HN scenarios, with word error rates (WER) of 22.55% and 27.99%, respectively. Additionally, we implemented a denoiser based on GRUs to enhance the speech signals, and the results improved by 1.16% in the HN scenario.
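
The abstract reports its results as word error rate (WER). As a minimal sketch, assuming the standard definition (word-level substitutions, deletions, and insertions divided by the number of reference words), WER could be computed as shown below. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a four-word reference transcribed with one substituted word and one missing word gives a WER of 2/4 = 0.5. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.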

Authors with CRIS profile

Tomás Arias Vergara

Involved external institutions

Universidad de Antioquía (UDEA)

Colombia (CO)

How to cite

APA:

Parra-Gallego, L.F., Arias-Vergara, T., & Arroyave, J.R.O. (2021). Robust Automatic Speech Recognition for Call Center Applications. In Juan Carlos Figueroa-García, Yesid Díaz-Gutierrez, Elvis Eduardo Gaona-García, Alvaro David Orjuela-Cañón (Eds.), Communications in Computer and Information Science (pp. 72-83). Virtual, Online: Springer Science and Business Media Deutschland GmbH.

MLA:

Parra-Gallego, Luis Felipe, Tomas Arias-Vergara, and Juan Rafael Orozco Arroyave. "Robust Automatic Speech Recognition for Call Center Applications." Proceedings of the 8th Workshop on Engineering Applications, WEA 2021, Virtual, Online Ed. Juan Carlos Figueroa-García, Yesid Díaz-Gutierrez, Elvis Eduardo Gaona-García, Alvaro David Orjuela-Cañón, Springer Science and Business Media Deutschland GmbH, 2021. 72-83.
