Korse S, Gupta K, Fuchs G (2020)
Publication Type: Conference contribution
Publication year: 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Book Volume: 2020-May
Pages Range: 6764-6768
Conference Proceedings Title: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN: 9781509066315
DOI: 10.1109/ICASSP40776.2020.9053283
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five lowest operating modes (6.65 kbps-15.85 kbps) of the Adaptive Multi-Rate Wideband codec (AMR-WB). Both objective and subjective evaluations confirm the enhancement of the coded speech and also show the superiority of the mask-based neural network system over a conventional heuristic post-filter such as the one used in the ITU-T G.718 standard.
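The masking step described in the abstract can be illustrated with a minimal sketch. It assumes the coded speech is already available as a complex time-frequency representation (for example, STFT frames) and that the mask is a non-negative real gain per bin. The names `oracle_mask` and `apply_mask`, the magnitude-ratio target, and the `max_gain` clipping value are hypothetical choices for illustration and are not taken from the paper; in the proposed system a neural network would estimate the mask instead.

```python
from typing import List

# A spectrogram is indexed as [frame][frequency bin]; values are complex.
Spectrogram = List[List[complex]]


def oracle_mask(clean: Spectrogram, coded: Spectrogram,
                eps: float = 1e-12, max_gain: float = 10.0) -> List[List[float]]:
    """Hypothetical training target: per-bin magnitude ratio |clean| / |coded|.

    eps avoids division by zero; max_gain (an assumed value) bounds the mask.
    """
    return [
        [min(abs(c) / (abs(x) + eps), max_gain) for c, x in zip(c_row, x_row)]
        for c_row, x_row in zip(clean, coded)
    ]


def apply_mask(coded: Spectrogram, mask: List[List[float]]) -> Spectrogram:
    """Scale each time-frequency bin by a real-valued gain.

    Because the gain is real and non-negative, only the magnitude changes;
    the phase of the coded signal is kept as-is.
    """
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, coded)
    ]


if __name__ == "__main__":
    # Toy 2-frame, 2-bin example (made-up values, not real speech data).
    coded = [[1 + 1j, 0.5j], [2 + 0j, -1 + 0j]]
    clean = [[2 + 2j, 0.25j], [2 + 0j, -3 + 0j]]
    mask = oracle_mask(clean, coded)
    enhanced = apply_mask(coded, mask)
    print(mask)
    print(enhanced)
```

In this sketch a mask value above 1 amplifies a bin and a value below 1 attenuates it, which is why a real-valued (rather than binary) mask can both restore attenuated components and suppress quantization noise.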
APA:
Korse, S., Gupta, K., & Fuchs, G. (2020). Enhancement of Coded Speech Using a Mask-Based Post-Filter. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 6764-6768). Barcelona, ES: Institute of Electrical and Electronics Engineers Inc.
MLA:
Korse, Srikanth, Kishan Gupta, and Guillaume Fuchs. "Enhancement of Coded Speech Using a Mask-Based Post-Filter." Proceedings of the 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona: Institute of Electrical and Electronics Engineers Inc., 2020. 6764-6768.