Emotion Recognition from Speech under Environmental Noise Conditions using Wavelet Decomposition

Vasquez-Correa JC, Garcia N, Orozco-Arroyave JR, Arias-Londono JD, Vargas-Bonilla JF, Nöth E (2015)

Publication Status: Published

Publication Type: Conference contribution

Publication year: 2015

Conference

49th Annual IEEE International Carnahan Conference on Security Technology (ICCST)

Page Range: 247-252

Abstract

Automatic emotion recognition from speech signals has attracted the attention of the research community in recent years. One of the main challenges is to find suitable features to represent the affective state of the speaker. In this paper, a new set of features derived from the wavelet packet transform is proposed to classify different negative emotions such as anger, fear, and disgust, and to differentiate between those negative emotions and the neutral state, or positive emotions such as happiness. Different wavelet decompositions are considered for both voiced and unvoiced segments, in order to determine a frequency band where the emotions are concentrated. Several measures are calculated on the wavelet-decomposed signals, including log-energy, entropy measures, mel-frequency cepstral coefficients, and the Lempel-Ziv complexity. The experiments consider two databases extensively used in emotion recognition: the Berlin emotional database and the eNTERFACE'05 database. To approximate real-world conditions in terms of the quality of the recorded speech, these databases are degraded with different kinds of environmental noise, such as cafeteria babble and street noise. The noise is added at several signal-to-noise ratio levels ranging from -3 to 6 dB. Finally, the effect of two different speech enhancement methods is evaluated. According to the results, the features calculated from the lower-frequency wavelet decomposition coefficients are able to recognize the fear-type emotions in speech. Also, one of the speech enhancement algorithms proved useful for improving the accuracy for speech signals affected by high levels of background noise.
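
The abstract states that noise is added at signal-to-noise ratios from -3 to 6 dB but does not say how the noise was scaled. The following is a minimal sketch of one common way to do it, assuming the SNR is defined as the ratio of average speech power to average noise power over the whole utterance. The function names (mix_at_snr, measured_snr_db) and the synthetic signals are hypothetical stand-ins for illustration, not the authors' implementation or data.

```python
import math
import random


def power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so speech-to-noise power ratio equals `snr_db`, then add.

    Assumption: SNR is computed over the whole utterance from average power.
    Returns the noisy signal and the scaled noise that was added.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the utterance length
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Noise power required to reach the requested SNR.
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins: a 200 Hz tone for "speech", Gaussian samples for "noise".
random.seed(0)
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

for level in (-3, 0, 3, 6):
    noisy, scaled = mix_at_snr(speech, noise, level)
    print(level, round(measured_snr_db(speech, scaled), 3))
```

In practice the noise would come from recordings such as the cafeteria babble or street noise mentioned above. A segment-wise or speech-active-only SNR definition would require a different power estimate than the whole-utterance average used here.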

Authors with CRIS profile

Elmar Nöth, Professur für Informatik (Mustererkennung)

Involved external institutions

Universidad de Antioquía (UDEA)

Colombia (CO)

How to cite

APA:

Vasquez-Correa, J.C., Garcia, N., Orozco-Arroyave, J.R., Arias-Londono, J.D., Vargas-Bonilla, J.F., & Nöth, E. (2015). Emotion Recognition from Speech under Environmental Noise Conditions using Wavelet Decomposition. In 49th Annual IEEE International Carnahan Conference on Security Technology (ICCST) (pp. 247-252).

MLA:

Vasquez-Correa, J. C., et al. "Emotion Recognition from Speech under Environmental Noise Conditions using Wavelet Decomposition." 49th Annual IEEE International Carnahan Conference on Security Technology (ICCST), 2015, pp. 247-252.
