A Bayesian Network Approach to Selected Problems in Speech Signal Processing

Hümmer C (2019)

Publication Language: English

Publication Type: Thesis

Publication year: 2019

Publisher: Friedrich-Alexander-Universität Erlangen-Nürnberg

City/Town: Erlangen; Nürnberg

URI: https://nbn-resolving.org/urn:nbn:de:bvb:29-opus4-105371

Abstract

The application of machine learning techniques to signal processing tasks has attracted increasing interest in recent years. In particular, directed graphical models, known as Bayesian networks, have been shown to provide a powerful framework for deriving links between existing and new algorithms from a generalized point of view. This motivates the systematic Bayesian network approach taken in this thesis, which is described as follows. A sequence of real-world observations is modeled as being produced by a set of latent random variables with unknown statistics. The underlying process producing the observations is described by a probabilistic model and its Bayesian network representation. This is the basis for acquiring information about the latent random variables by applying the steps of inference and decision. This machine learning methodology is used consistently to address two distinct speech signal processing tasks from a unifying Bayesian network perspective. First, the problem of single-channel Nonlinear Acoustic Echo Cancellation (NAEC) is considered, with the goal of removing the acoustic coupling between a loudspeaker and a microphone. This leads to the derivation of the NLMS algorithms with fixed and optimum adaptive step-size values as special cases of the Kalman filter. Furthermore, the Elitist Particle Filter based on Evolutionary Strategies (EPFES) is introduced as a new algorithm for estimating the parameters of a nonlinear acoustic echo path model. The experimental results for a synthesized scenario and real smartphone recordings illustrate that the EPFES is a promising method for NAEC. As a second application, the task of environmentally robust Automatic Speech Recognition (ASR) is addressed by modeling acoustic features as random variables instead of deterministic point estimates. This model is taken into account by modifying the acoustic-model scoring during the recognition phase. To this end, both a well-known and a new uncertainty decoding strategy are derived from a unifying Bayesian network perspective. The experimental evaluation shows that applying the proposed uncertainty decoding concept improves the recognition accuracy achieved by a powerful deep neural network-based ASR system.

Authors with CRIS profile

Christian Hümmer, Professur für Signalverarbeitung

How to cite

APA:

Hümmer, C. (2019). A Bayesian Network Approach to Selected Problems in Speech Signal Processing (Dissertation).

MLA:

Hümmer, Christian. A Bayesian Network Approach to Selected Problems in Speech Signal Processing. Dissertation, Erlangen; Nürnberg: Friedrich-Alexander-Universität Erlangen-Nürnberg, 2019.
