Speech Processing and Understanding

Description / Outline

Our research focuses on modeling speech and language patterns using machine learning and deep learning methods. We develop spoken dialogue systems, enhance speech, and process out-of-vocabulary words. We analyze prosodic features such as accents and phrase boundaries, and automatically recognize emotion-related states from multi-modal data, including facial expressions, gestures, and physiological parameters. We also recognize user focus in human-machine interaction and analyze pathological speech from children with cleft lip and palate and from patients with speech and language disorders. In natural language processing, we develop and apply methods such as Large Language Models (LLMs), topic modeling, and part-of-speech tagging, with applications in both medical and industrial domains. We further leverage LLMs and deep learning for advanced speech and language understanding, addressing ethical AI, text summarization, and question-answering systems. Additionally, our work extends to the analysis of animal vocalizations (e.g., those of orcas), aiming to interpret communication patterns in zoos and in the wild.

Faculty/Institution

Contacts