Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation

Das D, Khan Q, Cremers D (2022)


Publication Type: Conference contribution

Publication year: 2022


Publisher: IEEE Computer Society

Page Range: 1716-1720

Conference Proceedings Title: Proceedings - International Conference on Image Processing, ICIP

Event location: Bordeaux, FRA

ISBN: 9781665496209

DOI: 10.1109/ICIP46576.2022.9897657

Abstract

In this paper, we propose Ventriloquist-Net: a talking head generation model that uses only a speech segment and a single source face image. It places emphasis on emotive expressions, whose cues are implicitly inferred from the speech clip alone. We formulate our framework as a set of independently trained modules to expedite convergence. This not only allows extension to new datasets in a semi-supervised manner but also facilitates handling in-the-wild source images. Quantitative and qualitative evaluations of the generated videos demonstrate state-of-the-art performance even on unseen input data. Implementation and supplementary videos are available at https://github.com/dipnds/VentriloquistNet.
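The model maps a single source face image plus a raw speech clip to a sequence of video frames. A minimal sketch of what such an inference interface could look like follows; the function name, docstring stages, and the placeholder rendering step are illustrative assumptions, not the repository's actual API.

import numpy as np

def generate_talking_head(source_image: np.ndarray,
                          speech: np.ndarray,
                          sample_rate: int = 16_000,
                          fps: int = 25) -> np.ndarray:
    """Return video frames of shape (T, H, W, 3) for one speech clip.

    The paper trains its modules independently, so each stage below
    would correspond to a separately trained network:
      1. an audio encoder extracting content and emotion cues
         implicitly from the speech clip alone,
      2. an expression/motion predictor driven by those cues,
      3. a renderer animating the single source face image.
    """
    # One output frame per (sample_rate / fps) audio samples.
    n_frames = int(len(speech) / sample_rate * fps)
    # Placeholder: a real implementation would run the three neural
    # modules here instead of tiling the static source image.
    return np.repeat(source_image[None], n_frames, axis=0)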


How to cite

APA:

Das, D., Khan, Q., & Cremers, D. (2022). Ventriloquist-Net: Leveraging speech cues for emotive talking head generation. In Proceedings - International Conference on Image Processing, ICIP (pp. 1716-1720). Bordeaux, FRA: IEEE Computer Society.

MLA:

Das, Deepan, Qadeer Khan, and Daniel Cremers. "Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation." Proceedings of the 29th IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, FRA. IEEE Computer Society, 2022. 1716-1720.

BibTeX:
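The following entry is assembled from the metadata in this record; only the citation key is illustrative.

@inproceedings{Das2022VentriloquistNet,
  author    = {Das, Deepan and Khan, Qadeer and Cremers, Daniel},
  title     = {Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation},
  booktitle = {Proceedings - International Conference on Image Processing, ICIP},
  year      = {2022},
  pages     = {1716--1720},
  publisher = {IEEE Computer Society},
  address   = {Bordeaux, FRA},
  isbn      = {9781665496209},
  doi       = {10.1109/ICIP46576.2022.9897657}
}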