Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

Sadoughi, Najmeh; Busso, Carlos

doi:10.1109/TAFFC.2019.2916031

arXiv:1806.00154 (cs)

[Submitted on 1 Jun 2018]

Abstract: Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important to carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion and lexical content in a principled manner. The model uses a set of articulatory and emotional features extracted directly from the speech signal as conditioning inputs to generate realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show that this model outperforms three state-of-the-art baselines in both objective and subjective evaluations. When the target emotion is known, we propose two emotion-dependent models: adapting the base model with data from the target emotion (CSG-Emo-Adapted), or adding the emotion as an additional conditioning input to the model (CSG-Emo-Aware). Objective evaluations show that CSG-Emo-Adapted improves on the CSG model, producing trajectory sequences closer to the original sequences. Subjective evaluations show significantly better results for CSG-Emo-Adapted than for the CSG model when the target emotion is happiness.
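The abstract describes CSG only at a high level: a generator maps features extracted from speech to orofacial trajectories, and a discriminator that sees the same conditioning features judges whether those trajectories look real. The NumPy sketch that follows illustrates that general conditional-GAN pattern for sequences under stated assumptions; it is not the authors' architecture. The single-layer Elman RNN, all feature dimensions, the per-frame noise input, and the one-hot emotion vector (loosely mirroring the idea behind CSG-Emo-Aware) are hypothetical choices made for illustration, and only a forward pass with the standard GAN losses is shown, with no training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(in_dim, hid_dim, out_dim, rng):
    """Random weights for a hypothetical single-layer Elman RNN plus a linear readout."""
    return {
        "W_in": rng.normal(0.0, 0.1, (in_dim, hid_dim)),
        "W_h": rng.normal(0.0, 0.1, (hid_dim, hid_dim)),
        "b_h": np.zeros(hid_dim),
        "W_out": rng.normal(0.0, 0.1, (hid_dim, out_dim)),
        "b_out": np.zeros(out_dim),
    }

def rnn_forward(params, x):
    """Run the RNN over a (T, in_dim) sequence and return a (T, out_dim) sequence."""
    h = np.zeros(params["W_h"].shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ params["W_in"] + h @ params["W_h"] + params["b_h"])
        outputs.append(h @ params["W_out"] + params["b_out"])
    return np.stack(outputs)

def build_condition(speech_feats, emotion=None, n_emotions=0):
    """Per-frame conditioning: speech features, optionally with a one-hot emotion label appended."""
    if emotion is None:
        return speech_feats
    one_hot = np.zeros((speech_feats.shape[0], n_emotions))
    one_hot[:, emotion] = 1.0
    return np.concatenate([speech_feats, one_hot], axis=1)

def generator(params, cond, noise):
    """G(z | c): map per-frame noise plus conditioning to a lip-trajectory sequence."""
    return rnn_forward(params, np.concatenate([cond, noise], axis=1))

def discriminator(params, lips, cond):
    """D(x | c): probability that a lip sequence is real, given the same conditioning."""
    logits = rnn_forward(params, np.concatenate([lips, cond], axis=1))
    return 1.0 / (1.0 + np.exp(-logits[-1, 0]))  # sigmoid of the last time step's score

def cgan_losses(d_real, d_fake, eps=1e-8):
    """Standard discriminator loss and non-saturating generator loss."""
    d_loss = -(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.log(d_fake + eps)
    return d_loss, g_loss

# Hypothetical sizes: frames, speech features, noise, lip parameters, emotion classes.
T, SPEECH_DIM, NOISE_DIM, LIP_DIM, N_EMO = 50, 26, 8, 12, 4
speech = rng.normal(size=(T, SPEECH_DIM))     # stand-in for features extracted from audio
real_lips = rng.normal(size=(T, LIP_DIM))     # stand-in for recorded lip trajectories
cond = build_condition(speech, emotion=2, n_emotions=N_EMO)

g_params = init_params(cond.shape[1] + NOISE_DIM, 32, LIP_DIM, rng)
d_params = init_params(LIP_DIM + cond.shape[1], 32, 1, rng)

fake_lips = generator(g_params, cond, rng.normal(size=(T, NOISE_DIM)))
d_real = discriminator(d_params, real_lips, cond)
d_fake = discriminator(d_params, fake_lips, cond)
d_loss, g_loss = cgan_losses(d_real, d_fake)
```

Feeding the same conditioning to both networks is what makes the GAN conditional: the discriminator judges whether a trajectory is plausible for that particular speech input, not merely whether it resembles lip motion in general. Because only speech-derived features enter `build_condition`, no transcript is needed. A variant in the spirit of CSG-Emo-Adapted would instead start from trained base weights and continue training on emotion-specific data; that step is not shown.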

Subjects:	Human-Computer Interaction (cs.HC)
Cite as:	arXiv:1806.00154 [cs.HC]
	(or arXiv:1806.00154v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.1806.00154
Journal reference:	IEEE Transactions on Affective Computing, vol. 12, no. 4, pp. 1031-1044, October-December 2021
Related DOI:	https://doi.org/10.1109/TAFFC.2019.2916031

Submission history

From: Najmeh Sadoughi
[v1] Fri, 1 Jun 2018 01:09:25 UTC (5,300 KB)
