DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser

Chen, Peng; Wei, Xiaobao; Lu, Ming; Zhu, Yitong; Yao, Naiming; Xiao, Xingyu; Chen, Hui

arXiv:2311.16565 (cs)

[Submitted on 28 Nov 2023 (v1), last revised 2 Dec 2023 (this version, v2)]

Abstract: Speech-driven 3D facial animation has attracted attention in both academia and industry. Traditional methods mostly learn a deterministic mapping from speech to animation. Recent approaches have begun to account for the non-deterministic nature of speech-driven 3D facial animation and employ diffusion models for the task. However, personalizing facial animation and accelerating animation generation remain two major limitations of existing diffusion-based methods. To address these limitations, we propose DiffusionTalker, a diffusion-based method that uses contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity that aggregates knowledge from audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation from input audio that reflects a specific talking style. Starting from a trained diffusion model with hundreds of steps, we distill it into a lightweight model with 8 steps for acceleration. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. The code will be released.
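
The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, under stated assumptions. It assumes each audio feature vector should match the identity embedding at the same index and mismatch all others. The names `cosine`, `info_nce`, `audio_feats`, `identity_embs`, and `temperature` are hypothetical and do not come from the paper, and the paper's actual loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(audio_feats, identity_embs, temperature=0.1):
    """Illustrative InfoNCE loss (an assumption, not the paper's code).

    audio_feats[i] is treated as the positive pair of identity_embs[i];
    every other identity embedding in the batch acts as a negative.
    Returns the mean cross-entropy over the batch.
    """
    n = len(audio_feats)
    total = 0.0
    for i in range(n):
        # Similarity of audio clip i to every identity, scaled by temperature.
        logits = [
            cosine(audio_feats[i], identity_embs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching identity as the target class.
        total += log_sum - logits[i]
    return total / n


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    matched = info_nce(audio, [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce(audio, [[0.0, 1.0], [1.0, 0.0]])
    print(matched < swapped)
```

In this toy setup the loss is small when each audio feature aligns with its own identity embedding and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward.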

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2311.16565 [cs.CV]
	(or arXiv:2311.16565v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.16565

Submission history

From: Peng Chen
[v1] Tue, 28 Nov 2023 07:13:20 UTC (9,556 KB)
[v2] Sat, 2 Dec 2023 16:48:09 UTC (9,556 KB)
