Investigating the impact of 2D gesture representation on co-speech gesture generation

Guichoux, Teo; Soulier, Laure; Obin, Nicolas; Pelachaud, Catherine

arXiv:2406.15111 (cs)

[Submitted on 21 Jun 2024 (v1), last revised 24 Jun 2024 (this version, v2)]

Abstract: Co-speech gestures play a crucial role in the interactions between humans and embodied conversational agents (ECAs). Recent deep learning methods enable the generation of realistic, natural co-speech gestures synchronized with speech, but such approaches require large amounts of training data. "In-the-wild" datasets, which compile videos from sources such as YouTube through human pose detection models, offer a solution by providing 2D skeleton sequences that are paired with speech. Concurrently, innovative lifting models have emerged, capable of transforming these 2D pose sequences into their 3D counterparts, leading to large and diverse datasets of 3D gestures. However, the derived 3D pose estimation is essentially a pseudo-ground truth, with the actual ground truth being the 2D motion data. This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions, a topic that, to our knowledge, remains largely unexplored. In this work, we evaluate the impact of the dimensionality of the training data, 2D or 3D joint coordinates, on the performance of a multimodal speech-to-gesture deep generative model. We use a lifting model to convert 2D-generated sequences of body pose to 3D. Then, we compare the sequences of gestures generated directly in 3D to the gestures generated in 2D and lifted to 3D as post-processing.

Comments:	8 pages. Paper accepted at WACAI 2024
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.15111 [cs.AI]
	(or arXiv:2406.15111v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.15111

Submission history

From: Téo Guichoux
[v1] Fri, 21 Jun 2024 12:59:20 UTC (472 KB)
[v2] Mon, 24 Jun 2024 08:19:00 UTC (471 KB)
