SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis

Zhang, Xiangyue; Li, Jiangfang; Zhang, Jiaxu; Dang, Ziqiang; Ren, Jianqiang; Bo, Liefeng; Tu, Zhigang

arXiv:2412.16563 (cs)

[Submitted on 21 Dec 2024]

Abstract: Good co-speech motion generation requires careful integration of common rhythmic motion and rare yet essential semantic motion. In this work, we propose SemTalk for holistic co-speech motion generation with frame-level semantic emphasis. Our key insight is to learn general motions and sparse motions separately, and then fuse them adaptively. In particular, rhythmic consistency learning is explored to establish rhythm-related base motion, ensuring a coherent foundation that synchronizes gestures with the speech rhythm. Subsequently, semantic emphasis learning is designed to generate semantic-aware sparse motion, focusing on frame-level semantic cues. Finally, to integrate sparse motion into the base motion and generate semantic-emphasized co-speech gestures, we further leverage a learned semantic score for adaptive synthesis. Qualitative and quantitative comparisons on two public datasets demonstrate that our method outperforms the state-of-the-art, delivering high-quality co-speech motion with enhanced semantic richness over a stable base motion.
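
The abstract does not state how the learned semantic score combines the two motion streams, so the following is only an illustrative sketch of one simple possibility: a per-frame convex blend in which the score weights sparse motion against base motion. The function name `fuse_motion`, the list-of-floats pose representation, and the blend rule itself are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

# Hypothetical representation: one pose vector (list of floats) per frame.
Frame = List[float]


def fuse_motion(base: List[Frame], sparse: List[Frame], score: List[float]) -> List[Frame]:
    """Blend two motion sequences frame by frame.

    Assumed rule (not taken from the paper):
        out[t] = (1 - s[t]) * base[t] + s[t] * sparse[t]
    where s[t] is the semantic score of frame t, clamped to [0, 1].
    """
    if not (len(base) == len(sparse) == len(score)):
        raise ValueError("base, sparse and score must cover the same frames")

    fused: List[Frame] = []
    for base_pose, sparse_pose, s in zip(base, sparse, score):
        if len(base_pose) != len(sparse_pose):
            raise ValueError("pose vectors must have the same dimension")
        s = min(1.0, max(0.0, s))  # keep the weight inside [0, 1]
        fused.append([(1.0 - s) * b + s * g for b, g in zip(base_pose, sparse_pose)])
    return fused


# Toy data: three frames, two pose dimensions.
base = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # rhythm-related base motion
sparse = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]  # semantic-aware sparse motion
score = [0.0, 0.5, 1.0]                        # per-frame semantic score

fused = fuse_motion(base, sparse, score)
```

With these toy inputs, a score of 0 keeps the base pose unchanged, a score of 1 takes the sparse pose outright, and intermediate scores interpolate between the two. That is the sense in which a frame-level score could emphasize semantic gestures only where they matter.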

Comments:	11 pages, 8 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.16563 [cs.CV]
	(or arXiv:2412.16563v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.16563

Submission history

From: Xiangyue Zhang
[v1] Sat, 21 Dec 2024 10:16:07 UTC (42,278 KB)
