CASIM: Composite Aware Semantic Injection for Text to Motion Generation

Chang, Che-Jui; Liu, Qingze Tony; Zhou, Honglu; Pavlovic, Vladimir; Kapadia, Mubbasir


arXiv:2502.02063 (cs)

[Submitted on 4 Feb 2025]



Abstract: Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, leading to enhanced quality and realism in generated motions. However, effectively leveraging textual information for conditional motion generation remains an open challenge. We observe that current approaches, which rely primarily on fixed-length text embeddings (e.g., CLIP) for global semantic injection, struggle to capture the composite nature of human motion, resulting in suboptimal motion quality and controllability. To address this limitation, we propose the Composite Aware Semantic Injection Mechanism (CASIM), comprising a composite-aware semantic encoder and a text-motion aligner that learns the dynamic correspondence between text and motion tokens. Notably, CASIM is model- and representation-agnostic, readily integrating with both autoregressive and diffusion-based methods. Experiments on the HumanML3D and KIT benchmarks demonstrate that CASIM consistently improves motion quality, text-motion alignment, and retrieval scores across state-of-the-art methods. Qualitative analyses further highlight the superiority of our composite-aware approach over fixed-length semantic injection, enabling precise motion control from text prompts and stronger generalization to unseen text inputs.
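
The contrast the abstract draws between global, fixed-length semantic injection and token-level text-motion alignment can be illustrated with a small toy sketch. This is not the CASIM architecture: the function names, the mean pooling used as a stand-in for a CLIP-style sentence embedding, the random vectors, and the projection-free dot-product attention are all assumptions made only to show the difference in conditioning signal.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_injection(motion_tokens, text_tokens):
    """Hypothetical fixed-length conditioning: pool the text into ONE vector
    and add that same vector to every motion token."""
    n, d = len(text_tokens), len(text_tokens[0])
    pooled = [sum(tok[j] for tok in text_tokens) / n for j in range(d)]
    return [[m_j + p_j for m_j, p_j in zip(m, pooled)] for m in motion_tokens]


def token_level_injection(motion_tokens, text_tokens):
    """Hypothetical token-level conditioning: each motion token attends over
    all text tokens (scaled dot-product, no learned projections) and receives
    its own weighted mix of them."""
    d = len(text_tokens[0])
    scale = math.sqrt(d)
    weights = [softmax([dot(m, w) / scale for w in text_tokens])
               for m in motion_tokens]
    out = []
    for m, a in zip(motion_tokens, weights):
        context = [sum(a_i * w[j] for a_i, w in zip(a, text_tokens))
                   for j in range(d)]
        out.append([m_j + c_j for m_j, c_j in zip(m, context)])
    return out, weights


# Toy data: 6 motion tokens and 4 text tokens, each of dimension 8 (arbitrary).
rng = random.Random(0)
motion = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
text = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

global_out = global_injection(motion, text)
aligned_out, attn = token_level_injection(motion, text)
```

In this sketch every motion token receives an identical offset under global injection, whereas `attn` holds one distribution over text tokens per motion token, so different parts of the motion can draw on different words of the prompt.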

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2502.02063 [cs.CV]
	(or arXiv:2502.02063v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.02063

Submission history

From: Che-Jui Chang
[v1] Tue, 4 Feb 2025 07:22:07 UTC (975 KB)
