EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning

Gao, Xuehao; Yang, Yang; Du, Shaoyi; Wu, Yang; Liu, Yebin; Qi, Guo-Jun

Abstract:This paper explores a cross-modality synthesis task that infers 3D human-object interactions (HOIs) from a given text-based instruction. Existing text-to-HOI synthesis methods mainly deploy a direct mapping from texts to object-specific 3D body motions, which may encounter a performance bottleneck since the huge cross-modality gap. In this paper, we observe that those HOI samples with the same interaction intention toward different targets, e.g., "lift a chair" and "lift a cup", always encapsulate similar action-specific body motion patterns while characterizing different object-specific interaction styles. Thus, learning effective action-specific motion priors and object-specific interaction priors is crucial for a text-to-HOI model and dominates its performances on text-HOI semantic consistency and body-object interaction realism. In light of this, we propose a novel body pose generation strategy for the text-to-HOI task: infer object-agnostic canonical body action first and then enrich object-specific interaction styles. Specifically, the first canonical body action inference stage focuses on learning intra-class shareable body motion priors and mapping given text-based semantics to action-specific canonical 3D body motions. Then, in the object-specific interaction inference stage, we focus on object affordance learning and enrich object-specific interaction styles on an inferred action-specific body motion basis. Extensive experiments verify that our proposed text-to-HOI synthesis system significantly outperforms other SOTA methods on three large-scale datasets with better semantic consistency and interaction realism performances.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.00382 [cs.CV]
	(or arXiv:2503.00382v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.00382

Computer Science > Computer Vision and Pattern Recognition

Title:EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators