Pre-training Auto-regressive Robotic Models with 4D Representations

Niu, Dantong; Sharma, Yuvan; Xue, Haoru; Biamby, Giscard; Zhang, Junyi; Ji, Ziteng; Darrell, Trevor; Herzig, Roei

arXiv:2502.13142 (cs)

[Submitted on 18 Feb 2025]

Abstract: Foundation models pre-trained on massive unlabeled datasets have revolutionized natural language and computer vision, exhibiting remarkable generalization capabilities, thus highlighting the importance of pre-training. Yet, efforts in robotics have struggled to achieve similar success, limited by either the need for costly robotic annotations or the lack of representations that effectively model the physical world. In this paper, we introduce ARM4R, an Auto-regressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better pre-trained robotic model. Specifically, we focus on utilizing 3D point tracking representations from videos derived by lifting 2D representations into 3D space via monocular depth estimation across time. These 4D representations maintain a shared geometric structure between the points and robot state representations up to a linear transformation, enabling efficient transfer learning from human video data to low-level robotic control. Our experiments show that ARM4R can transfer efficiently from human video data to robotics and consistently improves performance on tasks across various robot environments and configurations.
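
The lifting step mentioned in the abstract (2D point tracks turned into 3D tracks over time using per-frame depth) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a standard pinhole camera model, and the function name `lift_track_to_3d` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. It also assumes the depth at each tracked pixel has already been produced by some monocular depth estimator.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]          # pixel coordinates (u, v)
Point3D = Tuple[float, float, float]   # camera-frame coordinates (x, y, z)


def lift_track_to_3d(
    track_2d: List[Point2D],
    depths: List[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project one 2D point track into 3D, one frame at a time.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    `depths[t]` is the estimated depth at the tracked pixel in frame t.
    """
    if len(track_2d) != len(depths):
        raise ValueError("need exactly one depth value per tracked frame")
    track_3d: List[Point3D] = []
    for (u, v), z in zip(track_2d, depths):
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        track_3d.append((x, y, z))
    return track_3d


# Hypothetical example: one point tracked over three frames.
track = [(320.0, 240.0), (330.0, 240.0), (340.0, 250.0)]
depth_per_frame = [2.0, 2.0, 1.0]
points_over_time = lift_track_to_3d(
    track, depth_per_frame, fx=500.0, fy=500.0, cx=320.0, cy=240.0
)
```

The resulting list is a 3D trajectory indexed by time, which is the sense in which such tracks can be called 4D. How ARM4R selects points, estimates depth, and relates the tracks to robot state is described in the paper itself and is not reproduced here.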

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.13142 [cs.RO]
	(or arXiv:2502.13142v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2502.13142

Submission history

From: Dantong Niu
[v1] Tue, 18 Feb 2025 18:59:01 UTC (36,451 KB)
