VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors

Zhu, Yifeng; Joshi, Abhishek; Stone, Peter; Zhu, Yuke

arXiv:2210.11339 (cs)

[Submitted on 20 Oct 2022 (v1), last revised 8 Mar 2023 (this version, v2)]

Abstract: We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by $45.8\%$ in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: this https URL.
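
The idea of a policy attending over per-object features can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each object proposal has already been turned into a fixed-length feature vector (random numbers stand in for features from a pre-trained vision model), and it uses one single-head attention step with a linear action head in place of a full transformer policy. All names (`attend`, `predict_action`, `DIM`, `NUM_PROPOSALS`, `ACTION_DIM`) are hypothetical.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over object tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    pooled = [
        sum(w * t[i] for w, t in zip(weights, tokens))
        for i in range(len(query))
    ]
    return pooled, weights


def predict_action(query, tokens, action_head):
    """Pool the object tokens with attention, then map the result to an action."""
    pooled, weights = attend(query, tokens)
    action = [dot(row, pooled) for row in action_head]
    return action, weights


# Assumed sizes, chosen only for illustration.
DIM, NUM_PROPOSALS, ACTION_DIM = 8, 5, 7
rng = random.Random(0)

# Stand-ins for per-proposal features; a real system would compute these
# from image regions given by a pre-trained vision model.
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PROPOSALS)]
# Stand-ins for learned parameters: an action query and a linear action head.
query = [rng.gauss(0, 1) for _ in range(DIM)]
action_head = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(ACTION_DIM)]

action, weights = predict_action(query, tokens, action_head)
print("attention over proposals:", [round(w, 3) for w in weights])
```

The attention weights form a distribution over the proposals, so a trained policy of this shape could concentrate on the regions that matter for the current action. Because the pooled feature is a weighted sum, the predicted action here does not depend on the order in which proposals are listed.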

Comments:	Published at the 6th Conference on Robot Learning
Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2210.11339 [cs.RO]
	(or arXiv:2210.11339v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2210.11339

Submission history

From: Yifeng Zhu
[v1] Thu, 20 Oct 2022 15:20:37 UTC (12,369 KB)
[v2] Wed, 8 Mar 2023 17:46:19 UTC (12,369 KB)
