FOSP: Fine-tuning Offline Safe Policy through World Models

Cao, Chenyang; Xin, Yucheng; Wu, Silang; He, Longxiang; Yan, Zichen; Tan, Junbo; Wang, Xueqian

arXiv:2407.04942 (cs)

[Submitted on 6 Jul 2024]

Abstract: Model-based reinforcement learning (RL) offers high training efficiency and can handle high-dimensional tasks. On the safety side, safe model-based RL can achieve nearly zero-cost performance while managing the trade-off between performance and safety. Nevertheless, prior methods still pose safety risks because they explore online during real-world deployment. Offline RL methods have emerged as a solution: they learn from a static dataset and so avoid interacting with the environment. In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy. We combine in-sample optimization, model-based policy expansion, and reachability guidance to construct a safe offline-to-online framework. Moreover, our method is shown to improve the generalization of the offline policy to unseen safety-constrained scenarios. Finally, the efficiency of our method is validated on simulation benchmarks with five vision-only tasks and on a real robot, where it solves several deployment problems using limited data.

Comments:	21 pages
Subjects:	Robotics (cs.RO); Machine Learning (cs.LG)
Cite as:	arXiv:2407.04942 [cs.RO]
	(or arXiv:2407.04942v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2407.04942

Submission history

From: Chenyang Cao
[v1] Sat, 6 Jul 2024 03:22:57 UTC (2,563 KB)
