Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions

Xu, Yinglun; Suresh, Tarun; Gumaste, Rohan; Zhu, David; Li, Ruirui; Wang, Zhengyang; Jiang, Haoming; Tang, Xianfeng; Yin, Qingyu; Cheng, Monica Xiao; Zeng, Qi; Zhang, Chao; Singh, Gagandeep

arXiv:2401.00330 (cs)

[Submitted on 30 Dec 2023 (v1), last revised 25 Oct 2024 (this version, v3)]

Abstract: Preference-based reinforcement learning (PBRL) in the offline setting has achieved considerable success in industrial applications such as chatbots. A two-step learning framework, in which a reinforcement learning step follows a reward modeling step, has been widely adopted for this problem. However, such methods face two challenges: the risk of reward hacking and the complexity of reinforcement learning. Our insight is that both challenges stem from state-actions that are not supported by the dataset. Such state-actions are unreliable, and they increase the complexity of the reinforcement learning problem in the second step. Based on this insight, we develop a novel two-step learning method called PRC: preference-based reinforcement learning with constrained actions. The high-level idea is to restrict the reinforcement learning agent to optimizing over a constrained action space that excludes out-of-distribution state-actions. We empirically verify that our method achieves high learning efficiency on various datasets in robotic control environments.
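To make the two-step idea concrete, here is a minimal toy sketch. It is not the authors' PRC implementation: the paper targets continuous robotic-control datasets, while this example assumes a small tabular, deterministic setting. Step 1 fits a tabular Bradley-Terry reward model from pairwise segment preferences. Step 2 runs value iteration in which, at each state, the maximization covers only the actions observed at that state in the offline dataset. Defining the constrained action set as "actions seen in the data" is an illustrative assumption. All names (`fit_reward`, `supported_actions`, `constrained_value_iteration`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_reward(preferences, lr=0.5, epochs=200):
    """Step 1 (toy sketch): tabular Bradley-Terry reward model.

    Each preference is (segment_a, segment_b, label), where a segment is a list
    of (state, action) pairs and label = 1 means segment_a is preferred.
    Gradient ascent on log sigmoid(R(A) - R(B)), with R = sum of r(s, a).
    """
    r = defaultdict(float)
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            diff = sum(r[sa] for sa in seg_a) - sum(r[sa] for sa in seg_b)
            p = 1.0 / (1.0 + math.exp(-diff))  # P(A preferred over B)
            grad = label - p  # derivative of the log-likelihood w.r.t. diff
            for sa in seg_a:
                r[sa] += lr * grad
            for sa in seg_b:
                r[sa] -= lr * grad
    return dict(r)


def supported_actions(transitions):
    """Constrained action set (assumption): actions observed at each state."""
    allowed = defaultdict(set)
    for s, a in transitions:
        allowed[s].add(a)
    return dict(allowed)


def constrained_value_iteration(transitions, r, gamma=0.9, iters=100):
    """Step 2 (toy sketch): value iteration over in-distribution actions only.

    transitions maps (state, action) -> next state (None marks termination).
    Pairs outside the dataset are never evaluated, so any reward the model
    assigns to them cannot be exploited by the agent.
    """
    allowed = supported_actions(transitions)
    v = {}  # states not in v (including None) have value 0

    def q(s, a):
        return r.get((s, a), 0.0) + gamma * v.get(transitions[(s, a)], 0.0)

    for _ in range(iters):
        # Synchronous Bellman backup restricted to the constrained action set
        v = {s: max(q(s, a) for a in acts) for s, acts in allowed.items()}

    policy = {s: max(sorted(acts), key=lambda a: q(s, a)) for s, acts in allowed.items()}
    return policy, v


if __name__ == "__main__":
    # Hypothetical offline dataset: (state, action) -> next state.
    transitions = {(0, 0): 1, (0, 1): None, (1, 0): None}
    # One labelled pair: the segment passing through state 1 is preferred.
    prefs = [([(0, 0), (1, 0)], [(0, 1)], 1)]
    reward = fit_reward(prefs)
    # Simulate reward hacking: a spurious high reward on the unseen pair (0, 2).
    reward[(0, 2)] = 100.0
    policy, values = constrained_value_iteration(transitions, reward)
    print(policy)  # {0: 0, 1: 0}
```

In this toy, an unconstrained maximization over all actions would select the unseen action 2 because of its spurious reward. The constrained agent never considers it. Restricting the action set also shrinks the search space at each state, which loosely mirrors the abstract's point that out-of-distribution state-actions complicate the second step.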

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2401.00330 [cs.LG]
	(or arXiv:2401.00330v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.00330

Submission history

From: Yinglun Xu
[v1] Sat, 30 Dec 2023 21:37:18 UTC (7,056 KB)
[v2] Wed, 23 Oct 2024 19:38:34 UTC (7,533 KB)
[v3] Fri, 25 Oct 2024 17:31:50 UTC (7,533 KB)