Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning

Chemingui, Yassine; Deshwal, Aryan; Wei, Honghao; Fern, Alan; Doppa, Janardhan Rao

arXiv:2412.18946 (cs)

[Submitted on 25 Dec 2024]

Abstract: Offline safe reinforcement learning (OSRL) involves learning a decision-making policy from a fixed batch of training data that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting, at each state, the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at this https URL.
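
The test-time switching rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the function name `select_policy`, its arguments, and the idea that each policy comes with scalar estimates of future reward and future cost at the current state are assumptions made for illustration. The fallback used when no policy meets the budget (pick the lowest estimated cost) is also an assumption, since the abstract does not say how that case is handled.

```python
from typing import Sequence


def select_policy(reward_values: Sequence[float],
                  cost_values: Sequence[float],
                  cost_budget: float) -> int:
    """Return the index of the policy to follow at the current state.

    reward_values[i] and cost_values[i] are hypothetical estimates of the
    future reward and future cost of following policy i from this state.
    """
    if len(reward_values) == 0 or len(reward_values) != len(cost_values):
        raise ValueError("need one reward and one cost estimate per policy")

    # Keep only the policies whose estimated future cost fits the budget.
    feasible = [i for i, cost in enumerate(cost_values) if cost <= cost_budget]

    if feasible:
        # Among feasible policies, take the one with the highest reward estimate.
        return max(feasible, key=lambda i: reward_values[i])

    # Assumption (not stated in the abstract): if nothing is feasible,
    # fall back to the policy with the lowest estimated cost.
    return min(range(len(cost_values)), key=lambda i: cost_values[i])


# Three hypothetical policies, ordered from reward-seeking to cautious.
rewards = [10.0, 7.0, 3.0]
costs = [30.0, 12.0, 4.0]
chosen = select_policy(rewards, costs, cost_budget=15.0)
```

Because the budget is an argument of the selection step rather than of training, the same set of learned policies can be reused when the cost constraint changes at deployment, which is the adaptation setting the abstract targets.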

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.18946 [cs.LG]
	(or arXiv:2412.18946v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.18946

Submission history

From: Yassine Chemingui
[v1] Wed, 25 Dec 2024 16:42:27 UTC (1,621 KB)
