Safe Reinforcement Learning in Constrained Markov Decision Processes

Wachi, Akifumi; Sui, Yanan


arXiv:2008.06626 (cs)

[Submitted on 15 Aug 2020]


Abstract: Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach to optimizing safety and cumulative reward. In our method, the agent first learns the safety constraints by expanding the safe region, and then optimizes the cumulative reward within the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. We demonstrate the effectiveness of SNO-MDP through two experiments: one uses synthetic data in a new, openly available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration using real observation data.
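
The stepwise idea in the abstract (expand a certified safe region first, then optimize reward inside it) can be illustrated with a toy sketch. This is not the authors' SNO-MDP implementation. Everything specific here is an assumption made for illustration: a 1-D chain of states, noiseless safety readings, a known Lipschitz constant used to certify neighbours, deterministic moves, plain value iteration, and the hypothetical function names `expand_safe_region` and `optimize_in_safe_region`.

```python
def expand_safe_region(safety, seed, threshold, lipschitz):
    """Phase 1 (toy): grow a certified safe set on a 1-D chain of states.

    Assumption: the agent reads safety[s] exactly, but only at states it has
    already certified, and the safety function changes by at most `lipschitz`
    between adjacent states.
    """
    n = len(safety)
    safe = set(seed)
    changed = True
    while changed:
        changed = False
        for s in sorted(safe):  # sorted() copies, so adding to `safe` is fine
            observed = safety[s]
            for nxt in (s - 1, s + 1):
                if 0 <= nxt < n and nxt not in safe:
                    # Worst-case safety value one step away from s.
                    if observed - lipschitz >= threshold:
                        safe.add(nxt)
                        changed = True
    return safe


def optimize_in_safe_region(reward, safe, gamma=0.9, sweeps=200):
    """Phase 2 (toy): value iteration restricted to the certified safe set."""

    def moves(s):
        # Actions: stay, left, right; keep only those that land in `safe`.
        return [t for t in (s, s - 1, s + 1) if t in safe]

    value = {s: 0.0 for s in safe}
    for _ in range(sweeps):
        value = {
            s: max(reward[t] + gamma * value[t] for t in moves(s)) for s in safe
        }
    policy = {
        s: max(moves(s), key=lambda t: reward[t] + gamma * value[t]) for s in safe
    }
    return value, policy


# Made-up numbers: state 4 is unsafe (0.1 < 0.2) but carries the largest reward.
safety = [0.9, 0.8, 0.7, 0.35, 0.1, 0.4]
reward = [0.0, 0.0, 1.0, 0.2, 5.0, 0.0]

safe = expand_safe_region(safety, seed={0}, threshold=0.2, lipschitz=0.4)
value, policy = optimize_in_safe_region(reward, safe)
```

With these made-up numbers the certified region is states 0 to 3. The reading at state 3 is too low to certify state 4, so the high-reward but unsafe state is never entered, and the resulting policy moves to state 2 and stays there. State 5 is safe in truth but is never certified, which shows how conservative this toy certification rule is.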

Comments:	10 pages, 6 figures, Accepted to ICML2020
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2008.06626 [cs.LG]
	(or arXiv:2008.06626v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2008.06626

Submission history

From: Akifumi Wachi
[v1] Sat, 15 Aug 2020 02:20:23 UTC (2,320 KB)
