Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds

Zuo, Qian; He, Fengxiang

arXiv:2504.04973 (cs)

[Submitted on 7 Apr 2025]

Abstract: This paper studies constrained Markov decision processes (CMDPs) whose constraints are measured against stochastic thresholds, with the aim of making reinforcement learning safe in unknown and uncertain environments. We use a Growing-Window estimator, which samples from interactions with the uncertain and dynamic environment, to estimate the thresholds. Based on these estimates we design Stochastic Pessimistic-Optimistic Thresholding (SPOT), a novel model-based primal-dual algorithm for multiple constraints against stochastic thresholds. SPOT enables reinforcement learning under both pessimistic and optimistic threshold settings. We prove that our algorithm achieves sublinear regret and constraint violation: a reward regret of $\tilde{\mathcal{O}}(\sqrt{T})$ with an $\tilde{\mathcal{O}}(\sqrt{T})$ constraint violation over $T$ episodes. These theoretical guarantees show that our algorithm achieves performance comparable to that of an approach relying on fixed and clear thresholds. To the best of our knowledge, SPOT is the first reinforcement learning algorithm with theoretically guaranteed performance in an uncertain environment where even the thresholds are unknown.
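To make the idea of estimating a threshold from a growing window of samples concrete, here is a minimal Python sketch. It is not the estimator from the paper, whose exact form the abstract does not give. It assumes threshold observations bounded in [0, 1], a plain running mean over all samples seen so far, a Hoeffding-style confidence radius, and a constraint of the form "expected cost <= threshold", so that subtracting the radius gives the stricter (pessimistic) threshold and adding it gives the looser (optimistic) one. The names `GrowingWindowThreshold` and `run_demo` are hypothetical.

```python
import math
import random


class GrowingWindowThreshold:
    """Running-mean estimate of an unknown threshold (illustrative assumption)."""

    def __init__(self, value_range=1.0, delta=0.05):
        self.value_range = value_range  # assumed: samples lie in [0, value_range]
        self.delta = delta              # failure probability of the interval
        self.count = 0
        self.mean = 0.0

    def update(self, sample):
        # Incremental mean: the window grows by one sample per call.
        self.count += 1
        self.mean += (sample - self.mean) / self.count

    def radius(self):
        # Hoeffding-style confidence radius; infinite before any data arrives.
        if self.count == 0:
            return math.inf
        return self.value_range * math.sqrt(
            math.log(2.0 / self.delta) / (2.0 * self.count)
        )

    def pessimistic(self):
        # Stricter threshold under the assumed "cost <= threshold" constraint.
        return self.mean - self.radius()

    def optimistic(self):
        # Looser threshold under the same assumption.
        return self.mean + self.radius()


def run_demo(num_episodes=1000, true_threshold=0.6, seed=0):
    """Feed noisy threshold observations, one per episode, into the estimator."""
    rng = random.Random(seed)
    estimator = GrowingWindowThreshold()
    for _ in range(num_episodes):
        # Hypothetical noise model: uniform noise keeps samples in [0.4, 0.8].
        estimator.update(true_threshold + rng.uniform(-0.2, 0.2))
    return estimator


if __name__ == "__main__":
    est = run_demo()
    print(est.pessimistic(), est.mean, est.optimistic())
```

In this sketch the radius shrinks at a rate of order $1/\sqrt{n}$ in the number of samples $n$, so the pessimistic and optimistic thresholds converge toward the same value as more episodes are observed.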

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2504.04973 [cs.LG]
	(or arXiv:2504.04973v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.04973

Submission history

From: Qian Zuo
[v1] Mon, 7 Apr 2025 11:58:19 UTC (114 KB)
