Learning safety critics via a non-contractive binary Bellman operator

Castellano, Agustin; Min, Hancheng; Bazerque, Juan Andrés; Mallada, Enrique

arXiv:2401.12849 (cs)

[Submitted on 23 Jan 2024]

Abstract: The inability to naturally enforce safety in Reinforcement Learning (RL), with limited failures, is a core challenge impeding its use in real-world applications. One notion of safety of vast practical relevance is the ability to avoid unsafe regions of the state space. Though such a safety goal can be captured by an action-value-like function, known as a safety critic, the associated operator lacks the contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points, which represent (except for a spurious solution) maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
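
To make the idea of a binary safety critic and its fixed points concrete, here is a minimal tabular sketch in Python. It is not the paper's algorithm. The five-state system, the `step` dynamics, and the exact form of the operator (a pair (s, a) is flagged unsafe if the next state is unsafe, or if every action from the next state is already flagged) are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy system (not from the paper): states 0..4 on a line.
STATES = range(5)
ACTIONS = (-1, +1)
UNSAFE = {4}  # the region the system must avoid


def step(s, a):
    """Assumed deterministic dynamics: state 4 is absorbing, state 3
    always slides into 4, other states move left/right (clipped)."""
    if s >= 3:
        return 4
    return min(max(s + a, 0), 4)


def indicator(s):
    """1 if s lies in the unsafe region, else 0."""
    return 1 if s in UNSAFE else 0


def binary_bellman(b):
    """One application of the assumed binary operator:
    (Tb)(s, a) = i(s') OR min_{a'} b(s', a'), with s' = step(s, a)."""
    new_b = {}
    for s, a in product(STATES, ACTIONS):
        s_next = step(s, a)
        best_next = min(b[(s_next, a2)] for a2 in ACTIONS)
        new_b[(s, a)] = max(indicator(s_next), best_next)
    return new_b


def iterate_to_fixed_point(b):
    """Apply the operator until the table stops changing."""
    while True:
        new_b = binary_bellman(b)
        if new_b == b:
            return b
        b = new_b


all_safe = {(s, a): 0 for s, a in product(STATES, ACTIONS)}
all_unsafe = {(s, a): 1 for s, a in product(STATES, ACTIONS)}

critic = iterate_to_fixed_point(all_safe)

# States that keep at least one action the critic considers safe.
safe_states = [s for s in STATES if min(critic[(s, a)] for a in ACTIONS) == 0]
print(safe_states)  # [0, 1, 2]

# The all-ones table is left unchanged by the operator as well.
print(binary_bellman(all_unsafe) == all_unsafe)  # True
```

In this toy setup, iterating from the all-safe table converges to a critic that flags (2, +1) and every action in states 3 and 4, so states 0, 1 and 2 remain safe. The all-ones table is also a fixed point even though safe states exist, which illustrates why such an operator cannot be a contraction and plausibly corresponds to the kind of spurious solution the abstract mentions.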

Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2401.12849 [cs.LG]
	(or arXiv:2401.12849v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.12849

Submission history

From: Agustin Castellano
[v1] Tue, 23 Jan 2024 15:33:30 UTC (869 KB)
