Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions

Cai, Mingyu; Vasile, Cristian-Ioan

Computer Science > Robotics

arXiv:2109.02791v1 (cs)

[Submitted on 7 Sep 2021 (this version), latest version 26 Aug 2022 (v7)]



Abstract: Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications, because ensuring safe exploration and facilitating adequate exploitation are challenging when controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more intractable for complex tasks over continuous state and action spaces. In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to express complex tasks over infinite horizons, which can be translated to a novel automaton structure; (2) we propose an innovative reward scheme for the RL agent with the formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture that uses the automaton structure to decompose the overall task and improve the performance of the learned controllers; (4) by incorporating Gaussian processes (GPs) to estimate the uncertain system dynamics, we synthesize a model-based safeguard using exponential control barrier functions (ECBFs) to address problems with high relative degree. In addition, we use the properties of LTL automata and ECBFs to construct a guiding process that further improves the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework in several robotic environments, and show that the resulting ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high-probability confidence during training.
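To make item (4) of the abstract concrete, the following is a minimal sketch of an ECBF safety filter. It is not the authors' implementation. It assumes a one-dimensional double integrator (position `p`, velocity `v`, acceleration input `u`) with a wall at `p_max`, so the barrier `h = p_max - p` has relative degree two. The GP is not implemented: its posterior mean and standard deviation of an unknown additive acceleration are passed in as the hypothetical arguments `d_mean` and `d_std`, and `beta` is an assumed confidence multiplier. All function and parameter names are illustrative.

```python
from typing import Tuple


def ecbf_gains(pole_1: float, pole_2: float) -> Tuple[float, float]:
    """Return (k0, k1) so that s^2 + k1*s + k0 has roots -pole_1 and -pole_2."""
    if pole_1 <= 0.0 or pole_2 <= 0.0:
        raise ValueError("pole magnitudes must be positive")
    return pole_1 * pole_2, pole_1 + pole_2


def ecbf_filter(p: float, v: float, u_nominal: float, p_max: float,
                k0: float, k1: float,
                d_mean: float = 0.0, d_std: float = 0.0,
                beta: float = 2.0) -> float:
    """Minimally modify u_nominal so the ECBF condition holds.

    Assumed dynamics: p' = v, v' = u + d, with d an unknown disturbance.
    Barrier: h = p_max - p, so h' = -v and h'' = -(u + d).
    ECBF condition: h'' + k1*h' + k0*h >= 0 for the worst-case d.
    """
    h = p_max - p
    h_dot = -v
    # Assumption: d <= d_mean + beta*d_std with high probability.
    d_worst = d_mean + beta * d_std
    u_upper = k0 * h + k1 * h_dot - d_worst
    # With a scalar input, the usual quadratic program reduces to clipping.
    return min(u_nominal, u_upper)


def simulate(u_nominal: float = 5.0, steps: int = 500, dt: float = 0.01,
             p_max: float = 1.0) -> float:
    """Drive toward the wall with a constant nominal input; return max position."""
    k0, k1 = ecbf_gains(2.0, 3.0)
    p, v, p_peak = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = ecbf_filter(p, v, u_nominal, p_max, k0, k1)
        # Explicit Euler step, using the old velocity for the position update.
        p, v = p + v * dt, v + u * dt
        p_peak = max(p_peak, p)
    return p_peak
```

Calling `simulate()` pushes the system toward the wall with a constant nominal acceleration. The filter leaves the nominal input untouched far from the boundary and overrides it as the barrier condition tightens. In the paper's setting the nominal input would come from the learned policy and the disturbance bound from the GP posterior.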

Comments:	Under Review
Subjects:	Robotics (cs.RO); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)
Cite as:	arXiv:2109.02791 [cs.RO]
	(or arXiv:2109.02791v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2109.02791

Submission history

From: Mingyu Cai
[v1] Tue, 7 Sep 2021 00:51:12 UTC (1,734 KB)
[v2] Thu, 7 Oct 2021 12:19:15 UTC (1,734 KB)
[v3] Thu, 4 Nov 2021 14:44:15 UTC (3,334 KB)
[v4] Thu, 11 Nov 2021 15:53:18 UTC (3,335 KB)
[v5] Wed, 15 Dec 2021 20:28:37 UTC (4,386 KB)
[v6] Sun, 22 May 2022 02:22:32 UTC (3,918 KB)
[v7] Fri, 26 Aug 2022 14:14:52 UTC (7,328 KB)
