Constrained Exploration in Reinforcement Learning with Optimality Preservation

Chen, Peter C. Y.

Abstract: We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such a restriction may prevent the agent from visiting some state-action pairs, possibly leading the agent to find only a sub-optimal policy. To address this problem, we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and prospects of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
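As a rough illustration of the setting described in the abstract (a known deterministic environment, a supervisor that restricts which state-action pairs the agent may use, and the question of whether optimality survives that restriction), the Python sketch below computes optimal Q-values by value iteration with and without the restriction and compares the optimal value at a start state. Everything here is a hypothetical assumption for illustration: the names `optimal_q`, `state_value` and `preserves_optimality`, the discount factor, and the toy environment. This brute-force comparison is not the paper's necessary and sufficient condition, which the abstract does not state.

```python
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (state, action)


def state_value(q: Dict[Pair, float], state: Hashable) -> float:
    """max_a Q(state, a) over the pairs present in q; 0 if the state has none."""
    return max((val for (s, _), val in q.items() if s == state), default=0.0)


def optimal_q(delta: Dict[Pair, Hashable], reward: Dict[Pair, float],
              allowed: Set[Pair], gamma: float = 0.9,
              iters: int = 500) -> Dict[Pair, float]:
    """Value iteration on a known deterministic environment using only `allowed` pairs.

    States with no allowed action are treated as terminal (value 0).
    """
    states = {s for s, _ in delta} | set(delta.values())
    q = {p: 0.0 for p in allowed}
    for _ in range(iters):
        v = {s: state_value(q, s) for s in states}
        q = {(s, a): reward[(s, a)] + gamma * v[delta[(s, a)]] for (s, a) in allowed}
    return q


def preserves_optimality(delta, reward, allowed, start,
                         gamma: float = 0.9, tol: float = 1e-9) -> bool:
    """Hypothetical check: does the supervisor's restriction to `allowed`
    leave the optimal value at `start` unchanged?"""
    q_free = optimal_q(delta, reward, set(delta), gamma)
    q_sup = optimal_q(delta, reward, allowed, gamma)
    return abs(state_value(q_free, start) - state_value(q_sup, start)) <= tol


# Toy environment (hypothetical): two routes from s0 to a terminal goal state.
delta = {("s0", "a"): "s1", ("s0", "b"): "s2",
         ("s1", "a"): "goal", ("s2", "a"): "goal"}
reward = {("s0", "a"): 0.0, ("s0", "b"): 0.0,
          ("s1", "a"): 1.0, ("s2", "a"): 0.5}

spec_ok = set(delta) - {("s0", "b")}   # forbids only the sub-optimal branch
spec_bad = set(delta) - {("s1", "a")}  # forbids a pair on the optimal path

print(preserves_optimality(delta, reward, spec_ok, "s0"))   # True
print(preserves_optimality(delta, reward, spec_bad, "s0"))  # False
```

Because it solves the whole problem twice, this check is only practical when the environment is known and small; it simply makes concrete what "optimality is preserved" means for a given specification.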

Comments:	33 pages, 6 figures
Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2304.03104 [cs.LG]
	(or arXiv:2304.03104v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2304.03104
