Provably Efficient Exploration in Reward Machines with Low Regret

Bourel, Hippolyte; Jonsson, Anders; Maillard, Odalric-Ambrym; Ma, Chenxiao; Talebi, Mohammad Sadegh

arXiv:2412.19194 (cs)

[Submitted on 26 Dec 2024]

Abstract: We study reinforcement learning (RL) for decision processes with non-Markovian rewards, in which high-level knowledge of the task is available to the learner in the form of reward machines. We consider probabilistic reward machines with initially unknown dynamics and investigate RL under the average-reward criterion, where learning performance is assessed through the notion of regret. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines that exploits the structure induced by such machines. We further derive high-probability, non-asymptotic bounds on its regret and demonstrate the gain in regret over existing algorithms that could be applied but are oblivious to this structure. We also present a regret lower bound for the studied setting. To the best of our knowledge, the proposed algorithm constitutes the first attempt to tailor and analyze regret specifically for RL with probabilistic reward machines.
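
For readers new to the term, the following is a minimal sketch of what a probabilistic reward machine might look like. It is not the paper's formalism or algorithm: the class name, the transition-table layout, the "coffee"/"office" task, and the rule that unlisted labels leave the machine state unchanged are all assumptions made for illustration.

```python
import random


class ProbabilisticRewardMachine:
    """Toy probabilistic reward machine (illustrative; not the paper's formalism)."""

    def __init__(self, transitions, initial_state=0):
        # transitions[(rm_state, label)] -> list of (probability, next_rm_state, reward)
        self.transitions = transitions
        self.initial_state = initial_state
        self.state = initial_state

    def step(self, label, rng):
        """Advance on an observed event label and return the emitted reward."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            return 0.0  # assumption: unlisted labels leave the state unchanged
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        self.state = next_state
        return reward


# Hypothetical task: "get coffee, then reach the office".
TRANSITIONS = {
    (0, "coffee"): [(0.9, 1, 0.0), (0.1, 0, 0.0)],  # pickup fails 10% of the time
    (1, "office"): [(1.0, 0, 1.0)],                  # delivery pays 1 and resets
}

rng = random.Random(0)
prm = ProbabilisticRewardMachine(TRANSITIONS)
reward_without_coffee = prm.step("office", rng)  # 0.0: no coffee in hand yet
prm.state = 1                                    # pretend the pickup succeeded
reward_with_coffee = prm.step("office", rng)     # 1.0: same label, different history
```

The same label, "office", earns a different reward depending on the machine's internal state, which is why the reward is non-Markovian when viewed from the environment state alone.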

Comments:	35 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.19194 [cs.LG]
	(or arXiv:2412.19194v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.19194

Submission history

From: Mohammad Sadegh Talebi
[v1] Thu, 26 Dec 2024 12:25:04 UTC (397 KB)
