Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity

Marjani, Aymen Al; Proutiere, Alexandre

arXiv:2009.13405v1 (stat)

[Submitted on 28 Sep 2020 (this version), latest version 10 May 2021 (v4)]

Abstract: We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) with finite state and action spaces. We assume that the agent has access to a generative model and that the MDP possesses a unique optimal policy. In this setting, we derive a problem-specific lower bound on the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program and is hence hard to exploit in the design of efficient algorithms. We provide a simple and tight upper bound on the sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP, such as the sub-optimality gaps and the variance of the next-state value function, and thus captures the hardness of the MDP. We devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). Finally, we discuss the advantages of KLB-TS over state-of-the-art algorithms.
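The abstract names two ingredients: MDP functionals (sub-optimality gaps and the variance of the next-state value function) and an algorithm that tracks a target sample allocation. A minimal Python sketch of both follows, assuming the generative model is given as a transition tensor `P[s, a, s']` and a mean-reward matrix `R[s, a]`. The gaps are taken as Delta(s, a) = V*(s) - Q*(s, a), and the variance as Var over s' ~ p(.|s, a) of V*(s'). The tracking step is the generic D-tracking rule from Garivier and Kaufmann's Track-and-Stop, used here as an assumption. It is not the paper's KLB-TS sampling rule. The target allocation `omega` is a hypothetical input, since the abstract does not state the allocation formula.

```python
import numpy as np


def optimal_q(P, R, gamma, tol=1e-10):
    """Optimal Q-function of a finite discounted MDP via value iteration.

    P: transitions, shape (S, A, S), with P[s, a, s'] = p(s' | s, a).
    R: mean rewards, shape (S, A). Requires 0 <= gamma < 1.
    """
    Q = np.zeros_like(R, dtype=float)
    while True:
        V = Q.max(axis=1)
        Q_new = R + gamma * (P @ V)  # Bellman optimality operator, shape (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new


def mdp_functionals(P, R, gamma):
    """Sub-optimality gaps and next-state value variance for every (s, a)."""
    Q = optimal_q(P, R, gamma)
    V = Q.max(axis=1)
    gaps = V[:, None] - Q  # Delta(s, a) = V*(s) - Q*(s, a); zero for optimal actions
    mean_next = P @ V  # E_{s' ~ p(.|s,a)}[V*(s')]
    var_next = np.maximum(P @ V**2 - mean_next**2, 0.0)  # clip float round-off
    return gaps, var_next


def d_tracking_step(counts, omega, t):
    """Choose the next (s, a) to query from the generative model (D-tracking).

    counts: samples drawn so far per pair, shape (S, A).
    omega: hypothetical target allocation, shape (S, A), summing to 1.
    t: total number of samples drawn so far.
    """
    n_pairs = counts.size
    if counts.min() < np.sqrt(t) - n_pairs / 2:
        idx = np.argmin(counts)  # forced exploration of an under-sampled pair
    else:
        idx = np.argmax(t * omega - counts)  # pair lagging furthest behind t * omega
    s, a = np.unravel_index(idx, counts.shape)
    return int(s), int(a)
```

On a toy MDP where only one state-action pair has a stochastic transition, `var_next` is nonzero only at that pair, because a deterministic transition leaves no randomness in V*(s'). The forced-exploration branch keeps every pair sampled at a rate of order sqrt(t), which is what lets the empirical MDP, and hence any allocation computed from it, converge.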

Comments:	44 pages
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2009.13405 [stat.ML]
	(or arXiv:2009.13405v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2009.13405

Submission history

From: Aymen Al Marjani
[v1] Mon, 28 Sep 2020 15:22:24 UTC (2,315 KB)
[v2] Thu, 15 Oct 2020 17:15:58 UTC (192 KB)
[v3] Fri, 16 Oct 2020 15:45:40 UTC (192 KB)
[v4] Mon, 10 May 2021 16:40:20 UTC (195 KB)
