$Q$-learning with Logarithmic Regret

Yang, Kunhe; Yang, Lin F.; Du, Simon S.

arXiv:2006.09118 (cs)

[Submitted on 16 Jun 2020 (v1), last revised 23 Feb 2021 (this version, v2)]

Abstract: This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve logarithmic cumulative regret for episodic tabular reinforcement learning when the optimal $Q$-function has a strictly positive sub-optimality gap. We prove that the optimistic $Q$-learning algorithm studied in [Jin et al. 2018] enjoys an ${\mathcal{O}}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap. This bound matches the information-theoretic lower bound in terms of $S,A,T$ up to a $\log\left(SA\right)$ factor. We further extend our analysis to the discounted setting and obtain a similar logarithmic cumulative regret bound.
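For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of optimistic $Q$-learning on a tabular episodic MDP. It is an illustration under stated assumptions, not the authors' implementation: the learning rate $(H+1)/(H+t)$ and the Hoeffding-style bonus $c\sqrt{H^3\iota/t}$ follow the form usually attributed to [Jin et al. 2018], while the constant `c`, the failure probability `p`, the fixed initial state, the `step` callback, and the `toy_step` environment are hypothetical choices made only for this example.

```python
import math
import random


def optimistic_q_learning(S, A, H, K, step, c=1.0, p=0.05, seed=0):
    """Run K episodes of optimistic Q-learning (sketch).

    step(h, s, a, rng) -> (reward in [0, 1], next_state) is a hypothetical
    environment callback supplied by the caller.
    """
    rng = random.Random(seed)
    T = K * H                                   # total number of steps
    iota = math.log(S * A * T / p)              # log factor used in the bonus
    # Optimistic initialization: every Q-value starts at the maximum return H.
    Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]
    # V has one extra layer so that V[H] is identically zero.
    V = [[float(H)] * S for _ in range(H)] + [[0.0] * S]
    N = [[[0] * A for _ in range(S)] for _ in range(H)]  # visit counts
    total_reward = 0.0
    for _ in range(K):
        s = 0                                   # fixed initial state (assumption)
        for h in range(H):
            # Act greedily with respect to the optimistic Q estimate.
            a = max(range(A), key=lambda x: Q[h][s][x])
            r, s_next = step(h, s, a, rng)
            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)           # step size; equals 1 when t == 1
            bonus = c * math.sqrt(H ** 3 * iota / t)  # exploration bonus
            target = r + V[h + 1][s_next] + bonus
            Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * target
            V[h][s] = min(float(H), max(Q[h][s]))     # clip value at H
            total_reward += r
            s = s_next
    return Q, V, total_reward


def toy_step(h, s, a, rng):
    # Hypothetical 2-state, 2-action MDP: action 1 pays 1, action 0 pays 0,
    # and the next state is drawn uniformly at random.
    reward = 1.0 if a == 1 else 0.0
    return reward, rng.randrange(2)


if __name__ == "__main__":
    Q, V, total = optimistic_q_learning(S=2, A=2, H=3, K=200, step=toy_step)
    print("total reward over 200 episodes:", total)
```

In this toy environment the gap between the two actions is strictly positive at every step, which is the regime the regret bound above addresses; the sketch does not measure regret itself.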

Comments:	Accepted by AISTATS 2021
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2006.09118 [cs.LG]
	(or arXiv:2006.09118v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.09118

Submission history

From: Kunhe Yang
[v1] Tue, 16 Jun 2020 13:01:33 UTC (25 KB)
[v2] Tue, 23 Feb 2021 11:44:44 UTC (41 KB)
