Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

Müller, Johannes; Cayci, Semih

Mathematics > Optimization and Control

arXiv:2406.04163 (math)

[Submitted on 6 Jun 2024 (v1), last revised 25 Jun 2024 (this version, v2)]

Abstract: We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value, with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspondence between the solutions of entropy-regularized Markov decision processes and gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. Further, this correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of the Kakade gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use this to show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations, improving existing sublinear guarantees.
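The value-error statement in the abstract can be explored numerically on a toy problem. The following is a minimal sketch, not the authors' code. It assumes the standard discounted entropy regularization E[Σ γ^t (r - τ log π)] (the paper's exact regularizer and weighting may differ), a small randomly generated MDP, and a uniform initial-state distribution μ; the helper names `soft_optimal_policy`, `optimal_value` and `evaluate` are hypothetical. It computes the soft-optimal policy π_τ by soft value iteration and reports the unregularized value gap μ·(V* - V^{π_τ}) for decreasing τ.

```python
import numpy as np


def soft_optimal_policy(P, r, gamma, tau, iters=2000):
    """Soft (entropy-regularized) value iteration; returns pi_tau(a|s) ∝ exp(Q_tau(s,a)/tau)."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)  # P has shape (S, A, S), so P @ V has shape (S, A)
        m = Q.max(axis=1)
        # Numerically stable V(s) = tau * log sum_a exp(Q(s, a) / tau)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * (P @ V)
    z = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return z / z.sum(axis=1, keepdims=True)  # softmax of Q / tau over actions


def optimal_value(P, r, gamma, iters=2000):
    """Unregularized optimal value V* via standard value iteration."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = (r + gamma * (P @ V)).max(axis=1)
    return V


def evaluate(P, r, gamma, pi):
    """Exact unregularized value V^pi = (I - gamma P_pi)^{-1} r_pi."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)            # expected one-step reward under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
r = rng.uniform(size=(S, A))
mu = np.full(S, 1.0 / S)  # assumed uniform initial-state distribution

V_star = optimal_value(P, r, gamma)
taus = [1.0, 0.5, 0.25, 0.125, 0.0625]
gaps = []
for tau in taus:
    pi_tau = soft_optimal_policy(P, r, gamma, tau)
    # Unregularized value gap of the regularized optimum; nonnegative since V* is optimal
    gaps.append(float(mu @ (V_star - evaluate(P, r, gamma, pi_tau))))
    print(f"1/tau = {1 / tau:5.1f}   value gap = {gaps[-1]:.3e}")
```

Plotting log(gap) against 1/τ gives one way to check, on this particular instance, whether the decay looks exponential for small τ; any slope read off such a plot is specific to the randomly drawn MDP and is not a statement about the paper's constants.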

Comments:	26 pages, 1 figure
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
MSC classes:	37N40, 65K05, 90C05, 90C40, 90C53
Cite as:	arXiv:2406.04163 [math.OC]
	(or arXiv:2406.04163v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2406.04163

Submission history

From: Johannes Müller
[v1] Thu, 6 Jun 2024 15:20:37 UTC (44 KB)
[v2] Tue, 25 Jun 2024 10:26:49 UTC (46 KB)