Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Vakili, Sattar; Olkhovskaya, Julia

Computer Science > Machine Learning

arXiv:2306.07745 (cs)

[Submitted on 13 Jun 2023 (v1), last revised 14 Mar 2024 (this version, v3)]

Title: Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Authors: Sattar Vakili, Julia Olkhovskaya


Abstract: Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. Existing analytical results, however, typically focus on settings with a small number of state-actions or on simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, for the setting where the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant improvement over the state of the art, polynomial in the number of episodes. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels), the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Matérn kernels, where a lower bound on regret is known.
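As a rough illustration of the two ingredients the abstract names, kernel ridge regression and optimism, the following is a minimal sketch of a kernel ridge predictor with an added uncertainty bonus. It is not the paper's $\pi$-KRVI algorithm: the function names (`matern32`, `optimistic_krr`), the choice of a Matérn kernel with smoothness 3/2, and the parameters `lam` and `beta` are all assumptions made for this example, and the bonus is the standard kernel-ridge uncertainty width rather than anything taken from the paper.

```python
import numpy as np


def matern32(X, Y, lengthscale=1.0):
    """Matern kernel with smoothness nu = 3/2 between the rows of X and Y."""
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    s = np.sqrt(3.0) * dist / lengthscale
    return (1.0 + s) * np.exp(-s)


def optimistic_krr(Z, y, Z_query, lam=1.0, beta=1.0, kernel=matern32):
    """Kernel ridge mean plus beta times an uncertainty width (hypothetical helper).

    Z:       (n, d) observed state-action features
    y:       (n,)   regression targets
    Z_query: (m, d) points at which to evaluate the optimistic estimate
    """
    K = kernel(Z, Z)                      # (n, n) Gram matrix
    k_q = kernel(Z, Z_query)              # (n, m) cross-kernel
    A = K + lam * np.eye(len(Z))          # regularized Gram matrix
    mean = k_q.T @ np.linalg.solve(A, y)  # kernel ridge prediction
    # Uncertainty: k(z, z) - k_q^T A^{-1} k_q, with k(z, z) = 1 for this kernel.
    var = 1.0 - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    width = np.sqrt(np.clip(var, 0.0, None))
    return mean + beta * width, mean, width


# Usage: the width is small near observed data and close to 1 far from it.
rng = np.random.default_rng(0)
Z = rng.uniform(size=(20, 2))
y = np.sin(Z.sum(axis=1))
Z_query = np.vstack([Z[:1], [[10.0, 10.0]]])
ucb, mean, width = optimistic_krr(Z, y, Z_query, lam=0.1, beta=2.0)
```

In an optimistic value-iteration scheme, an upper estimate of this kind would stand in for the state-action value function when choosing actions; how the paper constructs and tunes its confidence intervals is not stated in the abstract.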

Comments:	Advances in Neural Information Processing Systems (NeurIPS), 2023. In the previous version, we used Lemma C.1 from Yang et al., 2020a to bound the RKHS norm of the kernel ridge predictor. In the current version, this bound is proven in Lemma 5.
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2306.07745 [cs.LG]
	(or arXiv:2306.07745v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.07745

Submission history

From: Sattar Vakili
[v1] Tue, 13 Jun 2023 13:01:42 UTC (81 KB)
[v2] Tue, 28 Nov 2023 11:11:54 UTC (83 KB)
[v3] Thu, 14 Mar 2024 13:36:01 UTC (83 KB)
