Robust Q-Learning under Corrupted Rewards

Maity, Sreejeet; Mitra, Aritra


arXiv:2409.03237 (cs)

[Submitted on 5 Sep 2024]

Abstract: Recently, there has been a surge of interest in analyzing the non-asymptotic behavior of model-free reinforcement learning algorithms. However, the performance of such algorithms in non-ideal environments, such as in the presence of corrupted rewards, is poorly understood. Motivated by this gap, we investigate the robustness of the celebrated Q-learning algorithm to a strong-contamination attack model, where an adversary can arbitrarily perturb a small fraction of the observed rewards. We start by proving that such an attack can cause the vanilla Q-learning algorithm to incur arbitrarily large errors. We then develop a novel robust synchronous Q-learning algorithm that uses historical reward data to construct robust empirical Bellman operators at each time step. Finally, we prove a finite-time convergence rate for our algorithm that matches known state-of-the-art bounds (in the absence of attacks) up to a small inevitable $O(\varepsilon)$ error term that scales with the adversarial corruption fraction $\varepsilon$. Notably, our results continue to hold even when the true reward distributions have infinite support, provided they admit bounded second moments.
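The contrast the abstract draws between vanilla and robust synchronous Q-learning can be illustrated with a small simulation. The sketch below is not the authors' algorithm: the abstract does not say which robust estimator is used, so a trimmed mean over each state-action pair's reward history is assumed here as a stand-in. The two-state MDP (`NEXT`, `MEAN_R`), the constant step size `ALPHA`, the trimming rule, and the periodic attack that overwrites every tenth reward with 1000 are all hypothetical choices made for illustration. Clean rewards are Gaussian, so they have unbounded support but a bounded second moment.

```python
import random

GAMMA, ALPHA, EPS, STEPS = 0.5, 0.05, 0.1, 1000
NEXT = [[0, 1], [1, 0]]            # hypothetical deterministic transitions NEXT[s][a]
MEAN_R = [[1.0, 0.0], [0.5, 2.0]]  # hypothetical true mean rewards


def trimmed_mean(samples, eps):
    """Average after dropping the int(2*eps*n) smallest and largest samples.

    The trimming rule is an assumption of this sketch, not taken from the paper.
    """
    xs = sorted(samples)
    k = int(2 * eps * len(xs))
    core = xs[k:len(xs) - k]
    return sum(core) / len(core)


def optimal_q(iters=200):
    """Q* of the uncorrupted MDP, computed by value iteration on the true means."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        q = [[MEAN_R[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in range(2)]
             for s in range(2)]
    return q


def run(robust, seed=0):
    """Synchronous Q-learning: every (s, a) pair is updated at every step."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    history = [[[], []], [[], []]]  # observed rewards per (s, a)
    for t in range(STEPS):
        new_q = [row[:] for row in q]
        for s in range(2):
            for a in range(2):
                r = rng.gauss(MEAN_R[s][a], 0.5)
                if (t + 1) % 10 == 0:
                    r = 1000.0  # adversary corrupts a fraction EPS of the rewards
                history[s][a].append(r)
                # Robust variant: plug a robust estimate of the mean reward
                # into the empirical Bellman target instead of the raw sample.
                r_hat = trimmed_mean(history[s][a], EPS) if robust else r
                target = r_hat + GAMMA * max(q[NEXT[s][a]])
                new_q[s][a] = (1 - ALPHA) * q[s][a] + ALPHA * target
        q = new_q
    return q


def max_error(q, q_star):
    """Sup-norm distance between a Q-table and Q*."""
    return max(abs(q[s][a] - q_star[s][a]) for s in range(2) for a in range(2))


Q_STAR = optimal_q()
vanilla_err = max_error(run(robust=False), Q_STAR)
robust_err = max_error(run(robust=True), Q_STAR)
print(f"vanilla error: {vanilla_err:.2f}, robust error: {robust_err:.2f}")
```

In this toy setting the vanilla iterate is pulled far from the optimal Q-function, and raising the corrupted value pushes it further, while the trimmed-mean variant stays close, with a small residual bias from trimming. Re-sorting the full history at every step keeps the sketch short at the cost of efficiency.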

Comments:	Accepted to the Decision and Control Conference (CDC) 2024
Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2409.03237 [cs.LG]
	(or arXiv:2409.03237v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.03237

Submission history

From: Aritra Mitra
[v1] Thu, 5 Sep 2024 04:37:02 UTC (26 KB)
