Divergence-Augmented Policy Optimization

Wang, Qing; Li, Yingru; Xiong, Jiechao; Zhang, Tong

[Submitted on 25 Jan 2025]

Abstract: In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
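
For readers unfamiliar with the term, the following is a minimal sketch of a Bregman divergence, D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, using negative entropy as the generator phi. With that choice it reduces to the KL divergence for normalized distributions. This is an illustration only, not the paper's algorithm: the generator, the argument order, the function names, and the toy 2-state, 2-action distributions are all assumptions made for this example.

```python
import math

def neg_entropy(p):
    """Generator phi(p) = sum_i p_i * log(p_i), with 0 * log(0) treated as 0."""
    return sum(x * math.log(x) for x in p if x > 0.0)

def neg_entropy_grad(p):
    """Gradient of phi: d phi / d p_i = log(p_i) + 1 (requires p_i > 0)."""
    return [math.log(x) + 1.0 for x in p]

def bregman(p, q, phi=neg_entropy, grad_phi=neg_entropy_grad):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_phi(q)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return phi(p) - phi(q) - inner

def kl(p, q):
    """KL(p || q) for discrete distributions, used here as a cross-check."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical flattened state-action distributions (2 states x 2 actions).
behavior = [0.30, 0.20, 0.25, 0.25]  # assumed: induced by the data-generating policy
current = [0.40, 0.10, 0.20, 0.30]   # assumed: induced by the policy being updated

divergence = bregman(current, behavior)
print(f"Bregman (negative entropy): {divergence:.6f}")
print(f"KL cross-check:             {kl(current, behavior):.6f}")
```

In a penalized objective, a term like `divergence` would be subtracted from the surrogate return with some coefficient, discouraging the current policy from drifting far from the behavior policy. The coefficient and the exact form of the objective are not stated in the abstract and are left out here.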

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as: arXiv:2501.15034 [cs.LG] (or arXiv:2501.15034v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2501.15034

Submission history

From: Yingru Li
[v1] Sat, 25 Jan 2025 02:35:46 UTC (416 KB)
