Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Zhang, Jinouwen; Xue, Rongkun; Niu, Yazhe; Chen, Yun; Yang, Jing; Li, Hongsheng; Liu, Yu

Abstract:Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in continuous action spaces. However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models. In this study, we compare and analyze various generative policy training and deployment techniques, identifying and validating effective designs for generative policy algorithms. Specifically, we revisit existing training objectives and classify them into two categories, each linked to a simpler approach. The first approach, Generative Model Policy Optimization (GMPO), employs a native advantage-weighted regression formulation as the training objective, which is significantly simpler than previous methods. The second approach, Generative Model Policy Gradient (GMPG), offers a numerically stable implementation of the native policy gradient method. We introduce a standardized experimental framework named GenerativeRL. Our experiments demonstrate that the proposed methods achieve state-of-the-art performance on various offline-RL datasets, offering a unified and practical guideline for training and deploying generative policies.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.01245 [cs.LG]
	(or arXiv:2412.01245v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.01245

Computer Science > Machine Learning

Title:Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators