Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Shen, Wei; Liu, Guanlin; Wu, Zheng; Zhu, Ruofei; Yang, Qingping; Xin, Chao; Yue, Yu; Yan, Lin

arXiv:2503.22230 (cs)

[Submitted on 28 Mar 2025 (v1), last revised 2 Apr 2025 (this version, v3)]

Abstract: Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of prompt-data construction has been overlooked. This paper addresses this gap by exploring data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity. We introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM) to mitigate reward hacking. We also propose a novel prompt-selection method, Pre-PPO, to maintain response diversity and enhance learning effectiveness. Additionally, we find that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance. Experiments across two model sizes validate our methods' effectiveness and scalability. Results show that RTV is the most resistant to reward hacking, followed by GenRM with ground truth, and then GenRM with SFT Best-of-N responses. Our strategies enable rapid capture of subtle task-specific distinctions, leading to substantial improvements in overall RLHF performance. This work highlights the importance of careful data construction and provides practical methods to overcome performance barriers in RLHF.
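The abstract does not say how the hybrid reward is routed or how the task ordering is implemented. The following is a minimal Python sketch under stated assumptions, not the authors' code. It illustrates two ideas: (1) send a prompt to a rule-based verifier when its task type has one and a reference answer exists, and otherwise fall back to a generative reward model; (2) place math and coding prompts ahead of other prompts. The names `Prompt`, `final_answer_verifier`, `VERIFIERS`, `PRIORITY_TASKS`, `hybrid_reward`, `order_prompts`, the task labels, and the GenRM call signature are all hypothetical. The toy verifier only checks the final token of the response. Pre-PPO prompt selection is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical GenRM signature: scores a response given the prompt text and an
# optional reference (e.g. a ground-truth answer or an SFT Best-of-N response).
GenRM = Callable[[str, str, Optional[str]], float]


@dataclass
class Prompt:
    text: str
    task: str  # hypothetical task labels, e.g. "math", "code", "chat"
    ground_truth: Optional[str] = None


def final_answer_verifier(response: str, ground_truth: str) -> float:
    """Toy stand-in for a reasoning task verifier (RTV): reward 1.0 when the
    last whitespace-separated token of the response equals the reference."""
    tokens = response.strip().split()
    return 1.0 if tokens and tokens[-1] == ground_truth.strip() else 0.0


# Assumption: only math has a verifier here. A real code verifier would run
# unit tests in a sandbox, which this sketch does not attempt.
VERIFIERS: Dict[str, Callable[[str, str], float]] = {"math": final_answer_verifier}

# Task types placed early in RLHF training, following the abstract.
PRIORITY_TASKS = {"math", "code"}


def hybrid_reward(prompt: Prompt, response: str, genrm: GenRM) -> float:
    """Use a verifier when one exists and a reference answer is available;
    otherwise fall back to the generative reward model."""
    verifier = VERIFIERS.get(prompt.task)
    if verifier is not None and prompt.ground_truth is not None:
        return verifier(response, prompt.ground_truth)
    return genrm(prompt.text, response, prompt.ground_truth)


def order_prompts(prompts: List[Prompt]) -> List[Prompt]:
    """Stable sort that moves math and coding prompts ahead of the rest,
    keeping the original relative order within each group."""
    return sorted(prompts, key=lambda p: 0 if p.task in PRIORITY_TASKS else 1)


if __name__ == "__main__":
    # Placeholder GenRM that returns a constant score, for illustration only.
    constant_genrm: GenRM = lambda text, response, reference: 0.5
    data = [
        Prompt("Write a short poem about rain.", "chat"),
        Prompt("What is 2 + 2?", "math", ground_truth="4"),
        Prompt("Reverse a list in Python.", "code"),
    ]
    for p in order_prompts(data):
        print(p.task, hybrid_reward(p, "The answer is 4", constant_genrm))
```

Verifiable prompts go to a rule-based check first. This follows the abstract's finding that RTV resists reward hacking better than either GenRM variant. The constant GenRM is only a placeholder for a learned model.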

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2503.22230 [cs.LG]
	(or arXiv:2503.22230v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2503.22230

Submission history

From: Wei Shen
[v1] Fri, 28 Mar 2025 08:26:41 UTC (4,904 KB)
[v2] Mon, 31 Mar 2025 13:09:14 UTC (5,008 KB)
[v3] Wed, 2 Apr 2025 13:26:34 UTC (5,008 KB)
