Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Xu, Yixuan Even; Savani, Yash; Fang, Fei; Kolter, Zico


arXiv:2504.13818 (cs)

[Submitted on 18 Apr 2025]

Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, but it faces a fundamental asymmetry in computation and memory requirements: inference is embarrassingly parallel with a minimal memory footprint, while policy updates require extensive synchronization and are memory-intensive. To address this asymmetry, we introduce PODS (Policy Optimization with Down-Sampling), a framework that strategically decouples these phases by generating numerous rollouts in parallel but updating only on an informative subset. Within this framework, we develop max-variance down-sampling, a theoretically motivated method that selects rollouts with maximally diverse reward signals. We prove that this approach has an efficient algorithmic solution, and empirically demonstrate that GRPO with PODS using max-variance down-sampling achieves superior performance over standard GRPO on the GSM8K benchmark.
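The abstract says max-variance down-sampling has an efficient solution but does not spell it out. The Python below is a minimal sketch under our own assumptions, not the authors' implementation. It assumes scalar rewards and a group of n rollouts reduced to m. It also assumes an optimal subset always consists of the k highest and m - k lowest rewards for some k. That structure holds because the sum of squared pairwise differences is convex in any one member's reward. So an included reward lying strictly between two excluded rewards can be swapped for one of them without lowering the variance. Sorting plus prefix sums then scores all m + 1 candidates in O(n log n) total. The names `max_variance_subset` and `grpo_advantages` are hypothetical. Normalizing GRPO-style advantages over the kept subset, using the population standard deviation plus `eps`, is also an assumption.

```python
from typing import List, Sequence


def max_variance_subset(rewards: Sequence[float], m: int) -> List[int]:
    """Hypothetical helper: indices of m rollouts whose rewards have maximal variance.

    Assumes the optimum is the (m - k) lowest plus the k highest rewards for some k,
    so only m + 1 candidate subsets need to be scored.
    """
    n = len(rewards)
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= len(rewards)")
    # Sort rollout indices by reward, ascending.
    order = sorted(range(n), key=lambda i: rewards[i])
    # Prefix sums of r and r^2 give each candidate's variance in O(1).
    ps, ps2 = [0.0], [0.0]
    for i in order:
        ps.append(ps[-1] + rewards[i])
        ps2.append(ps2[-1] + rewards[i] ** 2)
    best_k, best_var = 0, float("-inf")
    for k in range(m + 1):
        lo = m - k  # number of lowest rewards kept; k highest are kept too
        s = ps[lo] + (ps[n] - ps[n - k])
        s2 = ps2[lo] + (ps2[n] - ps2[n - k])
        var = s2 / m - (s / m) ** 2  # population variance of this candidate
        if var > best_var:
            best_k, best_var = k, var
    return order[: m - best_k] + order[n - best_k :]


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: group-normalized advantages (r - mean) / (std + eps)."""
    mu = sum(rewards) / len(rewards)
    sd = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mu) / (sd + eps) for r in rewards]


if __name__ == "__main__":
    # n = 8 rollouts with binary correctness rewards, down-sampled to m = 4.
    rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
    keep = max_variance_subset(rewards, 4)
    print(keep)  # [1, 4, 0, 2]: both failures plus two successes
    # Only the kept rollouts would enter the (assumed) policy update.
    print(grpo_advantages([rewards[i] for i in keep]))
```

With binary correctness rewards, the variance of a kept subset is p(1 - p), where p is its fraction of successes. The selection therefore keeps as balanced a mix of successes and failures as the group allows.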

Comments:	9 pages, 1 figure
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2504.13818 [cs.LG]
	(or arXiv:2504.13818v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.13818

Submission history

From: Yixuan Even Xu
[v1] Fri, 18 Apr 2025 17:49:55 UTC (161 KB)
