CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Lin, Zhihang; Lin, Mingbao; Xie, Yuan; Ji, Rongrong

arXiv:2503.22342 (cs)

[Submitted on 28 Mar 2025]

Abstract: This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO). GRPO, while effective, incurs high training costs because it must sample multiple completions for each question. Our experiments and theoretical analysis reveal that the number of completions impacts model accuracy yet increases training time multiplicatively, and that not all completions contribute equally to policy training; their contribution depends on their relative advantage. To address these issues, we propose CPPO, which prunes completions with low absolute advantages, significantly reducing the number needed for gradient calculation and updates. Additionally, we introduce a dynamic completion allocation strategy that maximizes GPU utilization by incorporating additional questions, further enhancing training efficiency. Experimental results demonstrate that CPPO achieves up to $8.32\times$ speedup on GSM8K and $3.51\times$ on Math while preserving or even enhancing accuracy compared to the original GRPO. We release our code at this https URL.
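The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names, the `keep_ratio` parameter, and the use of the population standard deviation are assumptions made for illustration. It assumes the usual GRPO-style group normalization, where each completion's advantage is its reward minus the group mean, divided by the group standard deviation, and then keeps only the completions with the largest absolute advantage.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its group (GRPO-style).

    Assumption: population std is used; eps avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prune_completions(rewards, keep_ratio=0.5):
    """Return (kept_indices, kept_advantages) for one question's group.

    keep_ratio is a hypothetical knob: the fraction of completions that
    survive pruning. Completions are ranked by |advantage|, so both
    strongly positive and strongly negative ones are retained.
    """
    advantages = group_relative_advantages(rewards)
    n_keep = max(1, int(len(rewards) * keep_ratio))
    ranked = sorted(
        range(len(rewards)),
        key=lambda i: abs(advantages[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])  # restore original sampling order
    return kept, [advantages[i] for i in kept]


if __name__ == "__main__":
    # Four sampled completions for one question, with scalar rewards.
    rewards = [1.0, 0.5, 0.5, 0.0]
    kept, kept_adv = prune_completions(rewards, keep_ratio=0.5)
    print(kept, kept_adv)
```

In this sketch only the retained completions would go through the forward and backward pass of the policy update. The slots freed by pruning are what the dynamic completion allocation strategy mentioned in the abstract would fill with retained completions from additional questions; that step is not shown here.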

Comments:	16 pages
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.22342 [cs.AI]
	(or arXiv:2503.22342v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2503.22342

Submission history

From: Zhihang Lin
[v1] Fri, 28 Mar 2025 11:30:05 UTC (199 KB)
