Outcome-Refining Process Supervision for Code Generation

Yu, Zhuohao; Gu, Weizheng; Wang, Yidong; Zeng, Zhengran; Wang, Jindong; Ye, Wei; Zhang, Shikun

arXiv:2412.15118 (cs)

[Submitted on 19 Dec 2024]

Abstract: Large Language Models have demonstrated remarkable capabilities in code generation, yet they often struggle with complex programming tasks that require deep algorithmic reasoning. While process supervision through learned reward models shows promise in guiding reasoning steps, it requires expensive training data and suffers from unreliable evaluation. We propose Outcome-Refining Process Supervision, a novel paradigm that treats outcome refinement itself as the process to be supervised. Our framework leverages concrete execution signals to ground the supervision of reasoning steps, while using tree-structured exploration to maintain multiple solution trajectories simultaneously. Experiments demonstrate that our approach enables even smaller models to achieve high success rates and strong performance metrics on competitive programming tasks, and that it provides more reliable verification than traditional reward models without requiring the training of process reward models (PRMs). Our approach achieves significant improvements across 5 models and 3 datasets: an average increase of 26.9% in correctness and 42.2% in efficiency. The results suggest that providing a structured reasoning space with concrete verification signals is crucial for solving complex programming tasks. We open-source all our code and data at: this https URL
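
The general idea described in the abstract (keeping several candidate solutions alive and ranking them by concrete execution results rather than by a learned reward model) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `execution_score`, `beam_refine`, and `toy_refine` are hypothetical, the refinement step is a toy stand-in for a language-model call, and the score only counts passed test cases, ignoring the efficiency signals the abstract mentions.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def execution_score(source: str, tests: List[TestCase], entry: str = "solve") -> float:
    """Execute candidate source and return the fraction of test cases it passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # no sandboxing: only suitable for trusted toy input
        fn = namespace[entry]
    except Exception:
        return 0.0  # code that fails to load scores zero
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(tests)


def beam_refine(
    initial: List[str],
    refine: Callable[[str], List[str]],
    tests: List[TestCase],
    beam_width: int = 2,
    max_steps: int = 3,
) -> Tuple[float, str]:
    """Keep the best `beam_width` candidates per step and expand each via `refine`."""
    beam = [(execution_score(s, tests), s) for s in initial]
    for _ in range(max_steps):
        beam.sort(key=lambda pair: pair[0], reverse=True)
        beam = beam[:beam_width]
        if beam[0][0] == 1.0:
            break  # a candidate passes every test
        children = [c for _, s in beam for c in refine(s)]
        beam += [(execution_score(c, tests), c) for c in children]
    return max(beam, key=lambda pair: pair[0])


def toy_refine(source: str) -> List[str]:
    """Hypothetical stand-in for a model proposing revised candidates."""
    return [source.replace("a - b", "a + b"), source.replace("a - b", "a * b")]


tests = [((1, 2), 3), ((2, 2), 4), ((0, 5), 5)]
score, best = beam_refine(["def solve(a, b):\n    return a - b\n"], toy_refine, tests)
print(score)
print(best)
```

In this toy run the initial candidate fails every test, and the execution-based score is what selects between the two proposed revisions. A realistic setup would run candidates in a sandbox with time limits.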

Comments:	18 pages, 5 figures, Code: this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2412.15118 [cs.CL]
	(or arXiv:2412.15118v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.15118

Submission history

From: Zhuohao Yu
[v1] Thu, 19 Dec 2024 17:59:42 UTC (5,789 KB)
