Iterative Value Function Optimization for Guided Decoding

Liu, Zhenhua; Li, Lijun; Chen, Ruizhe; Jiang, Yuxian; Zhu, Tong; Su, Zhaochen; Chen, Wenliang; Shao, Jing

arXiv:2503.02368 (cs)

[Submitted on 4 Mar 2025 (v1), last revised 5 Mar 2025 (this version, v2)]

Abstract: While Reinforcement Learning from Human Feedback (RLHF) has become the predominant method for controlling language model outputs, it suffers from high computational costs and training instability. Guided decoding, especially value-guided methods, offers a cost-effective alternative by controlling outputs without re-training models. However, the accuracy of the value function is crucial for value-guided decoding, as inaccuracies can lead to suboptimal decision-making and degraded performance. Existing methods struggle with accurately estimating the optimal value function, leading to less effective control. We propose Iterative Value Function Optimization, a novel framework that addresses these limitations through two key components: Monte Carlo Value Estimation, which reduces estimation variance by exploring diverse trajectories, and Iterative On-Policy Optimization, which progressively improves value estimation through collecting trajectories from value-guided policies. Extensive experiments on text summarization, multi-turn dialogue, and instruction following demonstrate the effectiveness of value-guided decoding approaches in aligning language models. These approaches not only achieve alignment but also significantly reduce computational costs by leveraging principled value function optimization for efficient and effective control.
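
The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two-token vocabulary, the fixed horizon, the reward function, the tabular value store, the temperature `beta`, and every function name below are assumptions chosen only to keep the example small and runnable. It shows a Monte Carlo estimate of prefix values from sampled trajectories, a base policy reweighted by exp(V / beta) at decoding time, and a loop that re-estimates the values from trajectories drawn from the guided policy itself.

```python
import math
import random
from collections import defaultdict

VOCAB = ["a", "b"]   # toy vocabulary (assumption)
HORIZON = 4          # fixed sequence length (assumption)


def reward(seq):
    """Toy reward (assumption): fraction of tokens equal to 'a'."""
    return seq.count("a") / HORIZON


def guided_probs(prefix, values, beta):
    """Reweight a uniform base policy by exp(V(prefix + token) / beta).

    Prefixes missing from the table default to a value of 0.0.
    """
    weights = [math.exp(values.get(prefix + (tok,), 0.0) / beta) for tok in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]


def sample_trajectory(values, beta, rng):
    """Sample one full sequence from the value-guided policy."""
    prefix = ()
    while len(prefix) < HORIZON:
        probs = guided_probs(prefix, values, beta)
        tok = rng.choices(VOCAB, weights=probs, k=1)[0]
        prefix += (tok,)
    return prefix


def monte_carlo_values(trajectories):
    """Estimate V(prefix) as the mean final reward of trajectories through it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for traj in trajectories:
        r = reward(traj)
        for t in range(1, len(traj) + 1):
            sums[traj[:t]] += r
            counts[traj[:t]] += 1
    return {p: sums[p] / counts[p] for p in sums}


def iterate(num_rounds=3, num_samples=500, beta=0.2, seed=0):
    """Alternate on-policy sampling and Monte Carlo value re-estimation."""
    rng = random.Random(seed)
    values = {}    # empty table: the first round samples from the base policy
    history = []   # mean reward of the trajectories collected in each round
    for _ in range(num_rounds):
        trajs = [sample_trajectory(values, beta, rng) for _ in range(num_samples)]
        history.append(sum(reward(t) for t in trajs) / num_samples)
        values = monte_carlo_values(trajs)
    return values, history


if __name__ == "__main__":
    _, history = iterate()
    print(history)
```

In this sketch the first entry of `history` reflects the unguided base policy and later entries reflect decoding guided by the estimated values. A real system would replace the lookup table with a learned value model and the uniform base policy with a language model.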

Comments:	20 pages, 10 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2503.02368 [cs.CL]
	(or arXiv:2503.02368v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.02368

Submission history

From: Zhenhua Liu
[v1] Tue, 4 Mar 2025 07:49:10 UTC (142 KB)
[v2] Wed, 5 Mar 2025 09:12:25 UTC (142 KB)
