Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Zhu, Banghua; Sharma, Hiteshi; Frujeri, Felipe Vieira; Dong, Shi; Zhu, Chenguang; Jordan, Michael I.; Jiao, Jiantao

arXiv:2306.02231 (cs)

[Submitted on 4 Jun 2023 (v1), last revised 2 Nov 2023 (this version, v3)]

Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be alleviated by a novel algorithm that we refer to as Advantage-Induced Policy Alignment (APA), which leverages a squared error loss function based on the estimated advantages. We demonstrate empirically that APA consistently outperforms PPO in language tasks by a large margin, when a separate reward model is employed as the evaluator. In addition, compared with PPO, APA offers a more stable form of control over the deviation from the model's initial policy, ensuring that the model improves its performance without collapsing to deterministic output. In addition to empirical results, we also provide a theoretical justification supporting the design of our loss function.
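
The abstract names a squared error loss built from estimated advantages but does not state its form. The sketch below is one plausible reading, offered only as an illustration: it assumes the loss penalizes the squared gap between the new policy's log-probability and the initial policy's log-probability shifted by the advantage divided by a temperature. The function name, the `lam` parameter, and the exact target are assumptions and are not taken from this page; the paper itself is the authority on the actual objective.

```python
import math


def squared_error_advantage_loss(logp_new, logp_init, advantages, lam=1.0):
    """Illustrative squared-error loss on log-probabilities (assumed form).

    Each sampled action gets a target of logp_init + advantage / lam, and the
    loss is the mean squared distance of logp_new from that target.
    `lam` is a hypothetical temperature controlling how far the policy may
    move away from the initial policy.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not (len(logp_new) == len(logp_init) == len(advantages)):
        raise ValueError("inputs must have the same length")
    targets = [lp0 + adv / lam for lp0, adv in zip(logp_init, advantages)]
    squared_errors = [(lp - t) ** 2 for lp, t in zip(logp_new, targets)]
    return sum(squared_errors) / len(squared_errors)


# Toy data: three sampled actions with made-up advantages.
logp_init = [math.log(0.5), math.log(0.25), math.log(0.25)]
advantages = [1.0, 0.0, -1.0]

# A policy that has not moved from the initial one is penalized wherever the
# advantage is nonzero.
loss_at_init = squared_error_advantage_loss(logp_init, logp_init, advantages, lam=2.0)

# A policy whose log-probabilities sit exactly on the targets has zero loss.
logp_on_target = [lp0 + adv / 2.0 for lp0, adv in zip(logp_init, advantages)]
loss_on_target = squared_error_advantage_loss(logp_on_target, logp_init, advantages, lam=2.0)
```

Under this assumed form, a larger `lam` shrinks the advantage term and keeps the target closer to the initial policy, which is one way to picture the "control over the deviation from the model's initial policy" that the abstract describes.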

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2306.02231 [cs.CL]
	(or arXiv:2306.02231v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.02231

Submission history

From: Banghua Zhu
[v1] Sun, 4 Jun 2023 01:59:40 UTC (114 KB)
[v2] Tue, 6 Jun 2023 23:04:34 UTC (114 KB)
[v3] Thu, 2 Nov 2023 22:47:14 UTC (734 KB)
