Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF

Xiong, Wei; Dong, Hanze; Ye, Chenlu; Zhong, Han; Jiang, Nan; Zhang, Tong

Computer Science > Machine Learning

arXiv:2312.11456v1 (cs)

[Submitted on 18 Dec 2023 (this version), latest version 1 May 2024 (v4)]

Title: Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF

Authors: Wei Xiong, Hanze Dong, Chenlu Ye, Han Zhong, Nan Jiang, Tong Zhang

Abstract: This paper studies the theoretical framework for aligning generative models with Reinforcement Learning from Human Feedback (RLHF). We consider a standard mathematical formulation, the reverse-KL regularized contextual bandit for RLHF. Despite its widespread practical application, a rigorous theoretical analysis of this formulation remains open. We investigate its theoretical properties in both offline and online settings and propose efficient algorithms with finite-sample theoretical guarantees. Our work bridges the gap between theory and practice by linking our theoretical insights with existing practical alignment algorithms such as Direct Preference Optimization (DPO) and Rejection Sampling Optimization (RSO). These findings and connections also offer both the theoretical and practical communities new tools and insights for designing future alignment algorithms.
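The abstract names the reverse-KL regularized contextual bandit without writing it out. A standard statement of that objective, for a prompt x, a reference policy π_0, a reward r and a KL coefficient η > 0, is to maximize E_{a~π(·|x)}[r(x,a)] − η·KL(π(·|x) ‖ π_0(·|x)), whose maximizer is the Gibbs distribution π(a|x) ∝ π_0(a|x)·exp(r(x,a)/η); this is presumably what "Gibbs sampling" in the title refers to. The sketch below is a toy illustration under stated assumptions, not the authors' algorithm: it assumes a finite set of candidate responses per prompt (real LLM responses are token sequences), and the reference policy, rewards and η in the usage section are hypothetical numbers. It computes the closed-form Gibbs policy, evaluates the regularized objective, checks the DPO-style relation η·log(π/π_0) = r − η·log Z(x), and draws from the Gibbs policy by rejection sampling from π_0, in the spirit of (but not copied from) RSO.

```python
import math
import random


def kl_regularized_objective(policy, ref, reward, eta):
    """E_{a~policy}[r(a)] - eta * KL(policy || ref) for one fixed prompt x.

    All three lists are indexed by a finite set of candidate responses
    (a simplifying assumption; real LLM responses are token sequences).
    """
    value = 0.0
    for p, q, r in zip(policy, ref, reward):
        if p > 0.0:  # terms with p = 0 contribute nothing (0 * log 0 = 0)
            value += p * r - eta * p * math.log(p / q)
    return value


def gibbs_policy(ref, reward, eta):
    """Closed-form maximizer: pi(a) proportional to ref(a) * exp(r(a) / eta)."""
    r_max = max(reward)  # shift by the max reward for numerical stability
    weights = [q * math.exp((r - r_max) / eta) for q, r in zip(ref, reward)]
    total = sum(weights)
    return [w / total for w in weights]


def rejection_sample(ref, reward, eta, rng):
    """Draw one response index from the Gibbs policy using ref as the proposal.

    A proposal a ~ ref is accepted with probability exp((r(a) - r_max) / eta),
    which is at most 1, so accepted draws follow ref(a) * exp(r(a) / eta) / Z.
    """
    r_max = max(reward)
    actions = list(range(len(ref)))
    while True:
        a = rng.choices(actions, weights=ref, k=1)[0]
        if rng.random() < math.exp((reward[a] - r_max) / eta):
            return a


if __name__ == "__main__":
    ref = [0.5, 0.3, 0.2]     # hypothetical reference policy pi_0(. | x)
    reward = [1.0, 2.0, 0.0]  # hypothetical rewards r(x, a)
    eta = 1.0                 # hypothetical KL coefficient
    pi = gibbs_policy(ref, reward, eta)
    print("Gibbs policy:", [round(p, 3) for p in pi])
    print("objective(Gibbs):", kl_regularized_objective(pi, ref, reward, eta))
    print("objective(ref):  ", kl_regularized_objective(ref, ref, reward, eta))
    # DPO-style implied reward eta * log(pi / pi_0) equals r(x, a) - eta * log Z(x),
    # so the differences printed here are all the same per-prompt constant.
    print("implied - true:", [eta * math.log(p / q) - r for p, q, r in zip(pi, ref, reward)])
    rng = random.Random(0)
    draws = [rejection_sample(ref, reward, eta, rng) for _ in range(10000)]
    print("empirical:", [draws.count(a) / len(draws) for a in range(len(ref))])
```

Rejection sampling here needs a known upper bound on the reward (the maximum over the candidate set in this toy case), and its acceptance rate, Σ_a π_0(a|x)·exp((r(x,a) − r_max)/η), can become small when η is small, which is one reason this sketch keeps η and the candidate set tiny.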

Comments:	31 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2312.11456 [cs.LG]
	(or arXiv:2312.11456v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.11456

Submission history

From: Wei Xiong
[v1] Mon, 18 Dec 2023 18:58:42 UTC (3,186 KB)
[v2] Sun, 28 Jan 2024 22:32:48 UTC (3,099 KB)
[v3] Tue, 20 Feb 2024 06:14:42 UTC (3,100 KB)
[v4] Wed, 1 May 2024 14:50:56 UTC (2,905 KB)
