Game-Theoretic Regularized Self-Play Alignment of Large Language Models

Tang, Xiaohang; Yoon, Sangwoong; Son, Seongho; Yuan, Huizhuo; Gu, Quanquan; Bogunovic, Ilija

arXiv:2503.00030 (cs)

[Submitted on 24 Feb 2025]

Abstract: Self-play alignment algorithms have been developed as effective methods for fine-tuning large language models (LLMs), formulating preference optimization as a two-player game. However, regularization with respect to the reference policy, which is crucial for mitigating over-optimization, has been insufficiently investigated in self-play alignment. In this paper, we show that our regularization method can significantly improve unregularized self-play. To study the impact of different regularizations in self-play alignment, we propose Regularized Self-Play Policy Optimization (RSPO). This generalized framework regularizes self-play by simply adding a chosen regularization term to the loss while maintaining provable last-iterate convergence to the Nash Equilibrium of the corresponding regularized game. Surprisingly, empirical evaluations using the Mistral-7B-Instruct base model reveal that forward KL divergence regularization reduces response length in RSPO, whereas reverse KL divergence markedly improves raw win rates. RSPO with a linear combination of forward and reverse KL divergence regularization substantially increases the length-controlled win rate in AlpacaEval-2, raising it from $28.53\%$ for the unregularized self-play alignment method (SPPO) to $35.44\%$. Finally, we show that RSPO also improves response diversity.
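
The idea of adding a linear combination of forward and reverse KL terms to a loss can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the weights lam_forward and lam_reverse, the placeholder base_loss, and the small categorical distributions are all hypothetical. The sketch also assumes the common convention that forward KL is KL(reference || policy) and reverse KL is KL(policy || reference); the paper's exact definitions and loss may differ.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q has full support (no zero entries where p is positive).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(base_loss, policy, reference, lam_forward, lam_reverse):
    """Add a weighted mix of forward and reverse KL to a base loss.

    Assumed convention (not taken from the paper):
      forward KL = KL(reference || policy)
      reverse KL = KL(policy || reference)
    """
    forward = kl(reference, policy)
    reverse = kl(policy, reference)
    return base_loss + lam_forward * forward + lam_reverse * reverse


# Hypothetical toy distributions over three candidate responses.
policy = [0.7, 0.2, 0.1]
reference = [0.5, 0.3, 0.2]

# base_loss stands in for an unregularized self-play loss value.
loss = regularized_loss(1.0, policy, reference, lam_forward=0.1, lam_reverse=0.5)
print(loss)
```

When the policy equals the reference, both KL terms vanish and the loss reduces to the base loss; the further the policy drifts from the reference, the larger the penalty.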

Comments:	Preprint
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.00030 [cs.LG]
	(or arXiv:2503.00030v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2503.00030

Submission history

From: Xiaohang Tang
[v1] Mon, 24 Feb 2025 22:43:21 UTC (172 KB)