Plug-and-Play Training Framework for Preference Optimization

Ma, Jingyuan; Li, Rui; Li, Zheng; Sha, Lei; Sui, Zhifang


arXiv:2412.20996 (cs)

[Submitted on 30 Dec 2024]

Abstract: Recently, preference optimization methods such as DPO have significantly enhanced large language models (LLMs) across a wide range of tasks, including dialogue and question answering. However, current methods fail to account for the varying difficulty levels of training samples during preference optimization, leading to mediocre performance in tasks with high accuracy requirements, particularly mathematical reasoning. To address this limitation, we propose a novel training framework that employs multiple sampling to analyze output distributions, assign different weights to samples, and incorporate these weights into the preference optimization process. This plug-and-play approach enables LLMs to prioritize challenging examples during training, improving learning efficiency. Experimental results demonstrate that our framework integrates seamlessly with various preference optimization methods and achieves consistent improvements in mathematical reasoning tasks.
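The abstract names three steps (sample several outputs per prompt, turn the output distribution into a per-sample weight, fold the weight into the preference loss) but does not give the formulas. The sketch below is one possible reading, not the authors' method: the weighting rule `1 - accuracy`, the `floor` parameter, and the function names `difficulty_weight`, `dpo_loss`, and `weighted_batch_loss` are all assumptions made for illustration. Only the per-pair DPO loss follows the standard published form.

```python
import math
from typing import Callable, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def difficulty_weight(samples: Sequence[str],
                      is_correct: Callable[[str], bool],
                      floor: float = 0.1) -> float:
    """Hypothetical weighting rule: the more sampled answers are wrong,
    the harder the prompt is assumed to be and the larger its weight."""
    if not samples:
        raise ValueError("need at least one sampled response")
    accuracy = sum(1 for s in samples if is_correct(s)) / len(samples)
    # `floor` (an assumption) keeps easy prompts from being ignored entirely.
    return max(floor, 1.0 - accuracy)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-pair DPO loss: -log(sigmoid(beta * log-ratio margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)


def weighted_batch_loss(losses: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weight-normalized mean of per-pair losses (assumed aggregation)."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * l for w, l in zip(weights, losses)) / total
```

Because the weight only rescales each pair's loss, the same wrapper could in principle sit on top of other preference objectives by swapping out `dpo_loss`, which is one way to read the "plug-and-play" claim.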

Comments:	12 pages, 9 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.20996 [cs.CL]
	(or arXiv:2412.20996v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.20996

Submission history

From: Jingyuan Ma
[v1] Mon, 30 Dec 2024 15:01:48 UTC (1,006 KB)
