MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

Yang, Kailai; Liu, Zhiwei; Xie, Qianqian; Huang, Jimin; Zhang, Tianlin; Ananiadou, Sophia


arXiv:2403.17141 (cs)

[Submitted on 25 Mar 2024 (v1), last revised 7 Oct 2024 (this version, v3)]

Abstract: Recent advancements in large language models (LLMs) focus on aligning to heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods depend on the parameters of the policy model, so their alignment algorithms must be repeated at high cost for each new policy model, and they cannot extend to unseen objectives because their alignment objectives are static. In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. MetaAligner decomposes multi-objective alignment into three stages: (1) a dynamic objectives reformulation algorithm reorganizes traditional alignment datasets to supervise the model in performing flexible alignment across different objectives; (2) a conditional weak-to-strong correction paradigm aligns the weak outputs of fixed policy models toward strong outputs with higher preferences in the corresponding alignment objectives, enabling plug-and-play inference on any policy model, which significantly reduces training costs and facilitates alignment of closed-source policy models; (3) a generalizable inference method flexibly adjusts target objectives by updating their text descriptions in the prompts, enabling generalizable alignment to unseen objectives. Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignment on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods. The model also effectively aligns unseen objectives, marking the first step towards generalizable multi-objective preference alignment.
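
The abstract describes stages (1) through (3) only at a high level. The Python sketch below illustrates one possible reading of the idea under stated assumptions. The prompt wording, the `OBJECTIVES` descriptions, the `PreferencePair` record, the `build_prompt` and `reformulate` helpers, and the random-subset sampling rule are all hypothetical and are not the authors' released template or code. The sketch turns each preference pair into a correction example (weak response in the prompt, preferred response as the target) conditioned on a sampled subset of objectives. At inference, a new objective is handled by adding its text description.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical objective descriptions (illustrative wording, not the paper's).
OBJECTIVES = {
    "harmlessness": "The response should avoid harmful or offensive content.",
    "helpfulness": "The response should address the request accurately and usefully.",
    "humor": "The response should be witty and entertaining where appropriate.",
}


@dataclass
class PreferencePair:
    """Hypothetical record: `chosen` is preferred over `rejected` on `preferred_on`."""
    query: str
    chosen: str
    rejected: str
    preferred_on: list[str]


def build_prompt(query: str, weak_response: str, objectives: list[str],
                 descriptions: dict[str, str]) -> str:
    """Condition the correction on objectives via their text descriptions."""
    lines = [f"- {name}: {descriptions[name]}" for name in objectives]
    return (
        "Edit the following Answer to the Question to better align with "
        "these objectives:\n"
        + "\n".join(lines)
        + f"\nQuestion: {query}\nAnswer: {weak_response}\nEdit:"
    )


def reformulate(pairs: list[PreferencePair], descriptions: dict[str, str],
                rng: random.Random) -> list[dict[str, str]]:
    """Assumed reformulation: sample a random non-empty objective subset per pair."""
    examples = []
    for pair in pairs:
        if not pair.preferred_on:
            continue  # no objective on which `chosen` is better; skip the pair
        k = rng.randint(1, len(pair.preferred_on))
        subset = rng.sample(pair.preferred_on, k)
        prompt = build_prompt(pair.query, pair.rejected, subset, descriptions)
        # Weak-to-strong supervision: map the weak (rejected) answer to the chosen one.
        examples.append({"prompt": prompt, "target": pair.chosen})
    return examples
```

To target an objective not seen in training, a caller would only extend the description dictionary, for example `build_prompt(query, policy_output, ["conciseness"], {**OBJECTIVES, "conciseness": "Be brief."})`. Because the aligner reads the policy model's output as plain text, this style of prompt does not depend on the policy model's parameters.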

Comments:	Accepted by NeurIPS 2024 main track
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.17141 [cs.CL]
	(or arXiv:2403.17141v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.17141

Submission history

From: Kailai Yang
[v1] Mon, 25 Mar 2024 19:28:10 UTC (4,568 KB)
[v2] Mon, 6 May 2024 14:17:41 UTC (11,658 KB)
[v3] Mon, 7 Oct 2024 03:19:16 UTC (11,975 KB)