Preference-grounded Token-level Guidance for Language Model Fine-tuning

Yang, Shentao; Zhang, Shujian; Xia, Congying; Feng, Yihao; Xiong, Caiming; Zhou, Mingyuan

arXiv:2306.00398 (cs)

[Submitted on 1 Jun 2023 (v1), last revised 9 Oct 2023 (this version, v2)]

Abstract: Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the *sequence level*, while LM training and generation both occur at the *token level*. There is therefore a *granularity mismatch* between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternating training process, in which we iterate between grounding the sequence-level preference into token-level training guidance and improving the LM with the learned guidance. For guidance learning, we design a framework that extends pairwise-preference learning from imitation learning to both variable-length LM generation and the use of preferences among multiple generations. For LM training, depending on the amount of supervised data, we present two *minimalist* learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization.
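To make the granularity mismatch concrete, here is a minimal sketch of one way token-level guidance could be tied to a sequence-level preference among several generations. The abstract does not specify the aggregation or the loss, so the choices here are assumptions for illustration: per-token rewards are averaged so that generations of different lengths are comparable, and the ordering is scored with a Plackett-Luce style listwise loss, which reduces to the usual pairwise (Bradley-Terry) loss when only two generations are compared. The function names `aggregate` and `listwise_preference_loss` are hypothetical and are not taken from the paper.

```python
import math


def aggregate(token_rewards):
    """Average per-token rewards into one sequence-level score.

    Averaging (an assumption of this sketch) keeps scores comparable
    across generations of different lengths.
    """
    return sum(token_rewards) / len(token_rewards)


def listwise_preference_loss(ranked_token_rewards):
    """Plackett-Luce negative log-likelihood of a preference ordering.

    ranked_token_rewards: list of per-token reward lists, one per
    generation, ordered from most preferred to least preferred.
    """
    scores = [aggregate(r) for r in ranked_token_rewards]
    loss = 0.0
    # At each rank, the preferred generation should out-score all
    # generations ranked below it.
    for i in range(len(scores) - 1):
        tail = scores[i:]
        m = max(tail)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss -= scores[i] - log_norm
    return loss


# Three generations of different lengths, best first (toy numbers).
consistent = [[2.0, 2.0, 2.0], [0.5, 1.5], [0.0]]
reversed_order = list(reversed(consistent))
loss_consistent = listwise_preference_loss(consistent)
loss_reversed = listwise_preference_loss(reversed_order)
```

In this toy setup, token rewards that agree with the preference ordering give a lower loss than rewards that contradict it, so minimizing the loss over a parameterized token-level reward would push the per-token guidance toward agreement with the sequence-level preference.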

Comments:	37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.00398 [cs.CL]
	(or arXiv:2306.00398v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.00398

Submission history

From: Shentao Yang
[v1] Thu, 1 Jun 2023 07:00:07 UTC (2,597 KB)
[v2] Mon, 9 Oct 2023 23:31:44 UTC (2,755 KB)