Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Padmakumar, Vishakh; Jin, Chuanyang; Kirk, Hannah Rose; He, He

arXiv:2412.03822 (cs)

[Submitted on 5 Dec 2024]

Abstract: Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained using binary judgments where annotators select the preferred choice out of pairs of model outputs. In this work, we argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We propose a taxonomy that identifies two dimensions of subjectivity where different users disagree on the preferred output: namely, the Plurality of Responses to Prompts, where prompts allow for multiple correct answers, and the Indistinguishability of Responses, where candidate outputs are paraphrases of each other. We show that reward models correlate weakly with user preferences in these cases. As a first step to address this issue, we introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement. Incorporating these via a margin term as a form of regularization during model training yields predictions that better align with the aggregate user preferences.
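The abstract does not give the exact form of the margin term, so the following is only a minimal sketch of one common way to add a margin to a pairwise (Bradley-Terry style) reward-model loss. The function names `disagreement_margin` and `margin_bt_loss`, the vote encoding, and the mapping from synthetic judgments to a margin are all assumptions made for illustration, not the authors' implementation.

```python
import math


def disagreement_margin(synthetic_votes):
    """Hypothetical mapping from synthetic judgments to a margin in [0, 1].

    Each vote is 1 if the synthetic annotator prefers the response labeled
    'chosen' in the binary dataset, and 0 otherwise. Unanimous agreement gives
    a margin of 1; an even split (maximal disagreement) gives a margin of 0.
    """
    agreement = sum(synthetic_votes) / len(synthetic_votes)
    return max(0.0, 2.0 * agreement - 1.0)


def margin_bt_loss(r_chosen, r_rejected, margin):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected - margin)).

    Assumed form: the margin asks for a larger reward gap only on pairs where
    the synthetic judgments agree, so contested pairs are pushed apart less.
    """
    z = r_chosen - r_rejected - margin
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Same reward gap, different estimated disagreement.
contested = margin_bt_loss(1.0, 0.0, disagreement_margin([1, 0, 1, 0]))
clear_cut = margin_bt_loss(1.0, 0.0, disagreement_margin([1, 1, 1, 1]))
```

Under these assumptions, a pair with a clear synthetic consensus incurs a higher loss at the same reward gap than a contested pair, so training widens the gap mainly where users are expected to agree.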

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.03822 [cs.CL]
	(or arXiv:2412.03822v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.03822

Submission history

From: Chuanyang Jin
[v1] Thu, 5 Dec 2024 02:35:46 UTC (1,017 KB)
