SDPO: Segment-Level Direct Preference Optimization for Social Agents

Kong, Aobo; Ma, Wentao; Zhao, Shiwan; Li, Yongbin; Wu, Yuchuan; Wang, Ke; Liu, Xiaoqian; Li, Qicheng; Qin, Yong; Huang, Fei


arXiv:2501.01821 (cs)

[Submitted on 3 Jan 2025]

Abstract: Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across a variety of agent tasks. Existing DPO-based approaches for multi-turn interactions are divided into turn-level and session-level methods. The turn-level method is overly fine-grained, focusing exclusively on individual turns, while session-level methods are too coarse-grained, often introducing training noise. To address these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which focuses on specific key segments within interactions to optimize multi-turn agent behavior while minimizing training noise. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO's potential to advance the social intelligence of LLM-based agents. We release our code and data at this https URL.
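
The contrast between turn-level, segment-level, and session-level preference optimization can be illustrated with a minimal sketch. It applies the standard DPO objective, -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), to different subsets of an agent's turns. This is not the authors' implementation: the abstract does not specify how key segments are selected or how the loss is computed. The helper names (`logratio`, `dpo_loss`), the per-turn log-probabilities, the choice of turns 1 and 2 as the "segment", and the assumption that chosen and rejected dialogues share the same turn indices are all hypothetical.

```python
import math

def logratio(policy_logps, ref_logps, turns):
    """Sum of (policy - reference) log-probs over the selected turn indices."""
    return sum(policy_logps[t] - ref_logps[t] for t in turns)

def dpo_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (chosen - rejected))."""
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(m) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical per-turn log-probs of one agent's utterances in a 4-turn
# dialogue pair (chosen vs. rejected), under the policy and reference models.
pol_chosen = [-10.0, -8.0, -7.0, -9.0]
ref_chosen = [-10.0, -9.0, -8.5, -9.0]
pol_rejected = [-10.0, -8.5, -9.0, -9.5]
ref_rejected = [-10.0, -8.0, -8.0, -9.0]

# Assumed granularities: one turn, a short contiguous segment, the whole session.
granularities = {"turn": [1], "segment": [1, 2], "session": [0, 1, 2, 3]}

losses = {
    name: dpo_loss(
        logratio(pol_chosen, ref_chosen, turns),
        logratio(pol_rejected, ref_rejected, turns),
    )
    for name, turns in granularities.items()
}

for name, value in losses.items():
    print(f"{name}: {value:.4f}")
```

In this toy setup the only thing that changes between granularities is which turns contribute to the log-ratio. A session-level loss also includes turns that may be unrelated to why one dialogue was preferred, which is one way to read the abstract's remark about training noise.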

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2501.01821 [cs.AI]
	(or arXiv:2501.01821v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2501.01821

Submission history

From: Aobo Kong
[v1] Fri, 3 Jan 2025 14:09:46 UTC (989 KB)
