DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Jung, Sunghee; Lee, Donghun; Lee, Shinbok; Seo, Gaeun; Lee, Daniel; Ko, Byeongil; Cho, Junrae; Kim, Kihyun; Kim, Eunggyun; Shin, Myeongcheol

Computer Science > Computation and Language

arXiv:2504.02882 (cs)

[Submitted on 2 Apr 2025]

Abstract: Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLMs' dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state-transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o's performance (94.8% in information gathering, 91% in tool-call rejection), with substantial improvements over the baseline (44% and 9.6%, respectively), while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
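The abstract does not spell out the specialized dialogue-control loss, so the sketch below shows only the standard DPO objective that it builds on, applied at the level of a whole multi-turn dialogue. It assumes that a trajectory's log-probability is the sum of per-token log-probabilities over the assistant turns only (user messages and tool outputs are treated as conditioning context), and that each training pair contains one correct and one incorrect dialogue flow. The `Trajectory` class, the function names, and the default `beta=0.1` are hypothetical choices for illustration. This is vanilla DPO, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container holding log-probs for the assistant tokens only.

    User messages and tool outputs are conditioning context (an assumption),
    so they contribute no terms to the trajectory log-probability.
    """
    policy_logps: List[float]     # log pi_theta(token | dialogue history)
    reference_logps: List[float]  # log pi_ref(token | dialogue history), same tokens


def trajectory_logratio(traj: Trajectory) -> float:
    """Sum of log(pi_theta / pi_ref) over every assistant token in the dialogue."""
    if len(traj.policy_logps) != len(traj.reference_logps):
        raise ValueError("policy and reference log-probs must align token by token")
    return sum(p - r for p, r in zip(traj.policy_logps, traj.reference_logps))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)) for large positive or negative z."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(chosen: Trajectory, rejected: Trajectory, beta: float = 0.1) -> float:
    """Vanilla DPO loss for one (correct, incorrect) dialogue-flow pair.

    loss = -log sigmoid(beta * (logratio(chosen) - logratio(rejected)))
    """
    margin = trajectory_logratio(chosen) - trajectory_logratio(rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Toy numbers: the policy has shifted probability toward the correct flow.
    chosen = Trajectory(policy_logps=[-0.5, -0.2], reference_logps=[-0.7, -0.4])
    rejected = Trajectory(policy_logps=[-1.0, -0.9], reference_logps=[-0.8, -0.6])
    print(dpo_loss(chosen, rejected))
```

In the toy example the margin is positive, so the printed loss is slightly below log 2 (about 0.693), which is the loss when the policy and reference models agree exactly. Summing log-ratios over all assistant turns lets a single preference pair reward the whole dialogue flow (for example, asking for a missing argument before calling a tool) rather than one isolated response.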

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2504.02882 [cs.CL]
	(or arXiv:2504.02882v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.02882

Submission history

From: Sunghee Jung
[v1] Wed, 2 Apr 2025 05:47:28 UTC (5,826 KB)
