Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

Shi, Zhengliang; Wang, Yuhan; Yan, Lingyong; Ren, Pengjie; Wang, Shuaiqiang; Yin, Dawei; Ren, Zhaochun

arXiv:2503.01763 (cs)

[Submitted on 3 Mar 2025]

Abstract: Tool learning aims to augment large language models (LLMs) with diverse tools, enabling them to act as agents for solving practical tasks. Because tool-using LLMs have a limited context length, adopting information retrieval (IR) models to select useful tools from large toolsets is a critical initial step. However, the performance of IR models on tool retrieval tasks remains underexplored and unclear. Most tool-use benchmarks simplify this step by manually pre-annotating a small set of relevant tools for each task, which is far from real-world scenarios. In this paper, we propose ToolRet, a heterogeneous tool retrieval benchmark comprising 7.6k diverse retrieval tasks and a corpus of 43k tools, collected from existing datasets. We benchmark six types of models on ToolRet. Surprisingly, even models with strong performance on conventional IR benchmarks exhibit poor performance on ToolRet. This low retrieval quality degrades the task pass rate of tool-use LLMs. As a further step, we contribute a large-scale training dataset with over 200k instances, which substantially improves the tool retrieval ability of IR models.
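The retrieval step described above (rank a large toolset against a task and pass only the top few tools into the LLM's limited context) can be illustrated with a minimal sketch. This is not ToolRet's code and not necessarily one of the six model types the paper benchmarks: it assumes Okapi BM25, a classic lexical IR baseline, over plain-text tool descriptions, and scores the ranking with recall@k. The class `BM25ToolRetriever`, the function `recall_at_k`, the four-tool corpus `TOOLS`, and the `k1`/`b` defaults are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25ToolRetriever:
    """Hypothetical Okapi BM25 ranker over tool descriptions (illustrative only)."""

    def __init__(self, tools: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.ids = list(tools)
        self.docs = [tokenize(tools[t]) for t in self.ids]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: number of tool descriptions containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # BM25 IDF with +1 inside the log so every weight stays non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}
        self.tfs = [Counter(d) for d in self.docs]

    def score(self, query: str, i: int) -> float:
        """BM25 score of tool i for the query (sum over query terms)."""
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / denom
        return s

    def retrieve(self, query: str, k: int) -> list[str]:
        """Return the ids of the k highest-scoring tools."""
        ranked = sorted(range(len(self.ids)), key=lambda i: self.score(query, i), reverse=True)
        return [self.ids[i] for i in ranked[:k]]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant tools that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Hypothetical toy toolset; a real benchmark corpus would hold thousands of tools.
TOOLS = {
    "weather_api": "Get the current weather forecast for a city",
    "flight_search": "Search flights between two airports on a date",
    "currency_convert": "Convert an amount between two currencies using exchange rates",
    "calendar_add": "Add an event to the user's calendar",
}

if __name__ == "__main__":
    retriever = BM25ToolRetriever(TOOLS)
    top = retriever.retrieve("What is the weather forecast in Paris tomorrow?", k=2)
    print(top)  # → ['weather_api', 'calendar_add']
    print(recall_at_k(top, {"weather_api"}, k=2))  # → 1.0
```

A lexical baseline like this only matches surface terms, so a task phrased without the words in a tool's description can miss the right tool entirely; this sketch makes no claim about how the models evaluated on ToolRet behave.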

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2503.01763 [cs.CL]
	(or arXiv:2503.01763v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.01763

Submission history

From: Zhengliang Shi
[v1] Mon, 3 Mar 2025 17:37:16 UTC (13,521 KB)