Leveraging Uncertainty Estimation for Efficient LLM Routing

Zhang, Tuo; Mehradfar, Asal; Dimitriadis, Dimitrios; Avestimehr, Salman

Computer Science > Networking and Internet Architecture

arXiv:2502.11021 (cs)

[Submitted on 16 Feb 2025]

Abstract: Deploying large language models (LLMs) in edge-cloud environments requires an efficient routing strategy to balance cost and response quality. Traditional approaches prioritize either human-preference data or accuracy metrics from benchmark datasets as routing criteria, but these methods suffer from rigidity and subjectivity. Moreover, existing routing frameworks primarily focus on accuracy and cost, neglecting response quality from a human preference perspective. In this work, we propose the Confidence-Driven LLM Router, a novel framework that leverages uncertainty estimation to optimize routing decisions. To comprehensively assess routing performance, we evaluate both system cost efficiency and response quality. In particular, we introduce the novel use of LLM-as-a-Judge to simulate human rating preferences, providing the first systematic assessment of response quality across different routing strategies. Extensive experiments on MT-Bench, GSM8K, and MMLU demonstrate that our approach outperforms state-of-the-art routing methods, achieving superior response quality while maintaining cost efficiency.
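The abstract does not say which uncertainty estimator the Confidence-Driven LLM Router uses or how its routing threshold is chosen. As a rough illustration only, the sketch below assumes the edge model exposes per-token probability distributions for its draft answer, scores uncertainty as mean token entropy, and offloads the query to the cloud model when that score exceeds a hypothetical `threshold`. The names `RouteDecision`, `mean_token_entropy`, `route`, and `cloud_fraction` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RouteDecision:
    """Where a query is sent and the uncertainty score that drove the choice."""
    target: str         # "edge" or "cloud"
    uncertainty: float  # mean token entropy of the edge model's draft, in nats


def mean_token_entropy(token_probs: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy over the edge model's per-token distributions.

    Assumption (hypothetical): each inner sequence is a probability
    distribution over candidate tokens at one generation step, e.g.
    renormalized top-k probabilities.
    """
    if not token_probs:
        raise ValueError("need at least one token distribution")
    total = 0.0
    for dist in token_probs:
        # Skip zero-probability entries, since p * log(p) -> 0 as p -> 0.
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_probs)


def route(token_probs: Sequence[Sequence[float]], threshold: float) -> RouteDecision:
    """Keep confident queries on the edge; offload uncertain ones to the cloud."""
    u = mean_token_entropy(token_probs)
    return RouteDecision("cloud" if u > threshold else "edge", u)


def cloud_fraction(decisions: List[RouteDecision]) -> float:
    """Share of queries offloaded to the cloud, a simple proxy for system cost."""
    if not decisions:
        return 0.0
    return sum(d.target == "cloud" for d in decisions) / len(decisions)


if __name__ == "__main__":
    # Toy, made-up distributions for illustration only (not paper data).
    queries = {
        "confident": [[0.97, 0.02, 0.01], [0.99, 0.01]],
        "uncertain": [[0.25, 0.25, 0.25, 0.25]],
    }
    decisions = []
    for name, probs in queries.items():
        d = route(probs, threshold=0.5)
        decisions.append(d)
        print(name, d.target, round(d.uncertainty, 3))
    print("cloud fraction:", cloud_fraction(decisions))
```

With `threshold=0.5`, the peaked "confident" distributions stay on the edge and the uniform "uncertain" one is sent to the cloud. Lowering `threshold` offloads more traffic, trading cost for expected response quality; another estimator (for example, agreement across several sampled answers) could replace `mean_token_entropy` without changing the routing logic.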

Subjects:	Networking and Internet Architecture (cs.NI); Computation and Language (cs.CL)
Cite as:	arXiv:2502.11021 [cs.NI]
	(or arXiv:2502.11021v1 [cs.NI] for this version)
	https://doi.org/10.48550/arXiv.2502.11021

Submission history

From: Tuo Zhang
[v1] Sun, 16 Feb 2025 07:08:47 UTC (764 KB)