TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding

Wu, Zhaoxuan; Zhou, Zijian; Verma, Arun; Prakash, Alok; Rus, Daniela; Low, Bryan Kian Hsiang

arXiv:2502.15197 (cs)

[Submitted on 21 Feb 2025]

Abstract: We propose TETRIS, a novel method that optimizes the total throughput of batch speculative decoding in multi-request settings. Unlike existing methods that optimize for a single request or a group of requests as a whole, TETRIS actively selects, for every request in a batch, the draft tokens most likely to be accepted when verified in parallel, resulting in fewer rejected tokens and hence fewer wasted computing resources. Such effective resource utilization for fast inference in large language models (LLMs) is especially important to service providers with limited inference capacity. Compared to baseline speculative decoding, TETRIS yields a consistently higher acceptance rate and more effective utilization of the limited inference capacity. We show theoretically and empirically that TETRIS outperforms baseline speculative decoding and existing methods that dynamically select draft tokens, leading to more efficient batch inference in LLMs.
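
The abstract describes selecting the most promising draft tokens across all requests in a batch under limited verification capacity. The sketch below illustrates one way such a selection could work; it is not taken from the paper. It assumes (hypothetically) that each draft token comes with a draft-model probability, that the product of probabilities along a request's draft prefix serves as a proxy for the chance the token is accepted, and that a fixed `capacity` of tokens can be verified per step. The function name `select_draft_tokens` and its parameters are illustrative only.

```python
import heapq
from typing import List


def select_draft_tokens(draft_probs: List[List[float]], capacity: int) -> List[int]:
    """Return how many leading draft tokens to keep for each request.

    draft_probs[i][j] is the (assumed) draft-model probability of the j-th
    draft token of request i. A token's score is the product of the
    probabilities up to and including it, used here as a stand-in for the
    likelihood that the token survives verification.
    """
    keep = [0] * len(draft_probs)

    # Max-heap via negated scores; each entry is the next unselected token
    # of one request, so tokens are always taken in prefix order.
    heap = []
    for i, probs in enumerate(draft_probs):
        if probs:
            heapq.heappush(heap, (-probs[0], i))

    budget = capacity
    while heap and budget > 0:
        neg_score, i = heapq.heappop(heap)
        keep[i] += 1
        budget -= 1
        nxt = keep[i]
        if nxt < len(draft_probs[i]):
            # Extend the prefix: the cumulative score is multiplied by the
            # next token's probability (neg_score stays negative).
            heapq.heappush(heap, (neg_score * draft_probs[i][nxt], i))

    return keep


if __name__ == "__main__":
    batch = [[0.9, 0.8, 0.1], [0.6, 0.5], [0.4]]
    print(select_draft_tokens(batch, capacity=4))
```

Because cumulative scores can only shrink along a prefix, the greedy pop always picks the highest-scoring remaining token in the batch, so confident requests receive more of the shared budget than uncertain ones. Whether this matches the selection rule used by TETRIS is an assumption.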

Comments:	15 pages, 10 figures, 5 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.15197 [cs.CL]
	(or arXiv:2502.15197v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.15197

Submission history

From: Zhaoxuan Wu
[v1] Fri, 21 Feb 2025 04:19:24 UTC (385 KB)
