Learning Harmonized Representations for Speculative Sampling

Zhang, Lefan; Wang, Xiaodan; Huang, Yanhua; Xu, Ruiwen

Computer Science > Machine Learning

arXiv:2408.15766 (cs)

[Submitted on 28 Aug 2024 (v1), last revised 19 Sep 2024 (this version, v2)]

Abstract: Speculative sampling is a promising approach to accelerating the decoding stage of Large Language Models (LLMs). Recent advancements that leverage the target LLM's contextual information, such as hidden states and the KV cache, have shown significant practical improvements. However, these approaches suffer from inconsistent context between training and decoding. We also observe a second discrepancy, between the training and decoding objectives, in existing speculative sampling methods. In this work, we propose a solution named HArmonized Speculative Sampling (HASS) that learns harmonized representations to address these issues. Through harmonized objective distillation and harmonized context alignment, HASS accelerates the decoding stage without adding inference overhead. Experiments on four LLaMA models demonstrate that HASS achieves wall-clock speedup ratios of 2.81x-4.05x, averaged across three datasets, surpassing EAGLE-2 by 8%-20%.
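For readers new to the technique, the following is a minimal sketch of the generic draft-then-verify acceptance rule from standard speculative sampling: a draft model proposes k tokens, the target LLM scores them in one forward pass, and each draft token is accepted with probability min(1, p/q), otherwise a replacement is resampled from the normalized residual max(0, p - q). This is not HASS's training procedure (harmonized objective distillation and context alignment), and it omits the tree-structured drafting used by EAGLE-style methods. The function names `verify_draft` and `residual_distribution` are hypothetical, and distributions are assumed to be plain Python lists over a toy vocabulary.

```python
import random
from typing import List, Sequence


def residual_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q): where to resample after a draft token is rejected."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # Only happens when p == q, in which case rejection cannot occur; fall back to p.
        return list(p)
    return [d / total for d in diff]


def verify_draft(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # q_i: distribution each draft token was sampled from
    target_probs: Sequence[Sequence[float]],  # p_i: k + 1 target distributions from one forward pass
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: accept a prefix of the draft, then emit one target-side token."""
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Accept with probability min(1, p(tok) / q(tok)); q(tok) > 0 because tok was drawn from q.
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
        else:
            # Reject: resample this position from the residual and stop verifying.
            r = residual_distribution(p, q)
            out.append(rng.choices(range(len(r)), weights=r, k=1)[0])
            return out
    # Every draft token was accepted: draw a bonus token from the last target distribution.
    p_last = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(p_last)), weights=p_last, k=1)[0])
    return out
```

Because rejected positions are resampled from the residual, the emitted tokens follow the target distribution exactly; a draft model whose distributions better match the target's raises the acceptance rate, so more tokens are produced per target forward pass, which is the source of wall-clock speedups like those reported in the abstract.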

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2408.15766 [cs.LG]
	(or arXiv:2408.15766v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2408.15766

Submission history

From: Yanhua Huang
[v1] Wed, 28 Aug 2024 12:59:12 UTC (123 KB)
[v2] Thu, 19 Sep 2024 15:46:57 UTC (168 KB)
