Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Tan, Shicheng; Tam, Weng Lam; Wang, Yuanchun; Gong, Wenwen; Zhao, Shu; Zhang, Peng; Tang, Jie

arXiv:2306.06625 (cs)

[Submitted on 11 Jun 2023]

Abstract: The large scale of pre-trained language models makes them difficult to deploy on many devices, which has drawn growing attention to compression methods, particularly knowledge distillation. However, current knowledge distillation methods rely on the model's intermediate-layer features and on golden labels (also called hard labels), which respectively require aligned model architectures and sufficient labeled data. Moreover, existing methods usually neglect the parameters of the vocabulary. To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression. The method is simple and, surprisingly, performs extremely well. Specifically, because GLMD uses neither intermediate layers nor golden labels, it removes the dimensional and structural constraints between models as well as the need for labeled datasets, which lets it support more general application scenarios. Meanwhile, based on the long-tailed distribution of word frequencies in the data, GLMD compresses the vocabulary by reducing its size rather than its dimensionality. Experimental results show that our method outperforms 25 state-of-the-art methods on the SuperGLUE benchmark, achieving an average score that surpasses the best of them by 3%.
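The abstract does not give GLMD's exact losses, so the following is only a generic Python sketch of the two ideas it names, under stated assumptions. First, label-free word-prediction distillation is assumed here to be standard temperature-scaled KL divergence between the teacher's and student's word distributions. This is ordinary logit distillation, not necessarily GLMD's two-stage objective. Second, size-based vocabulary compression is assumed here to keep the most frequent tokens and map the long tail to one shared id. All function names are hypothetical.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_prediction_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Hypothetical label-free loss: KL(teacher || student) at one token position.

    Only the teacher's word predictions supervise the student. No gold label and
    no intermediate-layer feature is used, so hidden sizes and depths may differ.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Multiply by T^2 (a common convention) to keep gradient scale comparable.
    return kl * temperature ** 2


def build_compressed_vocab(corpus_token_ids, keep):
    """Hypothetical size-based compression: keep the `keep` most frequent ids.

    Returns a dict old_id -> new_id, with new ids 0..keep-1 assigned by
    descending frequency. Embedding dimension is untouched; only rows are dropped.
    """
    counts = Counter(corpus_token_ids)
    head = [tok for tok, _ in counts.most_common(keep)]
    return {old: new for new, old in enumerate(head)}


def remap(token_ids, old_to_new):
    """Map token ids into the compressed vocabulary (assumed UNK strategy)."""
    unk = len(old_to_new)  # one shared id for every dropped long-tail token
    return [old_to_new.get(t, unk) for t in token_ids]


def restrict_logits(logits, old_to_new):
    """Select the kept vocabulary columns from full-vocabulary logits, in new-id order."""
    return [logits[old] for old in old_to_new]
```

In this sketch, the teacher's full-vocabulary logits would be passed through `restrict_logits` before `word_prediction_kd_loss`, so both models compare distributions over the same reduced vocabulary. Whether GLMD aligns the two vocabularies this way is not stated in the abstract.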

Comments:	Accepted to Findings of ACL 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.06625 [cs.CL]
	(or arXiv:2306.06625v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.06625

Submission history

From: Shicheng Tan
[v1] Sun, 11 Jun 2023 08:53:27 UTC (453 KB)