AutoDisc: Automatic Distillation Schedule for Large Language Model Compression

Zhang, Chen; Yang, Yang; Wang, Qifan; Liu, Jiahao; Wang, Jingang; Wu, Wei; Song, Dawei

arXiv:2205.14570v1 (cs)

[Submitted on 29 May 2022 (this version), latest version 29 Jan 2024 (v3)]

Abstract: Driven by the teacher-student paradigm, knowledge distillation is one of the de facto ways for language model compression. Recent studies have uncovered that conventional distillation is less effective when facing a large capacity gap between the teacher and the student, and have introduced teacher assistant-based distillation to bridge the gap. As the connection between the two, the teacher assistant's scale and performance are crucial for transferring knowledge from the teacher to the student. However, existing teacher assistant-based methods select the scale of the teacher assistant manually, which fails to identify the teacher assistant with the optimal scale-performance tradeoff. To this end, we propose an Automatic Distillation Schedule (AutoDisc) for large language model compression. In particular, AutoDisc first specifies a set of teacher assistant candidates at different scales with gridding and pruning, and then optimizes all candidates in a once-for-all optimization with two approximations. The best teacher assistant scale is automatically selected according to the scale-performance tradeoff. AutoDisc is evaluated with an extensive set of experiments on the language understanding benchmark GLUE. Experimental results demonstrate the improved performance and applicability of AutoDisc. We further apply AutoDisc to a language model with over one billion parameters and show the scalability of AutoDisc.
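
The selection step described in the abstract (grid the candidate scales, then pick the candidate with the best scale-performance tradeoff) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the tradeoff measure, so the `tradeoff` function below (performance plus a weighted sparsity bonus), the `Candidate` type, and every number are hypothetical and are not taken from the paper. The pruning and once-for-all optimization steps are not modeled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A teacher assistant candidate (hypothetical structure, not from the paper)."""
    sparsity: float  # fraction of teacher parameters removed; 0.0 is the full teacher
    score: float     # placeholder task performance of this candidate


def grid_sparsities(student_sparsity: float, steps: int) -> List[float]:
    """Evenly spaced sparsity levels strictly between the teacher (0.0) and the student."""
    return [student_sparsity * i / (steps + 1) for i in range(1, steps + 1)]


def tradeoff(c: Candidate, weight: float) -> float:
    """Assumed tradeoff measure: higher performance and smaller scale both score higher."""
    return c.score + weight * c.sparsity


def select_assistant(candidates: List[Candidate], weight: float) -> Candidate:
    """Return the candidate that maximizes the assumed tradeoff measure."""
    return max(candidates, key=lambda c: tradeoff(c, weight))


# Made-up scores for illustration only; these are not results from the paper.
candidates = [
    Candidate(sparsity=0.3, score=0.84),
    Candidate(sparsity=0.6, score=0.82),
    Candidate(sparsity=0.8, score=0.70),
]
best = select_assistant(candidates, weight=0.1)
print(best.sparsity)
```

With these made-up numbers, the middle candidate wins because it gives up little performance for a much smaller scale, while the smallest candidate loses too much performance. A different tradeoff measure or weight would change the choice.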

Comments:	Work in progress. Code will be available soon
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2205.14570 [cs.CL]
	(or arXiv:2205.14570v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.14570

Submission history

From: Chen Zhang
[v1] Sun, 29 May 2022 04:22:48 UTC (527 KB)
[v2] Thu, 4 May 2023 05:36:51 UTC (7,014 KB)
[v3] Mon, 29 Jan 2024 03:58:35 UTC (155 KB)
