Ensemble Knowledge Distillation for CTR Prediction

Zhu, Jieming; Liu, Jinyang; Li, Weiqi; Lai, Jincai; He, Xiuqiang; Chen, Liang; Zheng, Zibin

arXiv:2011.04106 (cs)

[Submitted on 8 Nov 2020 (v1), last revised 5 Jul 2023 (this version, v2)]

Abstract: Recently, deep learning-based models have been widely studied for click-through rate (CTR) prediction and have led to improved prediction accuracy in many industrial applications. However, current research focuses primarily on building complex network architectures to better capture sophisticated feature interactions and dynamic user behaviors. The increased model complexity may slow down online inference and hinder adoption in real-time applications. Instead, our work targets a new model training strategy based on knowledge distillation (KD). KD is a teacher-student learning framework that transfers knowledge learned from a teacher model to a student model. The KD strategy not only allows us to simplify the student model to a vanilla DNN model but also achieves significant accuracy improvements over the state-of-the-art teacher models. These benefits motivate us to further explore the use of a powerful ensemble of teachers for more accurate student model training. We also propose some novel techniques to facilitate ensembled CTR prediction, including teacher gating and early stopping by distillation loss. We conduct comprehensive experiments against 12 existing models and across three industrial datasets. Both offline and online A/B testing results show the effectiveness of our KD-based training strategy.
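
The abstract names three ingredients without giving formulas: a student trained on teacher outputs, teacher gating, and early stopping by distillation loss. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. It assumes a soft-label formulation in which the student's loss is a weighted sum of cross-entropy against the true click label and cross-entropy against a gate-weighted average of teacher probabilities. The function names, the `beta` weight, the softmax gate, and the `patience` rule are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(target, prob, eps=1e-7):
    """Cross-entropy between a (possibly soft) target in [0, 1] and a probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def gated_teacher_prob(teacher_probs, gate_scores):
    """Assumed form of teacher gating: a softmax over per-teacher gate scores
    weights each teacher's predicted click probability."""
    weights = softmax(gate_scores)
    return sum(w * p for w, p in zip(weights, teacher_probs))


def distillation_loss(label, student_logit, teacher_probs, gate_scores, beta=0.5):
    """Assumed student objective: (1 - beta) * hard-label loss + beta * soft-label loss."""
    student_prob = sigmoid(student_logit)
    soft_target = gated_teacher_prob(teacher_probs, gate_scores)
    hard_loss = binary_cross_entropy(label, student_prob)
    soft_loss = binary_cross_entropy(soft_target, student_prob)
    return (1.0 - beta) * hard_loss + beta * soft_loss


def should_stop(val_losses, patience=2):
    """Hypothetical early-stopping rule: stop once the validation distillation
    loss has not improved on its earlier best for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

In this sketch, `gate_scores` would come from a small gating network evaluated per sample, so that different teachers dominate on different inputs, and `should_stop` would be fed the per-epoch validation value of `distillation_loss` rather than a label-only metric.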

Comments:	Published in CIKM'2020
Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR)
Cite as:	arXiv:2011.04106 [cs.LG]
	(or arXiv:2011.04106v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.04106

Submission history

From: Jieming Zhu
[v1] Sun, 8 Nov 2020 23:37:58 UTC (2,124 KB)
[v2] Wed, 5 Jul 2023 03:27:45 UTC (1,965 KB)
