Estimating and Maximizing Mutual Information for Knowledge Distillation

Shrivastava, Aman; Qi, Yanjun; Ordonez, Vicente

arXiv:2110.15946v1 (cs)

[Submitted on 29 Oct 2021 (this version), latest version 11 May 2023 (v3)]

Abstract: Knowledge distillation is a widely used general technique for transferring knowledge from a teacher network to a student network. In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between intermediate and global feature representations from the teacher and the student networks. The method is flexible because the proposed mutual information maximization does not impose significant constraints on the structure of the networks' intermediate features. As such, we can distill knowledge from arbitrary teachers to arbitrary students. Our empirical results show that our method outperforms competing approaches across a wide range of student-teacher pairs, including pairs with different capacities, pairs with different architectures, and students with extremely low capacity. For example, distilling knowledge from a ResNet50 teacher raises the accuracy of a ShufflenetV2 student on CIFAR100 from a baseline of 69.8% to 74.55%.
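
To make the idea of a contrastive lower bound on mutual information concrete, here is a minimal sketch using the InfoNCE estimator, which is one common bound of this kind. The abstract does not state which estimator MIMKD uses, so this is an illustration under assumptions and not the authors' implementation. The function names, the dot-product critic, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_lower_bound(teacher_feats, student_feats, temperature=0.1):
    """InfoNCE estimate of a lower bound on I(teacher; student).

    Assumption: row i of each list comes from the same input image
    (the positive pair); every other row in the batch is a negative.
    The estimate can never exceed log(batch size).
    """
    n = len(teacher_feats)
    total_log_prob = 0.0
    for i in range(n):
        # Critic scores between student i and every teacher feature.
        scores = [dot(student_feats[i], t) / temperature for t in teacher_feats]
        # Numerically stable log-softmax evaluated at the positive index i.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total_log_prob += scores[i] - log_norm
    return math.log(n) + total_log_prob / n


def distillation_loss(teacher_feats, student_feats, temperature=0.1):
    """Minimizing this loss maximizes the estimated lower bound."""
    return -info_nce_lower_bound(teacher_feats, student_feats, temperature)
```

In this sketch, a student whose features match the teacher's on each input yields an estimate close to the ceiling of log(batch size), while a student that emits the same feature for every input yields an estimate of zero. A training loop would minimize `distillation_loss` with respect to the student's parameters, typically alongside the usual classification loss.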

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
Cite as:	arXiv:2110.15946 [cs.CV]
	(or arXiv:2110.15946v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2110.15946

Submission history

From: Aman Shrivastava
[v1] Fri, 29 Oct 2021 17:49:56 UTC (1,183 KB)
[v2] Mon, 29 Nov 2021 18:24:36 UTC (1,704 KB)
[v3] Thu, 11 May 2023 13:08:01 UTC (2,018 KB)
