Estimating and Maximizing Mutual Information for Knowledge Distillation

Shrivastava, Aman; Qi, Yanjun; Ordonez, Vicente


arXiv:2110.15946 (cs)

[Submitted on 29 Oct 2021 (v1), last revised 11 May 2023 (this version, v3)]


Abstract: In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network. We demonstrate through extensive experiments that this can be used to improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, producing better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures to arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities, with different architectures, and when student networks have extremely low capacity. We obtain 74.55% accuracy on CIFAR100 with a ShuffleNetV2, up from a baseline accuracy of 69.8%, by distilling knowledge from ResNet-50. On ImageNet we improve a ResNet-18 network from 68.88% to 70.32% accuracy (+1.44%) using a ResNet-34 teacher network.
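
To make the idea of a contrastive lower bound on mutual information concrete, here is a minimal sketch in plain Python. It is an illustration only: the abstract does not say which estimator MIMKD uses, so the sketch assumes the widely used InfoNCE bound with a cosine-similarity critic. The function name `info_nce_lower_bound`, the `temperature` parameter, and the list-of-lists feature format are hypothetical choices made for this example, and the sketch assumes every feature vector is non-zero.

```python
import math


def info_nce_lower_bound(teacher_feats, student_feats, temperature=0.1):
    """InfoNCE estimate (in nats) of a lower bound on I(teacher; student).

    Assumption for this sketch: row i of both inputs comes from the same
    input sample, so (i, i) is a positive pair and (i, j != i) is a negative.
    """

    def normalise(vec):
        # L2-normalise so the critic below is a scaled cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    t = [normalise(v) for v in teacher_feats]
    s = [normalise(v) for v in student_feats]
    n = len(t)

    total = 0.0
    for i in range(n):
        # Critic scores of student feature i against every teacher feature.
        scores = [
            sum(a * b for a, b in zip(s[i], t[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over positives and negatives.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(x - m) for x in scores))
        # Log-probability of picking the matching teacher feature.
        total += scores[i] - log_norm

    # The estimate can never exceed log(n), a known limit of InfoNCE.
    return total / n + math.log(n)
```

In a distillation setting one would typically minimize the negative of such a bound with respect to the student (and any learned critic), so that tightening the estimate and increasing the shared information happen in the same optimization; a real training loop would use an automatic-differentiation framework rather than plain lists.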

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
Cite as:	arXiv:2110.15946 [cs.CV]
	(or arXiv:2110.15946v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2110.15946

Submission history

From: Aman Shrivastava
[v1] Fri, 29 Oct 2021 17:49:56 UTC (1,183 KB)
[v2] Mon, 29 Nov 2021 18:24:36 UTC (1,704 KB)
[v3] Thu, 11 May 2023 13:08:01 UTC (2,018 KB)
