Towards Practical Lipreading with Distilled and Efficient Models

Ma, Pingchuan; Martinez, Brais; Petridis, Stavros; Pantic, Maja

arXiv:2007.06504 (cs)

[Submitted on 13 Jul 2020 (v1), last revised 2 Jun 2021 (this version, v3)]

Abstract: Lipreading has progressed rapidly with the resurgence of neural networks. Recent work has focused on improving performance by finding the optimal architecture or on improving generalization. However, a significant gap remains between current methods and the requirements for deploying lipreading effectively in practical scenarios. In this work, we propose a series of innovations that significantly narrow that gap. First, using self-distillation, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000, to 88.5% and 46.6%, respectively. Second, we propose a series of architectural changes, including a novel Depthwise Separable Temporal Convolutional Network (DS-TCN) head, that cut the computational cost to a fraction of that of the (already quite efficient) original model. Third, we show that knowledge distillation is a very effective tool for recovering the performance of the lightweight models. This results in a range of models with different accuracy-efficiency trade-offs. Our most promising lightweight models are on par with the current state of the art while reducing computational cost by 8.2x and the number of parameters by 3.9x, which we hope will enable the deployment of lipreading models in practical applications.
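
To give a rough sense of why a depthwise separable temporal convolution is cheaper than a standard one, the sketch below counts the weights in a single 1D convolutional layer under both designs. It is a generic illustration of depthwise separable convolution, not the authors' DS-TCN implementation. The function names and the channel and kernel sizes are hypothetical, chosen only for the example. They are not taken from the paper and are not meant to reproduce its 8.2x or 3.9x figures.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1D convolution (bias omitted)."""
    # Every output channel has one kernel_size-tap filter per input channel.
    return c_in * c_out * kernel_size


def ds_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a depthwise separable 1D convolution (bias omitted)."""
    depthwise = c_in * kernel_size  # one temporal filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    c_in, c_out, k = 512, 512, 3
    standard = conv1d_params(c_in, c_out, k)
    separable = ds_conv1d_params(c_in, c_out, k)
    print(f"standard:  {standard} weights")
    print(f"separable: {separable} weights")
    print(f"reduction: {standard / separable:.2f}x")
```

For a fixed sequence length, the number of multiply-accumulate operations scales with the same weight counts, so the saving in parameters for this one layer carries over to its computational cost. The saving grows with the kernel size, since only the small depthwise term depends on it.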

Comments:	Accepted to ICASSP 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2007.06504 [cs.CV]
	(or arXiv:2007.06504v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.06504

Submission history

From: Pingchuan Ma
[v1] Mon, 13 Jul 2020 16:56:27 UTC (192 KB)
[v2] Fri, 12 Feb 2021 15:50:02 UTC (208 KB)
[v3] Wed, 2 Jun 2021 09:02:09 UTC (208 KB)
