Understanding the Training Speedup from Sampling with Approximate Losses

Das, Rudrajit; Chen, Xi; Ieong, Bertram; Bansal, Parikshit; Sanghavi, Sujay

arXiv:2402.07052 (cs)

[Submitted on 10 Feb 2024]

Abstract: It is well known that selecting samples with large losses/gradients can significantly reduce the number of training steps. However, the selection overhead is often too high to yield any meaningful gains in overall training time. In this work, we focus on the greedy approach of selecting samples with large approximate losses instead of exact losses in order to reduce the selection overhead. For smooth convex losses, we show that such a greedy strategy can converge to a constant factor of the minimum value of the average loss in fewer iterations than the standard approach of random selection. We also theoretically quantify the effect of the approximation level. We then develop SIFT, which uses early exiting to obtain approximate losses from an intermediate layer's representations for sample selection. We evaluate SIFT on the task of training a 110M-parameter, 12-layer BERT base model and show significant gains over vanilla training (in terms of training hours and number of backpropagation steps) without any optimized implementation. For example, to reach 64% validation accuracy, SIFT with exit at the first layer takes ~43 hours compared to ~57 hours for vanilla training.
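The greedy rule described in the abstract (train on the samples whose approximate losses are largest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SIFT implementation. The function names, the least-squares toy problem, and the modeling of approximate losses as exact losses plus Gaussian noise are all hypothetical choices made here. SIFT itself obtains approximate losses by early exiting at an intermediate BERT layer, which this sketch does not reproduce.

```python
import numpy as np


def select_by_approx_loss(approx_losses, k):
    """Return indices of the k samples with the largest approximate losses.

    approx_losses: 1-D array of per-sample loss estimates. How they are
    computed (cheap proxy model, intermediate layer, etc.) is left open here.
    """
    approx_losses = np.asarray(approx_losses, dtype=float)
    if not 0 < k <= approx_losses.size:
        raise ValueError("k must be in [1, len(approx_losses)]")
    # argpartition on the negated losses puts the k largest losses first
    # in O(n) time, without a full sort.
    top = np.argpartition(-approx_losses, k - 1)[:k]
    # Order the selected indices by decreasing approximate loss.
    return top[np.argsort(-approx_losses[top])]


def greedy_sgd_least_squares(X, y, k, steps=200, lr=0.05, noise=0.1, seed=0):
    """Hypothetical toy: greedy SGD on 0.5 * (x^T w - y)^2.

    Approximate losses are simulated (an assumption for illustration) as the
    exact per-sample loss plus Gaussian noise of scale `noise`. In practice
    the point is to avoid computing exact losses for every sample; they are
    used here only to generate a controlled approximation error.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residuals = X @ w - y
        exact = 0.5 * residuals ** 2
        approx = exact + noise * rng.standard_normal(exact.shape)
        idx = select_by_approx_loss(approx, k)
        # Gradient of the mean loss over the selected samples only.
        grad = X[idx].T @ residuals[idx] / k
        w -= lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((256, 5))
    y = X @ rng.standard_normal(5)
    w = greedy_sgd_least_squares(X, y, k=8)
    print("initial avg loss:", 0.5 * np.mean(y ** 2))
    print("final avg loss:  ", 0.5 * np.mean((X @ w - y) ** 2))
```

Replacing `select_by_approx_loss` with a uniform random choice of `k` indices gives the random-selection baseline that the abstract compares against; the `noise` parameter plays the role of the approximation level.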

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2402.07052 [cs.LG]
	(or arXiv:2402.07052v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.07052

Submission history

From: Rudrajit Das
[v1] Sat, 10 Feb 2024 21:51:59 UTC (72 KB)