Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Nacson, Mor Shpigel; Srebro, Nathan; Soudry, Daniel

arXiv:1806.01796 (stat)

[Submitted on 5 Jun 2018 (v1), last revised 18 Apr 2022 (this version, v3)]

Abstract: Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, for sampling both with and without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$, similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation for similar behaviors observed in deep networks when trained with SGD.
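
The setting described in the abstract can be illustrated with a small sketch: a homogeneous linear classifier (no bias term), logistic loss, linearly separable data, and SGD with a fixed step size, sampling without replacement via one shuffle per epoch. The six-point dataset, the step size `eta=0.1`, the epoch count, and all function names below are assumptions chosen for illustration; none of them come from the paper, and the step size is not derived from the paper's bound.

```python
import math
import random

# Toy dataset (illustrative assumption): 2-D points with labels in {+1, -1},
# linearly separable by a hyperplane through the origin.
DATA = [
    ((1.0, 2.0), 1), ((2.0, 0.5), 1), ((0.5, 1.0), 1),
    ((-1.0, -1.5), -1), ((-2.0, -0.5), -1), ((-0.5, -1.0), -1),
]


def margin(w, x, y):
    """Signed margin y * <w, x> of a homogeneous linear classifier."""
    return y * (w[0] * x[0] + w[1] * x[1])


def logistic_loss(m):
    """log(1 + exp(-m)), written to avoid overflow for negative m."""
    if m > 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def sigmoid_neg(m):
    """1 / (1 + exp(m)), the magnitude of the logistic loss derivative."""
    if m > 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def mean_loss(w, data):
    return sum(logistic_loss(margin(w, x, y)) for x, y in data) / len(data)


def sgd_fixed_step(data, eta, epochs, seed=0):
    """SGD with a fixed step size; one reshuffle per epoch (no replacement)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    order = list(range(len(data)))
    history = []  # mean training loss after each epoch
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            s = sigmoid_neg(margin(w, x, y))
            # Gradient of the loss w.r.t. w is -s * y * x, so step along +s * y * x.
            w[0] += eta * s * y * x[0]
            w[1] += eta * s * y * x[1]
        history.append(mean_loss(w, data))
    return w, history


w, history = sgd_fixed_step(DATA, eta=0.1, epochs=200)
norm = math.hypot(w[0], w[1])
print("loss after first epoch:", history[0])
print("loss after last epoch: ", history[-1])
print("weight direction:      ", (w[0] / norm, w[1] / norm))
```

On this particular toy dataset every update increases every margin, so the loss falls after each epoch without any step-size decay or iterate averaging. That monotone behavior is a property of the chosen data, not a general claim; the paper's results concern convergence to zero loss and the rates stated in the abstract.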

Comments:	Fixed a typo (Eq. (4) - missing σ_{max}^2 term in the denominator)
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1806.01796 [stat.ML]
	(or arXiv:1806.01796v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1806.01796

Submission history

From: Mor Shpigel Nacson
[v1] Tue, 5 Jun 2018 16:37:19 UTC (461 KB)
[v2] Sun, 24 Mar 2019 09:47:18 UTC (1,270 KB)
[v3] Mon, 18 Apr 2022 14:12:57 UTC (1,272 KB)
