Scaling ResNets in the Large-depth Regime

Marion, Pierre; Fermanian, Adeline; Biau, Gérard; Vert, Jean-Philippe


arXiv:2206.06929 (cs)

[Submitted on 14 Jun 2022 (v1), last revised 1 Mar 2025 (this version, v3)]


Abstract: Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is obtained for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to identity mapping. This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to the widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
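
The scaling claim in the abstract can be probed numerically at initialization. The sketch below is illustrative only and is not the authors' code: it assumes a simplified residual update $h_{k+1} = h_k + \alpha_L V_{k+1} \mathrm{ReLU}(h_k)$ with $\alpha_L = L^{-\beta}$ and i.i.d. Gaussian weights of variance $1/d$. The function name `relative_change`, the width, the depth, and the choice of ReLU are all assumptions made for this example. It reports the relative change $\|h_L - h_0\| / \|h_0\|$ for three values of $\beta$.

```python
import numpy as np


def relative_change(depth, beta, width=64, seed=0):
    """Return ||h_L - h_0|| / ||h_0|| for a toy ResNet at initialization.

    Assumed (hypothetical) update: h_{k+1} = h_k + alpha_L * V_{k+1} @ relu(h_k),
    with alpha_L = depth ** (-beta) and V_{k+1} having i.i.d. N(0, 1/width) entries.
    """
    rng = np.random.default_rng(seed)
    alpha = depth ** (-beta)
    h0 = rng.standard_normal(width)
    h = h0.copy()
    for _ in range(depth):
        # Fresh, independent weights at every layer (i.i.d. initialization).
        V = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h + alpha * V @ np.maximum(h, 0.0)
    return np.linalg.norm(h - h0) / np.linalg.norm(h0)


if __name__ == "__main__":
    depth = 1000
    for beta in (0.25, 0.5, 1.0):
        ratio = relative_change(depth, beta)
        print(f"beta = {beta:4.2f}  relative change = {ratio:.3e}")
```

Under these assumptions, the output for $\beta = 1$ should stay close to zero (the network is nearly the identity), $\beta = 1/2$ should give a change of order one, and $\beta = 1/4$ should give a very large value, in line with the three regimes described in the abstract.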

Comments:	48 pages, 15 figures. Accepted to JMLR
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2206.06929 [cs.LG]
	(or arXiv:2206.06929v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2206.06929

Submission history

From: Pierre Marion
[v1] Tue, 14 Jun 2022 15:49:10 UTC (1,780 KB)
[v2] Mon, 10 Jun 2024 14:28:26 UTC (1,479 KB)
[v3] Sat, 1 Mar 2025 00:23:02 UTC (2,886 KB)
