On the Selection of Initialization and Activation Function for Deep Neural Networks

Hayou, Soufiane; Doucet, Arnaud; Rousseau, Judith

arXiv:1805.08266 (stat)

[Submitted on 21 May 2018 (v1), last revised 7 Oct 2018 (this version, v2)]

Abstract: The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to loss of information about the input during forward propagation and to exponentially vanishing or exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'edge of chaos', can lead to good performance. We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos. By further extending this analysis, we identify a class of activation functions that improve information propagation over ReLU-like functions. This class includes the Swish activation, $\phi_{swish}(x) = x \cdot \text{sigmoid}(x)$, used in Hendrycks & Gimpel (2016), Elfwing et al. (2017) and Ramachandran et al. (2017). This provides a theoretical grounding for the excellent empirical performance of $\phi_{swish}$ observed in these contributions. We complement those previous results by illustrating the benefit of using a random initialization on the edge of chaos in this context.

Comments:	8 pages, 15 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1805.08266 [stat.ML]
	(or arXiv:1805.08266v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1805.08266

Submission history

From: Soufiane Hayou
[v1] Mon, 21 May 2018 19:23:39 UTC (2,689 KB)
[v2] Sun, 7 Oct 2018 18:20:25 UTC (1,281 KB)
