Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization

Drews, Selina; Kohler, Michael

arXiv:2311.14609 (stat)

[Submitted on 24 Nov 2023]

Abstract: Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. Given a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size, we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is Hölder smooth with Hölder exponent $1/2 \leq p \leq 1$, the $L_2$ error converges to zero with a convergence rate of approximately $n^{-1/(1+d)}$. Furthermore, in the case of an interaction model, where the regression function consists of a sum of Hölder smooth functions with $d^*$ components, a rate of convergence is derived which does not depend on the input dimension $d$.
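
As a rough illustration of the setting the abstract describes, the following sketch runs plain gradient descent on an unregularized empirical $L_2$ risk for a wide network. It is not the authors' construction: the single hidden layer, the Gaussian initialization of the inner weights, the zero initialization of the outer weights, the fixed step size, and all function names (`gradient_descent`, `empirical_l2_risk`) are assumptions made for this example, since the abstract does not specify the network topology, initialization, or step-size rule.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def empirical_l2_risk(params, X, y):
    """Unregularized empirical L2 risk (1/n) * sum_i (f(X_i) - Y_i)^2."""
    W, b, v = params
    pred = sigmoid(X @ W + b) @ v
    return float(np.mean((pred - y) ** 2))


def gradient_descent(X, y, width=100, steps=500, step_size=1e-3, seed=0):
    """Plain gradient descent on the empirical L2 risk, with no penalty term.

    Assumed for illustration only: one hidden layer, Gaussian inner weights,
    outer weights started at zero, and a constant step size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(d, width))   # inner weights
    b = rng.normal(size=width)        # inner biases
    v = np.zeros(width)               # outer weights
    risks = []
    for _ in range(steps):
        H = sigmoid(X @ W + b)        # hidden activations, shape (n, width)
        resid = H @ v - y             # residuals, shape (n,)
        risks.append(float(np.mean(resid ** 2)))
        # Gradients of the empirical L2 risk with respect to each parameter.
        grad_v = (2.0 / n) * (H.T @ resid)
        dH = (2.0 / n) * np.outer(resid, v) * H * (1.0 - H)
        grad_W = X.T @ dH
        grad_b = dH.sum(axis=0)
        W -= step_size * grad_W
        b -= step_size * grad_b
        v -= step_size * grad_v
    return (W, b, v), risks


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(200, 1))            # bounded predictor variable
    y = np.sqrt(np.abs(X[:, 0] - 0.5))        # Hölder smooth with exponent 1/2
    params, risks = gradient_descent(X, y)
    print("risk before training:", risks[0])
    print("risk after training: ", empirical_l2_risk(params, X, y))
```

The target function in the demo, $x \mapsto \sqrt{|x - 1/2|}$, is chosen only because it is Hölder smooth with exponent $1/2$, the lower end of the range in the abstract. The sketch makes no claim about reproducing the stated convergence rate.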

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
MSC classes:	62G08
Cite as:	arXiv:2311.14609 [stat.ML]
	(or arXiv:2311.14609v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2311.14609

Submission history

From: Selina Drews
[v1] Fri, 24 Nov 2023 17:04:21 UTC (31 KB)
