Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem

Brutzkus, Alon; Globerson, Amir

arXiv:1810.03037 (cs)

[Submitted on 6 Oct 2018 (v1), last revised 29 Jan 2019 (this version, v2)]

Abstract: Empirical evidence suggests that neural networks with ReLU activations generalize better with overparameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we provide theoretical and empirical evidence that, in certain cases, overparameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. We demonstrate this theoretically for a 3-layer convolutional neural network with max-pooling, in a novel setting which extends the XOR problem. We show that this interplay implies that, with overparameterization, gradient descent converges to global minima with better generalization performance than global minima of small networks. Empirically, we demonstrate these phenomena for a 3-layer convolutional neural network on the MNIST task.
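The kind of setting the abstract describes can be illustrated with a small NumPy sketch. It is a hypothetical toy version, not the authors' code. Several pieces are assumptions chosen only for illustration: the data distribution in `sample_xor`, the network form (a width-2, stride-2 convolution with `k` "positive" and `k` "negative" ReLU filters, global max-pooling, and a fixed +1/-1 readout), the hinge loss, and all hyperparameters. Varying `k` lets one train a small and an overparameterized network by gradient descent on the same training set and compare their test accuracy.

```python
import numpy as np

# ASSUMPTION: a simplified XOR-detection task, not the paper's exact distribution.
# Each input is a sequence of d two-bit patterns in {+1,-1}^2. Patterns (1,1) and
# (-1,-1) (product +1) are "positive"; (1,-1) and (-1,1) (product -1) are "negative".
# A sample is labeled +1 iff it contains at least one positive pattern.
POS = np.array([[1, 1], [-1, -1]], dtype=float)
NEG = np.array([[1, -1], [-1, 1]], dtype=float)


def sample_xor(n, d, rng):
    """Return X with shape (n, d, 2) and labels y in {+1, -1}."""
    y = rng.choice([1.0, -1.0], size=n)
    X = NEG[rng.integers(0, 2, size=(n, d))]            # start from negative patterns only
    for i in np.flatnonzero(y > 0):
        X[i, rng.integers(0, d)] = POS[rng.integers(0, 2)]  # plant one positive pattern
    return X, y


def forward(W, U, X):
    """Hypothetical 3-layer net: conv (filters W, U of shape (k, 2)) -> ReLU ->
    global max-pool over positions -> fixed readout (+1 for W, -1 for U)."""
    hw = np.maximum(X @ W.T, 0.0)                       # (n, d, k)
    hu = np.maximum(X @ U.T, 0.0)
    return hw.max(axis=1).sum(axis=1) - hu.max(axis=1).sum(axis=1)


def _filter_grad(F, X, coef):
    """Subgradient of sum_n coef[n] * sum_i max_j relu(F_i . x_{n,j}) w.r.t. F."""
    pre = X @ F.T                                       # (n, d, k)
    jstar = pre.argmax(axis=1)                          # (n, k): pooled position per filter
    xs = np.take_along_axis(X, jstar[:, :, None], axis=1)  # (n, k, 2): pooled patch
    gate = (pre.max(axis=1) > 0).astype(float)          # (n, k): ReLU active at the max
    return np.einsum("n,nk,nkc->kc", coef, gate, xs)


def grads(W, U, X, y):
    """Subgradients of the mean hinge loss max(0, 1 - y * f(x))."""
    margin = y * forward(W, U, X)
    coef = -y * (margin < 1.0) / len(y)                 # d(loss)/d(f) per sample
    return _filter_grad(W, X, coef), -_filter_grad(U, X, coef)


def accuracy(W, U, X, y):
    return np.mean(np.sign(forward(W, U, X)) == y)


def train(k, X, y, lr=0.1, steps=500, init_std=0.1, seed=0):
    """Plain gradient descent from a small Gaussian initialization (assumed)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, init_std, size=(k, 2))
    U = rng.normal(0.0, init_std, size=(k, 2))
    for _ in range(steps):
        gW, gU = grads(W, U, X, y)
        W -= lr * gW
        U -= lr * gU
    return W, U


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X_train, y_train = sample_xor(20, 4, data_rng)
    X_test, y_test = sample_xor(2000, 4, data_rng)
    for k in (2, 50):                                   # small vs. overparameterized
        W, U = train(k, X_train, y_train)
        print(f"k={k}: train acc={accuracy(W, U, X_train, y_train):.3f}, "
              f"test acc={accuracy(W, U, X_test, y_test):.3f}")
```

The outcome of this toy comparison depends on the seed, the training-set size, and the hyperparameters; the sketch only shows how such an experiment can be set up and does not reproduce the paper's analysis or results.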

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1810.03037 [cs.LG]
	(or arXiv:1810.03037v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1810.03037

Submission history

From: Alon Brutzkus
[v1] Sat, 6 Oct 2018 18:44:51 UTC (1,270 KB)
[v2] Tue, 29 Jan 2019 14:21:01 UTC (1,911 KB)