Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

Bombari, Simone; Mondelli, Marco

Statistics > Machine Learning

arXiv:2502.01347 (stat)

[Submitted on 3 Feb 2025]

Title:Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

Authors:Simone Bombari, Marco Mondelli

View PDF HTML (experimental)

Abstract:Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature $x$ and a spurious feature $y$. Specifically, we quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $\lambda$ of the ridge regularization. As a consequence, we first capture the simplicity of $y$ through the spectrum of its covariance, and its correlation with $x$ through the Schur complement of the full data covariance. Next, we prove a trade-off between $C$ and the in-distribution test loss $L$, by showing that the value of $\lambda$ that minimizes $L$ lies in an interval where $C$ is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets.

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2502.01347 [stat.ML]
	(or arXiv:2502.01347v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2502.01347

Submission history

From: Simone Bombari [view email]
[v1] Mon, 3 Feb 2025 13:38:42 UTC (308 KB)

Statistics > Machine Learning

Title:Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Statistics > Machine Learning

Title:Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators