Model Agnostic High-Dimensional Error-in-Variable Regression

Agarwal, Anish; Shah, Devavrat; Shen, Dennis; Song, Dogyoon

Computer Science > Machine Learning

arXiv:1902.10920v2 (cs)

[Submitted on 28 Feb 2019 (v1), revised 12 Mar 2019 (this version, v2), latest version 19 May 2021 (v10)]

Abstract: We consider the problem of high-dimensional error-in-variable regression where we only observe a sparse, noisy version of the covariate data. We propose an algorithm that utilizes matrix estimation (ME) as a key subroutine to de-noise the corrupted data, and then performs ordinary least squares regression. When the ME subroutine is instantiated with hard singular value thresholding (HSVT), our results indicate that if the number of samples scales as $\omega( \rho^{-4} r \log^5 (p))$, then our in- and out-of-sample prediction error decays to $0$ as $p \rightarrow \infty$; $\rho$ represents the fraction of observed data, $r$ is the (approximate) rank of the true covariate matrix, and $p$ is the number of covariates. As an important byproduct of our approach, we demonstrate that HSVT with regression acts as implicit $\ell_0$-regularization since HSVT aims to find a low-rank structure within the covariate matrix. Thus, we can view the sparsity of the estimated parameter as a consequence of the covariate structure rather than a model assumption as is often considered in the literature. Moreover, our non-asymptotic bounds match (up to $\log^4(p)$ factors) the best guaranteed sample complexity results in the literature for algorithms that require precise knowledge of the underlying model; we highlight that our approach is model agnostic. In our analysis, we obtain two technical results of independent interest: first, we provide a simple bound on the spectral norm of random matrices with independent sub-exponential rows with randomly missing entries; second, we bound the max column sum error, a nonstandard error metric, for HSVT. Our setting enables us to apply our results to applications such as synthetic control for causal inference, time series analysis, and regression with privacy. It is important to note that the existing inventory of methods is unable to analyze these applications.

Comments:	51 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1902.10920 [cs.LG]
	(or arXiv:1902.10920v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1902.10920

Submission history

From: Dennis Shen
[v1] Thu, 28 Feb 2019 06:34:52 UTC (600 KB)
[v2] Tue, 12 Mar 2019 22:58:13 UTC (628 KB)
[v3] Thu, 30 May 2019 04:04:35 UTC (1,402 KB)
[v4] Thu, 6 Jun 2019 02:25:01 UTC (891 KB)
[v5] Wed, 12 Jun 2019 16:41:24 UTC (1,385 KB)
[v6] Mon, 13 Jan 2020 19:33:57 UTC (4,111 KB)
[v7] Mon, 20 Jan 2020 04:49:20 UTC (4,118 KB)
[v8] Mon, 10 Aug 2020 15:25:12 UTC (4,237 KB)
[v9] Sat, 12 Dec 2020 16:56:13 UTC (6,701 KB)
[v10] Wed, 19 May 2021 14:40:43 UTC (6,728 KB)
