Unbiased estimates for linear regression via volume sampling

Dereziński, Michał; Warmuth, Manfred K.

Computer Science > Machine Learning

arXiv:1705.06908 (cs)

[Submitted on 19 May 2017 (v1), last revised 5 Jun 2018 (this version, v5)]

Abstract: Given a full-rank matrix $X$ with more columns than rows, consider the task of estimating the pseudo inverse $X^+$ based on the pseudo inverse of a sampled subset of columns (of size at least the number of rows). We show that this is possible if the subset of columns is chosen with probability proportional to the squared volume spanned by the rows of the chosen submatrix (i.e., volume sampling). The resulting estimator is unbiased, and surprisingly the covariance of the estimator also has a closed form: it equals a specific factor times $X^{+\top}X^+$. The pseudo inverse plays an important role in solving the linear least squares problem, where we try to predict a label for each column of $X$. We assume labels are expensive and we are only given the labels for the small subset of columns we sample from $X$. Using our methods we show that the weight vector of the solution for the subproblem is an unbiased estimator of the optimal solution for the whole problem based on all column labels. We believe that these new formulas establish a fundamental connection between linear least squares and volume sampling. We use our methods to obtain an algorithm for volume sampling that is faster than the state of the art, and to obtain bounds for the total loss of the estimated least-squares solution on all labeled columns.
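The claims above can be checked exactly on tiny inputs. The following is a brute-force sketch in Python/NumPy, not the paper's fast sampling algorithm. It assumes $X$ is $d \times n$ with full row rank and $n$ small enough to enumerate all $\binom{n}{s}$ subsets, so expectations are computed exactly instead of estimated from samples. Our own conventions, labeled as assumptions: the sampled pseudo inverse $X_S^+$ is embedded into an $n \times d$ matrix with zero rows for unsampled columns so it can be compared with $X^+$; the function names are hypothetical; and the value $(n-d+1)/(s-d+1)$ used for the "specific factor" is our reading, not stated in the abstract. Because the subproblem weights equal $(X_S^+)^\top y_S$, their unbiasedness follows from that of the pseudo inverse by linearity.

```python
import itertools

import numpy as np


def volume_sampling_distribution(X, s):
    """Exact size-s volume sampling over column subsets of X (d x n, full row rank).

    Returns (subset, probability) pairs with P(S) proportional to det(X_S X_S^T),
    the squared volume spanned by the rows of X_S. Brute force: tiny n only.
    """
    d, n = X.shape
    assert d <= s <= n, "subset size must be at least the number of rows"
    subsets = list(itertools.combinations(range(n), s))
    weights = np.array([np.linalg.det(X[:, list(S)] @ X[:, list(S)].T) for S in subsets])
    return list(zip(subsets, weights / weights.sum()))


def padded_pinv(X, S):
    """Pseudo inverse of X_S (s x d), placed into an n x d matrix with zero rows outside S."""
    d, n = X.shape
    P = np.zeros((n, d))
    P[list(S), :] = np.linalg.pinv(X[:, list(S)])
    return P


def expectations(X, y, s):
    """Exact expectations under volume sampling of the padded pseudo inverse,
    its second moment P^T P, and the least-squares weights fitted to y_S only."""
    d, n = X.shape
    E_pinv, E_second, E_w = np.zeros((n, d)), np.zeros((d, d)), np.zeros(d)
    for S, p in volume_sampling_distribution(X, s):
        cols = list(S)
        P = padded_pinv(X, S)
        # Subproblem: minimise ||X_S^T w - y_S||^2 using only the s revealed labels.
        w_S = np.linalg.lstsq(X[:, cols].T, y[cols], rcond=None)[0]
        E_pinv += p * P
        E_second += p * (P.T @ P)
        E_w += p * w_S
    return E_pinv, E_second, E_w


# Usage: a random 2 x 6 matrix with one label per column, subsets of size 3.
rng = np.random.default_rng(0)
d, n, s = 2, 6, 3
X = rng.standard_normal((d, n))
y = rng.standard_normal(n)
E_pinv, E_second, E_w = expectations(X, y, s)
X_pinv = np.linalg.pinv(X)
w_full = np.linalg.lstsq(X.T, y, rcond=None)[0]  # optimal weights using all n labels
factor = (n - d + 1) / (s - d + 1)  # assumed value of the "specific factor"
print(np.allclose(E_pinv, X_pinv))
print(np.allclose(E_w, w_full))
print(np.allclose(E_second, factor * X_pinv.T @ X_pinv))
```

Exhaustive enumeration is exponential in $n$; it is used here only because it makes the expectations exact, which a Monte Carlo check would not.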

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1705.06908 [cs.LG]
	(or arXiv:1705.06908v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1705.06908

Submission history

From: Michał Dereziński
[v1] Fri, 19 May 2017 09:43:41 UTC (23 KB)
[v2] Wed, 7 Jun 2017 22:42:47 UTC (23 KB)
[v3] Tue, 13 Jun 2017 22:36:50 UTC (23 KB)
[v4] Wed, 3 Jan 2018 00:31:28 UTC (75 KB)
[v5] Tue, 5 Jun 2018 22:46:03 UTC (23 KB)
