Low Rank Approximation and Regression in Input Sparsity Time

Clarkson, Kenneth L.; Woodruff, David P.

arXiv:1207.6365 (cs)

[Submitted on 26 Jul 2012 (v1), last revised 5 Apr 2013 (this version, v4)]

Abstract: We design a new distribution over $\poly(r \eps^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\norm{SAx}_2 = (1 \pm \eps)\norm{Ax}_2$ simultaneously for all $x \in \mathbb{R}^d$. Such a matrix $S$ is called a \emph{subspace embedding}. Furthermore, $SA$ can be computed in $\nnz(A) + \poly(d \eps^{-1})$ time, where $\nnz(A)$ is the number of non-zero entries of $A$. This improves over all previous subspace embeddings, which required at least $\Omega(nd \log d)$ time to achieve this property. We call our matrices $S$ \emph{sparse embedding matrices}.
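The abstract does not spell out the construction. As a minimal sketch, assume a CountSketch-style matrix $S$ in which each column has a single random $\pm 1$ entry in a uniformly random row. Under that assumption $SA$ can be accumulated in one pass over the non-zeros of $A$. The function names, the `(row, column, value)` triple format, and the `seed` parameter below are illustrative choices, not taken from the paper.

```python
import random


def sparse_embedding(n, m, seed=0):
    """Draw a hypothetical sparse embedding for n input rows and m sketch rows.

    Row i of the input is sent to bucket buckets[i] with sign signs[i],
    which is the same as S having one +-1 entry in column i.
    """
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return buckets, signs


def apply_embedding(entries, d, m, buckets, signs):
    """Return SA as an m x d list of lists.

    `entries` holds the non-zeros of A as (row, column, value) triples.
    The loop touches each non-zero once; allocating the output costs m * d.
    """
    SA = [[0.0] * d for _ in range(m)]
    for i, j, value in entries:
        SA[buckets[i]][j] += signs[i] * value
    return SA
```

How large `m` must be for the $(1 \pm \eps)$ guarantee is exactly what the paper analyzes; this sketch takes `m` as given and makes no claim about the embedding quality for a particular choice.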
Using our sparse embedding matrices, we obtain the fastest known algorithms for $(1+\eps)$-approximation for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and $\ell_p$-regression. The leading order term in the time complexity of our algorithms is $O(\nnz(A))$ or $O(\nnz(A)\log n)$.
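To illustrate how such an embedding might be used for overconstrained least squares, the sketch below follows the generic "sketch and solve" pattern: compress $A$ and $b$ with the same $S$, then solve the small problem $\min_x \norm{SAx - Sb}_2$. It assumes the CountSketch-style form of $S$ (one random sign per input row); all function names are hypothetical, and the small solve uses normal equations only to stay dependency-free, where a QR-based solver would be the more numerically careful choice.

```python
import random


def draw_hash(n, m, seed=0):
    """Pick one bucket in [0, m) and one random sign for each of n rows."""
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return buckets, signs


def sketch(A, b, m, buckets, signs):
    """Return (SA, Sb) for dense rows A (n x d) and right-hand side b."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    Sb = [0.0] * m
    for i, row in enumerate(A):
        h, s = buckets[i], signs[i]
        for j, value in enumerate(row):
            if value != 0.0:  # only non-zeros contribute work
                SA[h][j] += s * value
        Sb[h] += s * b[i]
    return SA, Sb


def solve_normal_equations(M, y):
    """Solve min_x ||Mx - y||_2 for a small M via M^T M x = M^T y."""
    d = len(M[0])
    # Augmented d x (d + 1) system [M^T M | M^T y].
    G = [
        [sum(r[p] * r[q] for r in M) for q in range(d)]
        + [sum(r[p] * yi for r, yi in zip(M, y))]
        for p in range(d)
    ]
    for c in range(d):
        # Partial pivoting on column c.
        piv = max(range(c, d), key=lambda r: abs(G[r][c]))
        if abs(G[piv][c]) < 1e-12:
            raise ValueError("sketched matrix is rank deficient; try a larger m")
        G[c], G[piv] = G[piv], G[c]
        for r in range(c + 1, d):
            f = G[r][c] / G[c][c]
            for k in range(c, d + 1):
                G[r][k] -= f * G[c][k]
    # Back substitution.
    x = [0.0] * d
    for c in reversed(range(d)):
        tail = sum(G[c][k] * x[k] for k in range(c + 1, d))
        x[c] = (G[c][d] - tail) / G[c][c]
    return x


def sketched_least_squares(A, b, m, seed=0):
    """Sketch-and-solve estimate of argmin_x ||Ax - b||_2."""
    buckets, signs = draw_hash(len(A), m, seed)
    SA, Sb = sketch(A, b, m, buckets, signs)
    return solve_normal_equations(SA, Sb)
```

The expensive pass over $A$ happens once in `sketch`; the remaining work depends only on the sketch size `m` and on `d`, which is where the low-order $\poly(d/\eps)$ term in the running time comes from.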
We optimize the low-order $\poly(d/\eps)$ terms in our running times (or for rank-$k$ approximation, the $n \cdot \poly(k/\eps)$ term), and show various tradeoffs. For instance, we also use our methods to design new preconditioners that improve the dependence on $\eps$ in least-squares regression to $\log(1/\eps)$. Finally, we provide preliminary experimental results which suggest that our algorithms are competitive in practice.
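One standard way to see why a preconditioner built from a subspace embedding can give a $\log(1/\eps)$ dependence is the short derivation below. It is a sketch under stated assumptions, not necessarily the paper's exact procedure: $S$ is taken to be a subspace embedding with a fixed constant distortion $\varepsilon_0$ (a symbol introduced here), $A$ is assumed to have full column rank, and $SA = QR$ is a QR factorization with $Q$ having orthonormal columns.

```latex
\begin{align*}
  &\|SAx\|_2 = (1 \pm \varepsilon_0)\,\|Ax\|_2 \ \text{ for all } x,
    \qquad SA = QR,\ R \text{ invertible}. \\[4pt]
  &\|SAR^{-1}y\|_2 = \|Qy\|_2 = \|y\|_2
    \;\Longrightarrow\;
    \frac{\|y\|_2}{1+\varepsilon_0} \;\le\; \|AR^{-1}y\|_2 \;\le\; \frac{\|y\|_2}{1-\varepsilon_0}
    \ \text{ for all } y, \\[4pt]
  &\kappa\!\left(AR^{-1}\right) \;\le\; \frac{1+\varepsilon_0}{1-\varepsilon_0}
    \;=\; 3 \quad \text{when } \varepsilon_0 = \tfrac12 .
\end{align*}
```

With the condition number of $AR^{-1}$ bounded by a constant, a conjugate-gradient-type iteration on the preconditioned problem reduces the error by a constant factor per step, so on the order of $\log(1/\eps)$ iterations suffice, and $\eps$ no longer has to enter the sketch size.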

Comments:	Included optimization of subspace embedding dimension from (d/eps)^4 to O~(d/eps)^2 in Section 4, by more careful analysis of perfect hashing, and minor improvements to regression / low rank approximation because of it
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1207.6365 [cs.DS]
	(or arXiv:1207.6365v4 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1207.6365

Submission history

From: David Woodruff
[v1] Thu, 26 Jul 2012 18:50:00 UTC (60 KB)
[v2] Sun, 14 Oct 2012 06:21:24 UTC (74 KB)
[v3] Wed, 31 Oct 2012 07:39:45 UTC (80 KB)
[v4] Fri, 5 Apr 2013 19:09:27 UTC (87 KB)
