Computer Science > Data Structures and Algorithms
[Submitted on 22 Dec 2008 (v1), last revised 12 May 2010 (this version, v2)]
Title:An Improved Approximation Algorithm for the Column Subset Selection Problem
View PDFAbstract:We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$ matrix $C$ consisting of exactly $k$ columns of $A$. In the first (randomized) stage, the algorithm randomly selects $\Theta(k \log k)$ columns according to a judiciously-chosen probability distribution that depends on information in the top-$k$ right singular subspace of $A$. In the second (deterministic) stage, the algorithm applies a deterministic column-selection procedure to select and return exactly $k$ columns from the set of columns selected in the first stage. Let $C$ be the $m \times k$ matrix containing those $k$ columns, let $P_C$ denote the projection matrix onto the span of those columns, and let $A_k$ denote the best rank-$k$ approximation to the matrix $A$. Then, we prove that, with probability at least 0.8, $$ \FNorm{A - P_CA} \leq \Theta(k \log^{1/2} k) \FNorm{A-A_k}. $$ This Frobenius norm bound is only a factor of $\sqrt{k \log k}$ worse than the best previously existing existential result and is roughly $O(\sqrt{k!})$ better than the best previous algorithmic result for the Frobenius norm version of this Column Subset Selection Problem (CSSP). We also prove that, with probability at least 0.8, $$ \TNorm{A - P_CA} \leq \Theta(k \log^{1/2} k)\TNorm{A-A_k} + \Theta(k^{3/4}\log^{1/4}k)\FNorm{A-A_k}. $$ This spectral norm bound is not directly comparable to the best previously existing bounds for the spectral norm version of this CSSP. Our bound depends on $\FNorm{A-A_k}$, whereas previous results depend on $\sqrt{n-k}\TNorm{A-A_k}$; if these two quantities are comparable, then our bound is asymptotically worse by a $(k \log k)^{1/4}$ factor.
Submission history
From: Michael Mahoney [view email][v1] Mon, 22 Dec 2008 21:16:55 UTC (61 KB)
[v2] Wed, 12 May 2010 02:44:25 UTC (27 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.