Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means Clustering

Zhuang, Yubo; Chen, Xiaohui; Yang, Yun

arXiv:2201.08226 (stat)

[Submitted on 20 Jan 2022 (v1), last revised 9 Feb 2022 (this version, v2)]

Abstract: Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice and scale poorly to large (or even moderate) datasets. In this paper, we introduce a linear-time algorithm for approximating the SDP-relaxed $K$-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys an exact recovery threshold similar to that of the $K$-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive, with enhanced theoretical properties, when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method exceeds that of state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to that of the original $K$-means SDP with substantially reduced runtime.
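The two-stage structure described in the abstract (solve on a subsample, then lift by nearest-centroid rounding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sketch_and_lift`, `lloyd_stand_in`), the `sample_frac` parameter, and the choice of independent Bernoulli subsampling are assumptions made for illustration. In particular, the subsample solver here is plain Lloyd iterations, swapped in for the $K$-means SDP so that the example runs with the standard library alone; a real SDP solver could be passed in through the hypothetical `solve_subsample` argument.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def lloyd_stand_in(points, k, rng, iters=50):
    """Stand-in for the subsample SDP solve (assumption): Lloyd iterations.

    Returns one cluster label per subsampled point.
    """
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return [nearest(p, centroids) for p in points]


def sketch_and_lift(data, k, sample_frac, solve_subsample=lloyd_stand_in, seed=0):
    """Cluster a subsample, then label every point by its nearest centroid."""
    rng = random.Random(seed)

    # Sketch: keep each point independently with probability sample_frac
    # (an assumed subsampling scheme).
    sub = [p for p in data if rng.random() < sample_frac]
    if len(sub) < k:
        raise ValueError("subsample has fewer than k points")

    # Solve the clustering problem on the subsample only.
    sub_labels = solve_subsample(sub, k, rng)

    # Centroids of the clusters found on the subsample.
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(sub, sub_labels) if lab == j]
        if not members:
            raise ValueError("subsample solver returned an empty cluster")
        centroids.append(mean(members))

    # Lift: nearest-centroid rounding over the full dataset.
    return [nearest(p, centroids) for p in data]


if __name__ == "__main__":
    # Two well-separated one-dimensional groups of 50 points each.
    points = [[i * 0.01] for i in range(50)] + [[100 + i * 0.01] for i in range(50)]
    print(sketch_and_lift(points, k=2, sample_frac=0.5, seed=1))
```

The expensive solve touches only the subsample, while the lift step costs one pass over the data with $K$ distance evaluations per point, which is where the linear dependence on the number of data points comes from in this sketch.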

Comments:	Accepted at AISTATS 2022
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2201.08226 [stat.ML]
	(or arXiv:2201.08226v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2201.08226

Submission history

From: Yubo Zhuang
[v1] Thu, 20 Jan 2022 15:31:28 UTC (362 KB)
[v2] Wed, 9 Feb 2022 03:37:02 UTC (366 KB)
