The Hardness of Approximation of Euclidean k-means

Awasthi, Pranjal; Charikar, Moses; Krishnaswamy, Ravishankar; Sinop, Ali Kemal

Abstract: The Euclidean $k$-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning, and computational geometry communities. In this problem, we are given a set of $n$ points in Euclidean space $\mathbb{R}^d$, and the goal is to choose $k$ centers in $\mathbb{R}^d$ so that the sum of squared distances from each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial-time constant-factor approximation for general $k$ and a $(1+\epsilon)$-approximation that runs in time $\mathrm{poly}(n) \cdot 2^{O(k/\epsilon)}$. At the other extreme, the only known computational complexity result for this problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem and the fact that any point in $\mathbb{R}^d$ can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all $k, d$.
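As an illustration of the objective described above, the following minimal Python sketch evaluates the $k$-means cost of a candidate set of centers. The function names (`squared_distance`, `kmeans_cost`, `centroid`) and the four-point example are assumptions made for illustration and do not come from the paper. The example also shows why allowing arbitrary points of $\mathbb{R}^d$ as centers matters: the centroids of the two clusters, which are not input points, give a lower cost than the corresponding choice restricted to input points.

```python
from typing import Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def centroid(cluster: Sequence[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean of a non-empty cluster of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


# Hypothetical instance: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Centers placed at the cluster centroids (not input points).
free_centers = [centroid(points[:2]), centroid(points[2:])]

# Centers restricted to input points, one per pair.
input_centers = [points[0], points[2]]

cost_free = kmeans_cost(points, free_centers)
cost_restricted = kmeans_cost(points, input_centers)
```

Here each point lies at distance 1 from its centroid, so the unrestricted cost is 4, while the restricted choice leaves two points at distance 2 from their center and costs 8.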
In this paper we provide the first hardness of approximation result for the Euclidean $k$-means problem. Concretely, we show that there exists a constant $\epsilon > 0$ such that it is NP-hard to approximate the $k$-means objective to within a factor of $(1+\epsilon)$. We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest vertices such that every edge is incident on at least one chosen vertex. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this, we transform $G$, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph $H$, and we show via a spectral analysis that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph. This analysis might be of independent interest.
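To make the source problem of the reduction concrete, the sketch below checks whether a graph is triangle-free and finds a minimum vertex cover by exhaustive search. It is only an illustration under assumed conventions (vertices numbered `0..n-1`, edges given as pairs, hypothetical function names); it is not the paper's reduction or graph product construction, and the brute-force search takes exponential time, so it is suitable only for tiny graphs.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def is_triangle_free(n: int, edges: List[Edge]) -> bool:
    """True if no edge has endpoints that share a common neighbor."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(not (adj[u] & adj[v]) for u, v in edges)


def is_vertex_cover(cover: Iterable[int], edges: List[Edge]) -> bool:
    """True if every edge has at least one endpoint in the cover."""
    chosen = set(cover)
    return all(u in chosen or v in chosen for u, v in edges)


def min_vertex_cover(n: int, edges: List[Edge]) -> Set[int]:
    """Smallest vertex cover, found by trying subsets in increasing size."""
    for size in range(n + 1):
        for candidate in combinations(range(n), size):
            if is_vertex_cover(candidate, edges):
                return set(candidate)
    return set(range(n))  # unreachable: the full vertex set is always a cover


# Hypothetical instance: the 5-cycle, which contains no triangle.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
cover = min_vertex_cover(5, cycle5)

# The complement of a vertex cover is an independent set.
independent = set(range(5)) - cover
```

On the 5-cycle the minimum cover has 3 vertices, so its complement is an independent set of size 2; this complement relation is why statements about vertex cover and about maximum independent set can be translated into each other.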

Subjects:	Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1502.03316 [cs.CC]
	(or arXiv:1502.03316v1 [cs.CC] for this version)
	https://doi.org/10.48550/arXiv.1502.03316

