Clustering under Perturbation Resilience

Balcan, Maria Florina; Liang, Yingyu

arXiv:1112.0826v2 (cs)

[Submitted on 5 Dec 2011 (v1), revised 27 Dec 2011 (this version, v2), latest version 11 Dec 2016 (v5)]

Abstract: Recently, Bilu and Linial formalized an implicit assumption often made when choosing a clustering objective: that the optimal clustering under the objective should be preserved under small multiplicative perturbations of the distances between points. They showed that for max-cut clustering it is possible to circumvent NP-hardness and obtain polynomial-time algorithms for instances resilient to large (factor $O(\sqrt{n})$) perturbations. Subsequently, Awasthi et al. considered center-based objectives, giving algorithms for instances resilient to $O(1)$-factor perturbations.
In this paper, we greatly advance this line of work. For center-based objectives, we present an algorithm that optimally clusters instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al. For $k$-median, a commonly used center-based objective, we additionally give algorithms under a more relaxed assumption that allows the optimal solution to change on a small $\epsilon$ fraction of the points after perturbation. These are the first known bounds for this more realistic and more general setting. We also provide positive results for min-sum clustering, which is not center-based and is generally a much harder objective than $k$-median. Our algorithms are based on new linkage criteria that may be of independent interest.
Additionally, we give sublinear-time algorithms that can return an implicit clustering given access to only a small random sample.
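
As an illustration of the resilience assumption described in the abstract (and not of the paper's algorithms), the following Python sketch brute-forces the optimal $k$-median clustering of a tiny instance and checks whether it survives randomly sampled perturbations in which every pairwise distance is multiplied by a factor in $[1, \alpha]$. The helper names (`line_metric`, `optimal_clustering`, `survives_random_perturbations`), the restriction of centers to data points, and the example points are all assumptions made for illustration. Random sampling can only refute resilience, never certify it, because the assumption quantifies over all such perturbations; ties between optimal clusterings are also ignored here.

```python
import itertools
import math
import random


def line_metric(points):
    """Pairwise absolute-difference distances for points on a line (illustrative helper)."""
    return [[abs(p - q) for q in points] for p in points]


def kmedian_cost(dist, clustering):
    """k-median cost, assuming each center must be a data point of its own cluster."""
    return sum(
        min(sum(dist[c][p] for p in cluster) for c in cluster)
        for cluster in clustering
    )


def optimal_clustering(dist, k):
    """Exhaustively search all partitions into k non-empty clusters (tiny inputs only)."""
    n = len(dist)
    best, best_cost = None, float("inf")
    seen = set()
    for labels in itertools.product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # some cluster would be empty
        clustering = frozenset(
            frozenset(i for i in range(n) if labels[i] == c) for c in range(k)
        )
        if clustering in seen:
            continue  # same partition under a different labelling
        seen.add(clustering)
        cost = kmedian_cost(dist, clustering)
        if cost < best_cost:
            best, best_cost = clustering, cost
    return best


def perturb(dist, alpha, rng):
    """Multiply each pairwise distance by an independent factor in [1, alpha].

    The result stays symmetric but need not satisfy the triangle inequality.
    """
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = out[j][i] = dist[i][j] * rng.uniform(1.0, alpha)
    return out


def survives_random_perturbations(dist, k, alpha, trials=100, seed=0):
    """Return False as soon as a sampled perturbation changes the optimal clustering."""
    rng = random.Random(seed)
    target = optimal_clustering(dist, k)
    for _ in range(trials):
        if optimal_clustering(perturb(dist, alpha, rng), k) != target:
            return False
    return True


# Two well-separated groups on a line (hypothetical example data).
points = [0, 1, 2, 100, 101, 102]
dist = line_metric(points)
alpha = 1 + math.sqrt(2)
print(survives_random_perturbations(dist, k=2, alpha=alpha))
```

The exhaustive search is exponential in the number of points, so this is only meant for building intuition on instances with a handful of points.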

Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Cite as: arXiv:1112.0826 [cs.LG] (or arXiv:1112.0826v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.1112.0826

Submission history

From: Yingyu Liang
[v1] Mon, 5 Dec 2011 03:42:07 UTC (120 KB)
[v2] Tue, 27 Dec 2011 19:49:32 UTC (120 KB)
[v3] Fri, 30 Dec 2011 03:37:24 UTC (120 KB)
[v4] Fri, 8 Aug 2014 02:27:56 UTC (60 KB)
[v5] Sun, 11 Dec 2016 21:41:33 UTC (815 KB)
