Clustering under Perturbation Resilience

Balcan, Maria Florina; Liang, Yingyu

arXiv:1112.0826 (cs)

[Submitted on 5 Dec 2011 (v1), last revised 11 Dec 2016 (this version, v5)]

Abstract: Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial [BL] proposed analyzing objective-based clustering problems under the assumption that the optimum clustering for the objective is preserved under small multiplicative perturbations to distances between points. The hope is that by exploiting the structure in such instances, one can overcome worst-case hardness results.
In this paper, we provide several results within this framework. For center-based objectives, we present an algorithm that can optimally cluster instances resilient to perturbations of factor $(1 + \sqrt{2})$, solving an open problem of Awasthi et al. [ABS10]. For $k$-median, a center-based objective of special interest, we additionally give algorithms for a more relaxed assumption in which we allow the optimal solution to change in a small $\epsilon$ fraction of the points after perturbation. We give the first bounds known for $k$-median under this more realistic and more general assumption. We also provide positive results for min-sum clustering, which is typically a harder objective than center-based objectives from an approximability standpoint. Our algorithms are based on new linkage criteria that may be of independent interest.
Additionally, we give sublinear-time algorithms that can return an implicit clustering given access only to a small random sample.
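
To make the resilience assumption concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes one common formalization: an instance with distances d is resilient to factor alpha if, for every d' with d(p, q) <= d'(p, q) <= alpha * d(p, q), the optimal clustering under d' equals the optimal clustering under d. The function names, the brute-force k-median solver (centers restricted to data points), and the toy data are all illustrative assumptions. Random sampling of perturbations can only refute resilience on a tiny instance, never certify it.

```python
import itertools
import random


def optimal_kmedian(d, k):
    """Brute-force k-median with centers restricted to the data points.

    d is a symmetric n x n distance matrix (list of lists).
    Returns (cost, clustering), where clustering is a frozenset of
    frozensets of point indices. Exponential in k: toy instances only.
    """
    n = len(d)
    best_cost, best_clusters = float("inf"), None
    for centers in itertools.combinations(range(n), k):
        clusters = {c: set() for c in centers}
        cost = 0.0
        for p in range(n):
            # Assign each point to its nearest chosen center.
            nearest = min(centers, key=lambda c: d[p][c])
            clusters[nearest].add(p)
            cost += d[p][nearest]
        if cost < best_cost:
            best_cost = cost
            best_clusters = frozenset(frozenset(s) for s in clusters.values())
    return best_cost, best_clusters


def perturb(d, alpha, rng):
    """Scale each pairwise distance by an independent factor in [1, alpha].

    The perturbed matrix stays symmetric but need not be a metric.
    """
    n = len(d)
    dp = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            dp[p][q] = dp[q][p] = d[p][q] * rng.uniform(1.0, alpha)
    return dp


def survives_random_perturbations(d, k, alpha, trials=200, seed=0):
    """Return False if some sampled perturbation changes the optimum.

    A True result is only evidence of resilience, not a proof.
    """
    rng = random.Random(seed)
    _, base = optimal_kmedian(d, k)
    return all(
        optimal_kmedian(perturb(d, alpha, rng), k)[1] == base
        for _ in range(trials)
    )


# Hypothetical toy instance: two well-separated groups on a line.
xs = [0, 1, 2, 100, 101, 102]
d = [[abs(a - b) for b in xs] for a in xs]
alpha = 1 + 2 ** 0.5  # the factor from the abstract
print(optimal_kmedian(d, 2))
print(survives_random_perturbations(d, 2, alpha))
```

On this toy instance the two groups are so far apart that no perturbation of factor $(1 + \sqrt{2})$ can make a cross-group distance smaller than a within-group one, so the sampled check passes. Instances with closer groups may fail it.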

Comments:	54 pages. Appears in SIAM Journal on Computing (SICOMP), 2016
Subjects:	Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
MSC classes:	68Q25, 68Q32, 68T05, 68W25, 68W40
Cite as:	arXiv:1112.0826 [cs.LG]
	(or arXiv:1112.0826v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1112.0826

Submission history

From: Yingyu Liang
[v1] Mon, 5 Dec 2011 03:42:07 UTC (120 KB)
[v2] Tue, 27 Dec 2011 19:49:32 UTC (120 KB)
[v3] Fri, 30 Dec 2011 03:37:24 UTC (120 KB)
[v4] Fri, 8 Aug 2014 02:27:56 UTC (60 KB)
[v5] Sun, 11 Dec 2016 21:41:33 UTC (815 KB)
