Computational Feasibility of Clustering under Clusterability Assumptions

Ben-David, Shai

arXiv:1501.00437 (cs)

[Submitted on 2 Jan 2015]

Authors: Shai Ben-David

Abstract: It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out. One approach to providing a theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such 'clusterable' instances. In other words, the hope is that "Clustering is difficult only when it does not matter" (the CDNM thesis, for short).
We believe that to some extent this may indeed be the case. This paper provides a survey of recent papers along this line of research and a critical evaluation of their results. Our bottom-line conclusion is that the CDNM thesis is still far from being formally substantiated. We start by discussing which requirements should be met in order to provide formal support for the validity of the CDNM thesis. In particular, we list some implied requirements for notions of clusterability. We then examine existing results in view of those requirements and outline some research challenges and open questions.

Subjects:	Computational Complexity (cs.CC); Machine Learning (cs.LG)
MSC classes:	68Q25, 68Q32
ACM classes:	F.1.3; F.2.2; H.3.3; G.3
Cite as:	arXiv:1501.00437 [cs.CC]
	(or arXiv:1501.00437v1 [cs.CC] for this version)
	https://doi.org/10.48550/arXiv.1501.00437

Submission history

From: Shai Ben-David
[v1] Fri, 2 Jan 2015 17:10:52 UTC (21 KB)