K-sets+: a Linear-time Clustering Algorithm for Data Points with a Sparse Similarity Measure

Chang, Cheng-Shang; Chang, Chia-Tai; Lee, Duan-Shin; Liou, Li-Heng

arXiv:1705.04249 (cs)

[Submitted on 11 May 2017]


Abstract: In this paper, we first propose a new iterative algorithm, called the K-sets+ algorithm, for clustering data points in a semi-metric space, where the distance measure does not necessarily satisfy the triangle inequality. We show that the K-sets+ algorithm converges in a finite number of iterations and that it retains the same performance guarantee as the K-sets algorithm for clustering data points in a metric space. We then extend the applicability of the K-sets+ algorithm from data points in a semi-metric space to data points that only have a symmetric similarity measure. Such an extension leads to a great reduction in computational complexity. In particular, for an n × n similarity matrix with m nonzero elements, the computational complexity of the K-sets+ algorithm is O((Kn + m)I), where I is the number of iterations. The memory complexity needed to achieve that computational complexity is O(Kn + m). As such, both the computational complexity and the memory complexity are linear in n when the n × n similarity matrix is sparse, i.e., m = O(n). We also conduct various experiments to show the effectiveness of the K-sets+ algorithm, using a synthetic dataset from the stochastic block model and a real network from the WonderNetwork website.
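
The O(Kn + m) cost per iteration quoted in the abstract can be made concrete with a small sketch. The code below is not the K-sets+ update rule, which the abstract does not spell out. It only illustrates, under stated assumptions, how a single sweep over a sparse similarity matrix can aggregate each point's similarity to each of K clusters in O(Kn + m) time and memory, taking K to be the number of clusters. The adjacency-list format, the function names `cluster_similarity_sums` and `reassign_once`, and the "move to the most similar cluster" rule are all hypothetical stand-ins chosen for illustration.

```python
from typing import List, Tuple

# Assumed sparse format: neighbors[i] holds (j, similarity) pairs for the
# nonzero entries of row i only, so the total number of stored pairs is m.
SparseSim = List[List[Tuple[int, float]]]


def cluster_similarity_sums(
    neighbors: SparseSim, labels: List[int], k: int
) -> List[List[float]]:
    """Return sums[i][c], the total similarity between point i and cluster c."""
    n = len(neighbors)
    sums = [[0.0] * k for _ in range(n)]  # O(Kn) time and memory
    for i, row in enumerate(neighbors):   # each nonzero is visited once: O(m)
        for j, s in row:
            sums[i][labels[j]] += s
    return sums


def reassign_once(neighbors: SparseSim, labels: List[int], k: int) -> List[int]:
    """One illustrative sweep (a stand-in rule, not the K-sets+ rule):
    assign each point to the cluster with the largest summed similarity."""
    sums = cluster_similarity_sums(neighbors, labels, k)
    # Scanning K candidates for each of n points costs another O(Kn).
    return [max(range(k), key=lambda c: sums[i][c]) for i in range(len(neighbors))]
```

Repeating such a sweep for I iterations gives a total cost of the form O((Kn + m)I), which is linear in n for fixed K and I when m = O(n).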

Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Cite as:	arXiv:1705.04249 [cs.DS]
	(or arXiv:1705.04249v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1705.04249

Submission history

From: Li Heng Liou
[v1] Thu, 11 May 2017 15:39:48 UTC (301 KB)
