$k$-Means Clustering for Persistent Homology

Leung, Prudence; Cao, Yueqi; Monod, Anthea

arXiv:2210.10003v1 (stat)

[Submitted on 18 Oct 2022 (this version), latest version 25 Nov 2023 (v4)]

Abstract: Persistent homology is a fundamental methodology from topological data analysis that summarizes the lifetimes of topological features within a dataset as a persistence diagram; it has recently gained much popularity through its many successful applications across domains. However, a significant obstacle to its widespread use, especially in statistical methodology and machine learning algorithms, is the format of the persistence diagram as a multiset of half-open intervals. In this paper, we comprehensively study $k$-means clustering where the inputs are various embeddings of persistence diagrams, as well as persistence diagrams themselves and their generalizations as persistence measures. We show that clustering directly on persistence diagrams and measures far outperforms clustering on their vectorized representations, despite their more complex representation. Moreover, we prove convergence of the algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush–Kuhn–Tucker framework.
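To give a rough feel for what "clustering directly on persistence diagrams" involves, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' method. Diagrams are treated as finite lists of (birth, death) pairs and compared with the squared 2-Wasserstein distance (Euclidean ground metric, with any point allowed to be matched to the diagonal), computed by an O(n^3) Hungarian assignment. The sketch also swaps in a simpler technique for the update step. Where $k$-means would recompute a Fréchet mean of each cluster, it uses a k-medoids update: each center becomes the member diagram with the smallest total squared distance to the rest of its cluster. The function names `hungarian`, `wasserstein2_sq`, and `kmedoids_diagrams`, the farthest-point initialization, and the toy diagrams are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (birth, death), assumed death > birth
Diagram = Sequence[Point]


def hungarian(cost: List[List[float]]) -> List[int]:
    """Min-cost perfect matching on a square matrix (Hungarian method with potentials).
    Returns assignment[i] = column matched to row i."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row / column potentials
    p = [0] * (n + 1)    # p[j] = row matched to column j (1-based, 0 = unmatched)
    way = [0] * (n + 1)  # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def _diag_sq(pt: Point) -> float:
    # Squared Euclidean distance from (b, d) to its projection onto the diagonal.
    b, d = pt
    return (d - b) ** 2 / 2.0


def wasserstein2_sq(X: Diagram, Y: Diagram) -> float:
    """Squared 2-Wasserstein distance between two finite diagrams."""
    n, m = len(X), len(Y)
    size = n + m
    # Rows: points of X, then m diagonal slots. Columns: points of Y, then n diagonal slots.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = (X[i][0] - Y[j][0]) ** 2 + (X[i][1] - Y[j][1]) ** 2
            elif i < n:
                cost[i][j] = _diag_sq(X[i])
            elif j < m:
                cost[i][j] = _diag_sq(Y[j])
            # diagonal slot to diagonal slot costs 0
    assignment = hungarian(cost)
    return sum(cost[i][assignment[i]] for i in range(size))


def kmedoids_diagrams(diagrams: Sequence[Diagram], k: int, n_iter: int = 50):
    """k-medoids-style clustering of diagrams (NOT the Frechet-mean k-means update)."""
    N = len(diagrams)
    assert 1 <= k <= N
    D = [[wasserstein2_sq(a, b) for b in diagrams] for a in diagrams]
    # Hypothetical deterministic farthest-point initialization, starting from diagram 0.
    medoids = [0]
    while len(medoids) < k:
        candidates = [a for a in range(N) if a not in medoids]
        medoids.append(max(candidates, key=lambda a: min(D[a][c] for c in medoids)))

    def assign(meds):
        # Assignment step: each diagram joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: D[a][meds[c]]) for a in range(N)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new = []
        for c in range(k):
            members = [a for a in range(N) if labels[a] == c] or [medoids[c]]
            # Update step: the member with the smallest total squared distance to its cluster.
            new.append(min(members, key=lambda a: sum(D[a][b] for b in members)))
        if new == medoids:
            break
        medoids = new
    return assign(medoids), medoids


# Toy data (hypothetical): three diagrams with one long bar, three with only short bars.
diagrams = [
    [(0.0, 5.0)], [(0.1, 5.2)], [(0.0, 4.8), (1.0, 1.1)],
    [(0.0, 0.5)], [(0.2, 0.6), (1.0, 1.3)], [(0.1, 0.4)],
]
labels, medoids = kmedoids_diagrams(diagrams, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Using medoids keeps every center a genuine persistence diagram and sidesteps computing Fréchet means. The cost is that the N×N distance matrix needs O(N^2) assignment problems, so this sketch only suits small collections.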

Comments:	13 pages, 3 figures
Subjects:	Applications (stat.AP); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2210.10003 [stat.AP]
	(or arXiv:2210.10003v1 [stat.AP] for this version)
	https://doi.org/10.48550/arXiv.2210.10003

Submission history

From: Yueqi Cao
[v1] Tue, 18 Oct 2022 17:18:51 UTC (2,894 KB)
[v2] Tue, 21 Feb 2023 22:56:25 UTC (2,895 KB)
[v3] Sun, 30 Jul 2023 12:58:54 UTC (3,531 KB)
[v4] Sat, 25 Nov 2023 13:04:28 UTC (3,531 KB)
