Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Wang, Zeya; Ye, Chenglong

arXiv:2403.14830 (stat)

[Submitted on 21 Mar 2024]

Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning.
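The second issue in the abstract (internal scores that are not comparable across embedding spaces) can be illustrated with a small sketch. This is not the paper's framework. It is a toy example under stated assumptions: the silhouette coefficient stands in for an internal measure and the Rand index for an external one (the abstract names neither), and the two "embeddings" are hypothetical hand-made coordinates for the same six items.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance) of a labelled point set."""
    n = len(points)
    scores = []
    for i in range(n):
        # Collect distances from point i to every other point, grouped by cluster.
        dists = {}
        for j in range(n):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            # Singleton cluster or only one cluster overall: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same/different cluster)."""
    n = len(labels_a)
    agree = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b
            pairs += 1
    return agree / pairs

# Hypothetical toy data: six items, ground truth, and one predicted partition.
truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1]

# Two hypothetical embeddings of the same six items (e.g. from two models).
embedding_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
embedding_b = [(0, 0), (0, 2), (2, 0), (3, 3), (3, 5), (5, 3)]

sil_a = silhouette(embedding_a, predicted)
sil_b = silhouette(embedding_b, predicted)
ri = rand_index(truth, predicted)

print("silhouette in embedding A:", round(sil_a, 3))
print("silhouette in embedding B:", round(sil_b, 3))
print("Rand index vs. ground truth:", ri)
```

The partition is identical in both cases, so the external score is the same, yet the internal score changes with the space it is computed in. Ranking models by internal scores taken in their own embedding spaces would therefore partly rank the spaces rather than the partitions.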

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2403.14830 [stat.ML]
	(or arXiv:2403.14830v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2403.14830

Submission history

From: Zeya Wang
[v1] Thu, 21 Mar 2024 20:43:44 UTC (33,320 KB)
