Statistical Inference for Clustering-based Anomaly Detection

Phu, Nguyen Thi Minh; Loc, Duong Tan; Duy, Vo Nguyen Le

arXiv:2504.18633 (stat)

[Submitted on 25 Apr 2025]

Authors: Nguyen Thi Minh Phu, Duong Tan Loc, Vo Nguyen Le Duy

Abstract: Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference for CLustering-based Anomaly Detection), a novel statistical framework for testing the clustering-based AD results. The key strength of SI-CLAD lies in its ability to rigorously control the probability of falsely identifying anomalies, maintaining it below a pre-specified significance level $\alpha$ (e.g., $\alpha = 0.05$). By analyzing the selection mechanism inherent in clustering-based AD and leveraging the Selective Inference (SI) framework, we prove that false detection control is attainable. Moreover, we introduce a strategy to boost the true detection rate, enhancing the overall performance of SI-CLAD. Extensive experiments on synthetic and real-world datasets provide strong empirical support for our theoretical findings, showcasing the superior performance of the proposed method.
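The reliability problem the abstract describes can be illustrated with a small simulation. The sketch below is not SI-CLAD and is not taken from the paper. It assumes a deliberately simplified stand-in for clustering-based AD (one cluster, with the point farthest from the centroid flagged) and a naive z-test that ignores the selection step. The function name `naive_false_detection_rate` and all of its parameters are hypothetical. Under these assumptions, the data contain no true anomalies, yet the naive test should reject far more often than the nominal $\alpha$, which is the gap that conditioning on the selection event in Selective Inference is meant to close.

```python
import math
import random
import statistics


def naive_false_detection_rate(n_points=20, n_trials=2000, alpha=0.05, seed=0):
    """Estimate how often a naive test flags an 'anomaly' in pure-noise data.

    Simplifying assumptions (not from the paper): 1-D data, a single
    cluster whose centroid is the sample mean, and known unit variance.
    """
    rng = random.Random(seed)
    false_detections = 0
    for _ in range(n_trials):
        # Null data: every point is N(0, 1), so there are no true anomalies.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        center = statistics.fmean(x)

        # Selection step: flag the point farthest from the centroid.
        idx = max(range(n_points), key=lambda i: abs(x[i] - center))
        rest = [v for i, v in enumerate(x) if i != idx]

        # Naive z-test of the flagged point against the mean of the rest.
        # With unit variance, Var(x_idx - mean(rest)) = 1 + 1/(n - 1).
        z = (x[idx] - statistics.fmean(rest)) / math.sqrt(1.0 + 1.0 / len(rest))

        # Two-sided p-value under N(0, 1), ignoring that idx was data-chosen.
        p_naive = math.erfc(abs(z) / math.sqrt(2.0))
        if p_naive < alpha:
            false_detections += 1
    return false_detections / n_trials


rate = naive_false_detection_rate()
print(f"naive false detection rate: {rate:.3f} (nominal alpha = 0.05)")
```

The inflation arises because the same data are used both to choose which point to test and to test it. A valid procedure would have to account for that choice, for example by computing the p-value conditional on the selection event.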

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2504.18633 [stat.ML]
	(or arXiv:2504.18633v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2504.18633

Submission history

From: Vo Nguyen Le Duy
[v1] Fri, 25 Apr 2025 18:21:26 UTC (2,162 KB)
