Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces

Wang, Yongyu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.11421 (cs)

[Submitted on 18 Nov 2024 (v1), last revised 3 Dec 2024 (this version, v3)]

Abstract: DBSCAN is one of the most important non-parametric unsupervised data analysis tools. Applying DBSCAN to a dataset yields two key analytical results: (1) a clustering of the data points based on their density distribution and (2) the identification of outliers in the dataset. However, the time complexity of the DBSCAN algorithm is $O(n^2 \beta)$, where $n$ is the number of data points and $\beta = O(D)$, with $D$ denoting the dimensionality of the data space. As a result, DBSCAN becomes computationally infeasible when both $n$ and $D$ are large. In this paper, we propose a DBSCAN method based on spectral data compression that can efficiently process datasets with a large number of data points ($n$) and high dimensionality ($D$). Because the compression process preserves only the most critical structural information, our method removes substantial redundancy and noise. Consequently, the solution quality of DBSCAN is significantly improved, yielding more accurate and reliable results.
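To illustrate where the $O(n^2 \beta)$ cost comes from, the following is a minimal sketch of textbook DBSCAN in Python that uses a plain linear-scan neighbor query with no spatial index. It is not the spectral-compression method proposed in the paper, and the names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative assumptions. The sketch counts distance evaluations: every point is queried exactly once, and each query scans all $n$ points, so the count is $n^2$, while each evaluation itself costs $O(D)$.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # label for outliers (points that belong to no cluster)


def region_query(points: Sequence[Sequence[float]], i: int, eps: float,
                 counter: List[int]) -> List[int]:
    """Indices of all points within eps of points[i], found by a linear scan.

    Each call performs n distance evaluations; each evaluation costs O(D).
    """
    neighbors = []
    for j, q in enumerate(points):
        counter[0] += 1  # count one D-dimensional distance evaluation
        if math.dist(points[i], q) <= eps:
            neighbors.append(j)
    return neighbors


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_pts: int) -> Tuple[List[Optional[int]], int]:
    """Textbook DBSCAN without a spatial index (illustrative sketch).

    Returns (labels, number_of_distance_evaluations). Cluster ids start at 0;
    NOISE (-1) marks outliers.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    counter = [0]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps, counter)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps, counter)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: expand
    return labels, counter[0]


# Two dense groups plus one isolated point, in D = 2 dimensions.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels, evals = dbscan(pts, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(evals)   # 81, i.e. n**2 for n = 9
```

The two outputs correspond to the two analytical results named in the abstract: cluster labels and outliers (label `-1`). Scaling the same count up shows why large $n$ and $D$ together are prohibitive: for $n = 10^6$ and $D = 10^3$, the $n^2 D$ term is on the order of $10^{15}$ coordinate operations. Spatial indexes such as k-d trees can reduce the query cost in low dimensions, but they typically lose their advantage as $D$ grows.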

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.11421 [cs.CV]
	(or arXiv:2411.11421v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.11421

Submission history

From: Yongyu Wang
[v1] Mon, 18 Nov 2024 09:46:45 UTC (1,590 KB)
[v2] Fri, 29 Nov 2024 08:02:35 UTC (498 KB)
[v3] Tue, 3 Dec 2024 10:23:16 UTC (497 KB)