A Large-scale Study on Unsupervised Outlier Model Selection: Evaluating the Internal Model Evaluation Strategies

Ma, Martin Q.; Zhao, Yue; Zhang, Xiaorong; Akoglu, Leman

arXiv:2104.01422v1 (cs)

[Submitted on 3 Apr 2021 (this version), latest version 12 Apr 2021 (v2)]

Abstract: Given an unsupervised outlier detection task, how should one select a detection algorithm as well as its hyperparameters (jointly called a model)? Unsupervised model selection is notoriously difficult in the absence of hold-out validation data with ground-truth labels, and the problem is therefore vastly understudied. In this work, we study the feasibility of employing internal model evaluation strategies for selecting a model for outlier detection. These so-called internal strategies rely solely on the input data (without labels) and the output (outlier scores) of the candidate models. We set up (and open-source) a large testbed with 39 detection tasks and 297 candidate models comprising 8 detectors and various hyperparameter configurations. We evaluate 7 different strategies on their ability to discriminate between models with respect to detection performance, without using any labels. Our study reveals room for progress: we find that none of the strategies would be practically useful, as they select models only comparable to a state-of-the-art detector (with random configuration).
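To make the setting concrete, here is a minimal sketch of what label-free model selection from outlier scores alone can look like. The abstract does not name the 7 strategies, so the consensus heuristic below (pick the candidate whose ranking agrees most with the average ranking) is only an illustrative assumption and is not claimed to be one of the strategies the authors evaluate. The function names and the toy scores are hypothetical.

```python
from math import sqrt


def to_ranks(scores):
    """Rank outlier scores (0 = least outlying); ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = float(rank)
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_by_consensus(score_lists):
    """Illustrative internal strategy (an assumption, not the paper's method).

    score_lists: one list of outlier scores per candidate model, all over the
    same samples, where a higher score means more outlying.
    Returns (index of the selected model, agreement value per model).
    No ground-truth labels are used.
    """
    ranks = [to_ranks(s) for s in score_lists]
    n_samples = len(ranks[0])
    # Consensus ranking: the average rank of each sample across all candidates.
    consensus = [sum(r[i] for r in ranks) / len(ranks) for i in range(n_samples)]
    # Agreement of each candidate's ranking with the consensus ranking.
    agreement = [pearson(r, consensus) for r in ranks]
    best = max(range(len(agreement)), key=agreement.__getitem__)
    return best, agreement


# Toy usage with made-up scores from three hypothetical candidate models.
candidate_scores = [
    [0.1, 0.2, 0.3, 0.9],
    [0.2, 0.1, 0.4, 0.8],
    [0.9, 0.3, 0.2, 0.1],
]
best, agreement = select_by_consensus(candidate_scores)
print("selected model index:", best)
```

A heuristic of this kind rewards agreement among candidates rather than detection quality, so a poor model can be selected whenever most candidates err in the same way. This is in keeping with the abstract's finding that label-free selection is not yet practically useful.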

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2104.01422 [cs.LG]
	(or arXiv:2104.01422v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2104.01422

Submission history

From: Martin Q. Ma
[v1] Sat, 3 Apr 2021 14:56:29 UTC (426 KB)
[v2] Mon, 12 Apr 2021 19:24:44 UTC (426 KB)
