Better-Than-Chance Classification for Signal Detection

Rosenblatt, Jonathan D.; Benjamini, Yuval; Gilron, Roee; Mukamel, Roy; Goeman, Jelle J.

doi:10.1093/biostatistics/kxz035

Statistics > Methodology

arXiv:1608.08873v2 (stat)

[Submitted on 31 Aug 2016 (v1), last revised 14 Dec 2017 (this version, v2)]

Abstract: The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test whether the estimated accuracy is significantly better than chance level. This method of signal detection is particularly popular in neuroimaging and genetics. We provide evidence that using a classifier's accuracy as a test statistic can be an underpowered strategy for finding differences between populations, compared to a bona fide statistical test. It is also computationally more demanding than a statistical test. Via simulation, we compare test statistics based on classification accuracy with others based on multivariate test statistics. We find that the probability of detecting differences between two distributions is lower for accuracy-based statistics. We examine several candidate causes for the low power of accuracy tests. These causes include the discrete nature of the accuracy test statistic, the type of signal that accuracy tests are designed to detect, their inefficient use of the data, and their regularization. When the purpose of the analysis is not signal detection but rather the evaluation of a particular classifier, we suggest several improvements to increase power; in particular, replacing V-fold cross-validation with the Leave-One-Out Bootstrap.
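To make the comparison concrete, here is a minimal Python/NumPy sketch of the kind of simulation the abstract describes. It is not the authors' code or simulation design: the nearest-centroid classifier, the Gaussian mean-shift alternative, the 5-fold split, and all function names (`cv_accuracy`, `hotelling_t2`, `permutation_pvalue`, `simulate`, `power`) are illustrative assumptions. Both tests are calibrated the same way, by permuting the group labels. One uses cross-validated accuracy as its statistic; the other uses the two-sample Hotelling T^2, a classical multivariate statistic chosen here only as one example.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 with a pooled covariance (needs n_x + n_y - 2 >= p)."""
    nx, ny = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((nx - 1) * np.cov(x, rowvar=False)
              + (ny - 1) * np.cov(y, rowvar=False)) / (nx + ny - 2)
    return nx * ny / (nx + ny) * diff @ np.linalg.solve(pooled, diff)


def cv_accuracy(data, labels, fold_id):
    """V-fold cross-validated accuracy of a (hypothetical) nearest-centroid classifier.

    The result can only take the n + 1 values 0, 1/n, ..., 1: the discreteness
    the abstract lists as one candidate cause of low power.
    """
    correct = 0
    for v in np.unique(fold_id):
        test = fold_id == v
        train = ~test
        # Class centroids are estimated on the training folds only.
        mu0 = data[train & (labels == 0)].mean(axis=0)
        mu1 = data[train & (labels == 1)].mean(axis=0)
        d0 = ((data[test] - mu0) ** 2).sum(axis=1)
        d1 = ((data[test] - mu1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)
        correct += int((pred == labels[test]).sum())
    return correct / len(labels)


def permutation_pvalue(stat_fn, data, labels, n_perm, rng):
    """One-sided permutation p-value; large statistics indicate a group difference."""
    observed = stat_fn(data, labels)
    exceed = 0
    for _ in range(n_perm):
        # Ties count as exceedances, so a discrete statistic gives a conservative test.
        if stat_fn(data, rng.permutation(labels)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


def simulate(n_per_group=20, p=5, shift=0.5, folds=5, n_perm=200, seed=0):
    """P-values of the accuracy test and the Hotelling test on one simulated dataset.

    Assumed setup: two Gaussian groups with identity covariance; group 1's mean is
    shifted by `shift` in every coordinate. The defaults keep both classes present
    in every training split, even after the labels are permuted.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_per_group, p))
    y = rng.normal(size=(n_per_group, p)) + shift
    data = np.vstack([x, y])
    labels = np.repeat([0, 1], n_per_group)
    # Fixed fold assignment, reused for every permutation of the labels.
    fold_id = np.arange(len(labels)) % folds
    acc_p = permutation_pvalue(
        lambda d, lab: cv_accuracy(d, lab, fold_id), data, labels, n_perm, rng)
    t2_p = permutation_pvalue(
        lambda d, lab: hotelling_t2(d[lab == 0], d[lab == 1]), data, labels, n_perm, rng)
    return acc_p, t2_p


def power(n_rep=50, alpha=0.05, **kwargs):
    """Fraction of replications in which each test rejects at level alpha."""
    rejections = np.zeros(2)
    for rep in range(n_rep):
        rejections += np.array(simulate(seed=rep, **kwargs)) <= alpha
    return rejections / n_rep  # [accuracy test, Hotelling test]
```

For example, `power(n_rep=200, shift=0.3)` estimates the rejection rate of each test at level 0.05; comparing the two entries over a few values of `shift` is one way to explore the power gap the abstract reports. Note also that the accuracy statistic refits the classifier `folds` times per permutation, which illustrates its extra computational cost.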

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:1608.08873 [stat.ME]
	(or arXiv:1608.08873v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1608.08873
Related DOI:	https://doi.org/10.1093/biostatistics/kxz035

Submission history

From: Jonathan Rosenblatt
[v1] Wed, 31 Aug 2016 14:15:38 UTC (348 KB)
[v2] Thu, 14 Dec 2017 06:45:38 UTC (364 KB)
