Should we really use post-hoc tests based on mean-ranks?

Benavoli, Alessio; Corani, Giorgio; Mangili, Francesca

arXiv:1505.02288 (cs)

[Submitted on 9 May 2015]

Abstract: The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which differences among the algorithms are significant. These multiple comparisons are usually performed using the mean-ranks test. The aim of this technical note is to discuss the inconsistencies of the mean-ranks post-hoc test, with the goal of discouraging its use in machine learning as well as in medicine, psychology, etc. We show that the outcome of the mean-ranks test depends on the pool of algorithms originally included in the experiment. In other words, the outcome of the comparison between algorithms A and B also depends on the performance of the other algorithms included in the original experiment. This can lead to paradoxical situations. For instance, the difference between A and B could be declared significant if the pool comprises algorithms C, D, E and not significant if the pool comprises algorithms F, G, H. To overcome these issues, we suggest instead performing the multiple comparisons using a test whose outcome depends only on the two algorithms being compared, such as the sign test or the Wilcoxon signed-rank test.
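
The pool dependence described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the usual mean-ranks statistic z = (R_A - R_B) / sqrt(m(m+1)/(6n)) for m algorithms and n data sets, a Bonferroni-adjusted normal threshold, and entirely made-up accuracy scores; the names `mean_ranks_z`, `sign_test_p`, `between` and `below` are hypothetical.

```python
import math
from statistics import NormalDist


def mean_ranks(scores):
    """Mean rank of each algorithm (1 = best; higher score is better; no ties assumed)."""
    names = list(scores)
    n = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n):
        ordered = sorted(names, key=lambda name: scores[name][d], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / n for name in names}


def mean_ranks_z(scores, a, b):
    """Mean-ranks statistic for a vs. b; it depends on every algorithm in the pool."""
    m, n = len(scores), len(scores[a])
    ranks = mean_ranks(scores)
    return (ranks[a] - ranks[b]) / math.sqrt(m * (m + 1) / (6.0 * n))


def sign_test_p(x, y):
    """Two-sided sign test p-value; it uses only the two algorithms compared."""
    wins = sum(xi > yi for xi, yi in zip(x, y))
    losses = sum(xi < yi for xi, yi in zip(x, y))
    n, k = wins + losses, max(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up accuracies on 10 data sets: A beats B on 8 of them, B beats A on 2.
A = [0.90] * 8 + [0.70] * 2
B = [0.80] * 10
lo = [min(a, b) for a, b in zip(A, B)]
hi = [max(a, b) for a, b in zip(A, B)]

# Pool 1: C, D, E always score strictly between A and B.
between = {"A": A, "B": B}
for name, frac in zip("CDE", (0.25, 0.50, 0.75)):
    between[name] = [l + frac * (h - l) for l, h in zip(lo, hi)]

# Pool 2: F, G, H always score below both A and B.
below = {"A": A, "B": B}
for name, gap in zip("FGH", (0.1, 0.2, 0.3)):
    below[name] = [l - gap for l in lo]

# Bonferroni-adjusted two-sided threshold for m(m-1)/2 pairwise comparisons, alpha = 0.05.
m = 5
z_crit = NormalDist().inv_cdf(1 - 0.05 / (m * (m - 1)))

z_between = mean_ranks_z(between, "A", "B")
z_below = mean_ranks_z(below, "A", "B")
print(f"threshold |z| > {z_crit:.3f}")
print(f"pool C, D, E: |z| = {abs(z_between):.3f}")
print(f"pool F, G, H: |z| = {abs(z_below):.3f}")
print(f"sign test p-value (same for both pools): {sign_test_p(A, B):.4f}")
```

With these illustrative scores, the A versus B comparison exceeds the threshold when the pool is C, D, E and falls below it when the pool is F, G, H, even though the scores of A and B never change. The sign test is computed from A and B alone, so its result is the same for either pool.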

Subjects:	Machine Learning (cs.LG); Statistics Theory (math.ST); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Cite as:	arXiv:1505.02288 [cs.LG]
	(or arXiv:1505.02288v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1505.02288

Submission history

From: Alessio Benavoli
[v1] Sat, 9 May 2015 15:54:56 UTC (103 KB)
