Sequential Permutation Testing of Random Forest Variable Importance Measures

Hapfelmeier, Alexander; Hornung, Roman; Haller, Bernhard

doi:10.1016/j.csda.2022.107689

arXiv:2206.01284 (stat)

[Submitted on 2 Jun 2022]

Abstract: Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests were derived analytically under regularity conditions. However, these approaches can be computationally expensive or even practically infeasible. The same problem occurs with non-parametric permutation tests, which nevertheless have the advantage of being distribution-free and generically applicable to any type of RF and VIMP. Building on this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests hold: the type-I error probability is controlled at the nominal level, and high power is maintained with considerably fewer permutations than conventional permutation testing requires. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. An implementation is provided in the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
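To illustrate how sequential p-value estimation can cut the number of permutations, the following is a minimal sketch, not the authors' implementation and not the API of rfvimptest. It assumes a Besag-Clifford style stopping rule: permutations are drawn one at a time, and sampling stops as soon as `h` permuted statistics are at least as large as the observed one, or when `max_perms` is reached. The function names, the defaults `h=8` and `max_perms=499`, and the use of an absolute correlation as a stand-in for an RF importance measure are all assumptions made to keep the example self-contained.

```python
import random


def sequential_p_value(observed, draw_null_stat, h=8, max_perms=499):
    """Sequential Monte Carlo p-value (Besag-Clifford style sketch).

    observed       -- importance of the variable on the original data
    draw_null_stat -- callable returning one importance under permutation
    h              -- stop once h null draws are >= observed (assumed default)
    max_perms      -- upper bound on the number of permutations (assumed default)
    Returns (p_value_estimate, permutations_used).
    """
    exceed = 0
    for used in range(1, max_perms + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: the variable is clearly not significant.
                return h / used, used
    # Budget exhausted: conventional permutation p-value estimate.
    return (exceed + 1) / (max_perms + 1), max_perms


def abs_corr(x, y):
    """Toy stand-in for a variable importance: absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


rng = random.Random(0)
n = 100
x_signal = [rng.gauss(0, 1) for _ in range(n)]   # informative predictor
x_noise = [rng.gauss(0, 1) for _ in range(n)]    # uninformative predictor
y = [2 * a + rng.gauss(0, 1) for a in x_signal]


def make_null(x, y):
    """Build a callable that permutes x and recomputes the toy importance."""
    def draw():
        x_perm = x[:]
        rng.shuffle(x_perm)
        return abs_corr(x_perm, y)
    return draw


p_signal, used_signal = sequential_p_value(abs_corr(x_signal, y), make_null(x_signal, y))
p_noise, used_noise = sequential_p_value(abs_corr(x_noise, y), make_null(x_noise, y))
print(f"signal: p={p_signal:.3f} after {used_signal} permutations")
print(f"noise:  p={p_noise:.3f} after {used_noise} permutations")
```

In a real application the toy statistic would be replaced by refitting the forest on data with the variable of interest permuted and recomputing its VIMP, which is exactly the expensive step that early stopping avoids for clearly unimportant variables.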

Subjects:	Methodology (stat.ME); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:2206.01284 [stat.ME]
	(or arXiv:2206.01284v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2206.01284
Journal reference:	Computational Statistics & Data Analysis 181 (2023): 107689
Related DOI:	https://doi.org/10.1016/j.csda.2022.107689

Submission history

From: Alexander Hapfelmeier
[v1] Thu, 2 Jun 2022 20:16:50 UTC (330 KB)
