Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

Wang, Yan; Wu, Huaiqing; Nettleton, Dan

arXiv:2310.18814 (stat)

[Submitted on 28 Oct 2023]

Abstract: We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like randomForest in R. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that the random forest, with its stability property, is an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.
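
To make the idea of a prediction interval built from out-of-bag (OOB) error concrete, here is a minimal sketch in Python. It is not the authors' implementation and makes several assumptions. The ensemble is a bagged 1-nearest-neighbour rule on one-dimensional inputs, standing in for a random forest. The interval is assumed to take the form "point prediction plus or minus an empirical quantile of the absolute OOB errors". The quantile index ceil((1 - alpha)(m + 1)) is one common convention, not necessarily the one analyzed in the paper. The names fit_oob_interval, base_predict, and interval are hypothetical.

```python
import math
import random


def fit_oob_interval(xs, ys, alpha=0.1, n_learners=100, seed=0):
    """Return a function mapping x_new to an OOB-calibrated (lower, upper) interval.

    Stand-in ensemble: each base learner is a 1-nearest-neighbour rule fitted on
    one bootstrap sample (an assumption for brevity; this is not a random forest).
    """
    rng = random.Random(seed)
    n = len(xs)
    # Each bag holds the distinct training indices drawn in one bootstrap sample.
    bags = [sorted({rng.randrange(n) for _ in range(n)}) for _ in range(n_learners)]
    bag_sets = [set(bag) for bag in bags]

    def base_predict(x, bag):
        # 1-NN prediction using only the training points in this bag.
        nearest = min(bag, key=lambda j: abs(xs[j] - x))
        return ys[nearest]

    # Out-of-bag absolute error for each training point: average only the
    # learners whose bootstrap sample did not contain that point.
    abs_errors = []
    for i in range(n):
        preds = [base_predict(xs[i], bag)
                 for bag, members in zip(bags, bag_sets) if i not in members]
        if preds:
            abs_errors.append(abs(ys[i] - sum(preds) / len(preds)))
    if not abs_errors:
        raise ValueError("no out-of-bag predictions; increase n_learners")

    # Assumed quantile convention: the ceil((1 - alpha) * (m + 1))-th smallest error.
    abs_errors.sort()
    m = len(abs_errors)
    k = min(m, math.ceil((1 - alpha) * (m + 1)))
    half_width = abs_errors[k - 1]

    def interval(x_new):
        # The point prediction uses the full ensemble; the width comes from OOB errors.
        center = sum(base_predict(x_new, bag) for bag in bags) / n_learners
        return center - half_width, center + half_width

    return interval
```

Calling fit_oob_interval(train_x, train_y) returns a function that gives a (lower, upper) pair for any new input. The OOB errors come from the same bootstrap samples used to fit the ensemble, so calibrating the interval needs no additional model fits, which is the sense in which such intervals come at almost no extra computational cost.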

Comments:	NeurIPS 2023
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2310.18814 [stat.ML]
	(or arXiv:2310.18814v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2310.18814

Submission history

From: Yan Wang
[v1] Sat, 28 Oct 2023 20:38:53 UTC (155 KB)
