Sharp Analysis of a Simple Model for Random Forests

Klusowski, Jason M.

arXiv:1805.02587v6 (stat)

[Submitted on 7 May 2018 (v1), revised 17 Jun 2019 (this version, v6), latest version 22 Jun 2020 (v7)]

Abstract: Random forests have become an important tool for improving accuracy in regression problems since their popularization by [Breiman, 2001] and others. In this paper, we revisit a random forest model originally proposed by [Breiman, 2004] and later studied by [Biau, 2012], where a feature is selected at random and the split occurs at the midpoint of the box containing the chosen feature. If the Lipschitz regression function is sparse and only depends on a small, unknown subset of $S$ out of $d$ features, we show that, given access to $n$ observations, this random forest model outputs a predictor that has a mean-squared prediction error $O((n(\sqrt{\log n})^{S-1})^{-\frac{1}{S\log 2+1}})$. This positively answers an outstanding question of [Biau, 2012] about whether the rate of convergence therein could be improved. The second part of this article shows that the aforementioned prediction error cannot generally be improved, which we accomplish by characterizing the variance and by showing that the bias is tight for any linear model with nonzero parameter vector. As a striking consequence of our analysis, we show the variance of this forest is similar in form to the best-case variance lower bound of [Lin and Jeon, 2006] among all random forest models with nonadaptive splitting schemes (i.e., where the split protocol is independent of the training data).
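The splitting rule described in the abstract (pick a feature at random, split the current box at its midpoint along that feature, never look at the training data) can be illustrated with a minimal Python sketch. This is not the paper's code. The names `CenteredTree`, `centered_forest_predict`, and `rate_bound`, the `depth` and `n_trees` parameters, the uniform choice among features, the unit-cube domain, and the convention of predicting 0 in an empty cell are all assumptions made for illustration. `rate_bound` only evaluates the expression inside the $O(\cdot)$ from the abstract, with constants omitted.

```python
import math
import random
from collections import defaultdict


class CenteredTree:
    """One tree whose splits are drawn independently of the training data."""

    def __init__(self, d, depth, rng):
        self.d = d
        self.depth = depth                    # hypothetical: number of split levels
        self.rng = rng
        self.split_feature = {}               # node path -> feature index
        self.leaf_values = defaultdict(list)  # leaf path -> training responses

    def leaf(self, x):
        """Return the path (tuple of 0/1 turns) of the leaf cell containing x."""
        lo = [0.0] * self.d                   # assumed domain: the unit cube
        hi = [1.0] * self.d
        path = ()
        for _ in range(self.depth):
            if path not in self.split_feature:
                # Assumption: feature drawn uniformly, without looking at data.
                self.split_feature[path] = self.rng.randrange(self.d)
            j = self.split_feature[path]
            mid = 0.5 * (lo[j] + hi[j])       # midpoint of the box along feature j
            if x[j] >= mid:
                lo[j] = mid
                path += (1,)
            else:
                hi[j] = mid
                path += (0,)
        return path

    def fit(self, X, y):
        for xi, yi in zip(X, y):
            self.leaf_values[self.leaf(xi)].append(yi)
        return self

    def predict(self, x):
        values = self.leaf_values.get(self.leaf(x), [])
        # Assumption: an empty cell predicts 0.
        return sum(values) / len(values) if values else 0.0


def centered_forest_predict(X, y, x, n_trees=100, depth=3, seed=0):
    """Average the predictions of n_trees independently randomized trees."""
    rng = random.Random(seed)
    trees = [CenteredTree(len(x), depth, rng).fit(X, y) for _ in range(n_trees)]
    return sum(t.predict(x) for t in trees) / n_trees


def rate_bound(n, S):
    """Evaluate (n * sqrt(log n)^(S-1))^(-1/(S log 2 + 1)), constants omitted."""
    exponent = -1.0 / (S * math.log(2) + 1.0)
    return (n * math.sqrt(math.log(n)) ** (S - 1)) ** exponent
```

For example, `centered_forest_predict([[0.2], [0.8]], [1.0, 3.0], [0.1], depth=1)` averages only the response of the training point that shares the cell $[0, 0.5)$ with the query. Note that `rate_bound(n, S)` depends on the sparsity level $S$ and not on the ambient dimension $d$, which is the point of the sparsity assumption in the abstract.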

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
MSC classes:	62G08, 68W20
Cite as:	arXiv:1805.02587 [stat.ML]
	(or arXiv:1805.02587v6 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1805.02587

Submission history

From: Jason M. Klusowski
[v1] Mon, 7 May 2018 15:52:48 UTC (188 KB)
[v2] Wed, 16 May 2018 21:52:46 UTC (191 KB)
[v3] Tue, 12 Jun 2018 00:38:32 UTC (196 KB)
[v4] Tue, 26 Jun 2018 18:40:29 UTC (122 KB)
[v5] Fri, 21 Sep 2018 04:32:56 UTC (123 KB)
[v6] Mon, 17 Jun 2019 01:08:34 UTC (134 KB)
[v7] Mon, 22 Jun 2020 22:46:20 UTC (39 KB)
