Inference with Randomized Regression Trees

Bakshi, Soham; Huang, Yiling; Panigrahi, Snigdha; Dempsey, Walter

Abstract: Regression trees are a popular machine learning algorithm that fits piecewise constant models by recursively partitioning the predictor space. In this paper, we focus on performing statistical inference in a data-dependent model obtained from the fitted tree. We introduce Randomized Regression Trees (RRT), a novel selective inference method that adds independent Gaussian noise to the gain function underlying the splitting rules of classic regression trees.
The RRT method offers several advantages. First, it utilizes the added randomization to obtain an exact pivot using the full dataset, while accounting for the data-dependent structure of the fitted tree. Second, with a small amount of randomization, the RRT method achieves predictive accuracy similar to a model trained on the entire dataset. At the same time, it provides significantly more powerful inference than data splitting methods, which rely only on a held-out portion of the data for inference. Third, unlike data splitting approaches, it yields intervals that adapt to the signal strength in the data. Our empirical analyses highlight these advantages of the RRT method and its ability to convert a purely predictive algorithm into a method capable of performing reliable and powerful inference in the tree model.
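
The randomized splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the gain is the reduction in the sum of squared errors used by classic regression trees, it handles a single predictor with at least two distinct values, and it covers only split selection, not the pivot or the inference step. The function names `split_gains` and `randomized_split` and the noise scale `tau` are hypothetical.

```python
import numpy as np


def sse(y):
    """Sum of squared deviations from the mean (0 for an empty array)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0


def split_gains(x, y):
    """Return candidate thresholds on one predictor and their SSE reductions."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    thresholds, gains = [], []
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates tied values
        thresholds.append(0.5 * (xs[i] + xs[i - 1]))
        # Assumed gain: parent SSE minus the SSE of the two children.
        gains.append(sse(ys) - sse(ys[:i]) - sse(ys[i:]))
    return np.array(thresholds), np.array(gains)


def randomized_split(x, y, tau, rng):
    """Pick the threshold maximizing gain plus independent N(0, tau^2) noise."""
    thresholds, gains = split_gains(x, y)
    noisy = gains + rng.normal(0.0, tau, size=gains.shape)
    return thresholds[int(np.argmax(noisy))]


rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
# With tau = 0 the rule reduces to the usual greedy split.
print(randomized_split(x, y, tau=0.0, rng=rng))
# With tau > 0 the chosen split can differ from the greedy one.
print(randomized_split(x, y, tau=50.0, rng=rng))
```

Setting `tau` to zero recovers the standard greedy split, which makes the added noise the only difference from a classic tree in this sketch.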

Comments:	49 pages, 6 figures
Subjects:	Methodology (stat.ME)
Cite as:	arXiv:2412.20535 [stat.ME]
	(or arXiv:2412.20535v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2412.20535
