An algorithm-based multiple detection influence measure for high dimensional regression using expectile

Barry, Amadou; Bhagwat, Nikhil; Misic, Bratislav; Poline, Jean-Baptiste; Greenwood, Celia M. T.

arXiv:2105.12286 (stat)

[Submitted on 26 May 2021]

Abstract: The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. However, in high-dimensional data, this identification is challenging. Classical and recently developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail because of masking, when several influential observations have similar characteristics, or swamping, when the influential observations lie near the boundary of the space spanned by well-behaved observations. Therefore, we propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations that addresses these limitations. Our three-step algorithm to identify and capture undesirable variability in the data, asymMIP, is based on two complementary statistics, inspired by asymmetric correlations, and built on expectiles. Simulations demonstrate higher detection power than competing methods. Using the resulting asymptotic distribution, influential observations can be detected without computationally demanding procedures such as the bootstrap. Applying our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. See our GitHub for a free R package that implements our algorithm: asymMIP (this http URL).
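The abstract says the two statistics are built on expectiles. For readers unfamiliar with the term, the following is a minimal sketch of a sample expectile, the asymmetric least-squares analogue of a quantile. It is not the asymMIP algorithm and is not taken from the authors' R package; the function name `sample_expectile`, its parameters, and the iteratively reweighted averaging scheme are assumptions chosen for illustration only.

```python
import numpy as np


def sample_expectile(y, tau=0.5, tol=1e-10, max_iter=200):
    """Illustrative tau-expectile of a 1-D sample (hypothetical helper).

    The tau-expectile m minimizes sum_i |tau - 1{y_i <= m}| * (y_i - m)**2.
    Its first-order condition says m is a weighted mean with weight tau on
    observations above m and weight 1 - tau on the others, so we iterate
    that weighted mean until it stops changing.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim != 1 or y.size == 0:
        raise ValueError("y must be a non-empty 1-D array")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    m = y.mean()  # tau = 0.5 recovers the ordinary mean
    for _ in range(max_iter):
        # Asymmetric weights: tau above the current estimate, 1 - tau otherwise.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return float(m_new)
        m = m_new
    return float(m)


if __name__ == "__main__":
    sample = np.array([1.0, 2.0, 3.0, 10.0])
    for tau in (0.1, 0.5, 0.9):
        print(tau, sample_expectile(sample, tau))
```

With `tau = 0.5` the weights are symmetric and the result is the sample mean; moving `tau` toward 0 or 1 shifts the estimate toward the lower or upper tail, which is the kind of asymmetry the abstract refers to.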

Comments:	38 pages, 11 figures
Subjects:	Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:2105.12286 [stat.ME]
	(or arXiv:2105.12286v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2105.12286

Submission history

From: Amadou Barry [view email]
[v1] Wed, 26 May 2021 01:16:24 UTC (655 KB)
