When can you trust feature selection? -- II: On the effects of random data on condition in statistics and optimisation

Bastounis, Alexander; Cucker, Felipe; Hansen, Anders C.

Abstract: In Part I, we defined a LASSO condition number and developed an algorithm for computing support sets (feature selection) of the LASSO minimisation problem that runs in polynomial time in the number of variables and the logarithm of the condition number. The algorithm is trustworthy in the sense that if the condition number is infinite, the algorithm will run forever and never produce an incorrect output. In this Part II article, we demonstrate how finite precision algorithms (for example, algorithms running floating-point arithmetic) will fail on open sets when the condition number is large but still finite. This augments Part I's result: if an algorithm takes inputs from an open set that includes at least one point with an infinite condition number, it fails to compute the correct support set for all inputs within that set. Hence, for any finite precision algorithm working on open sets for the LASSO problem with random inputs, our LASSO condition number, as a random variable, will estimate the probability of success/failure of the algorithm. We show that a finite precision version of our algorithm works on traditional Gaussian data for LASSO with high probability. The algorithm is trustworthy: specifically, in the random cases where the algorithm fails, it will not produce an output. Finally, we demonstrate classical random ensembles for which the condition number will be large with high probability, and hence where any finite precision algorithm on open sets will fail. We show numerically how commercial software fails on these cases.
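As a toy illustration of why support-set computation can be sensitive to finite precision, the sketch below uses the simplest possible setting: an identity design matrix, for which the LASSO minimiser is given in closed form by soft-thresholding. This is not the algorithm or the condition number from the paper. The objective scaling `0.5*||x - y||_2^2 + lam*||x||_1`, the function names, and the chosen inputs are all assumptions made for the example. It shows an input lying close to a point where the support changes, and how rounding that input to single precision yields a different support.

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form minimiser of 0.5*(x - v)^2 + t*|x|, applied entrywise.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_support_identity(y, lam, dtype=np.float64):
    # Support of argmin_x 0.5*||x - y||_2^2 + lam*||x||_1.
    # Assumption: the design matrix is the identity, so the minimiser
    # is exactly soft_threshold(y, lam). Hypothetical helper for illustration.
    x = soft_threshold(np.asarray(y, dtype=dtype), lam)
    return set(np.flatnonzero(x).tolist())

lam = 1.0

# Far from the threshold: the support is stable under small perturbations.
well_separated = lasso_support_identity([3.0, 0.5], lam)

# Two inputs that differ by 2e-9 in the second entry have different supports.
just_above = lasso_support_identity([3.0, 1.0 + 1e-9], lam)
just_below = lasso_support_identity([3.0, 1.0 - 1e-9], lam)

# In single precision, 1.0 + 1e-9 rounds to 1.0, so the second entry is
# thresholded to zero and the computed support differs from the double
# precision answer for the same nominal input.
single_precision = lasso_support_identity([3.0, 1.0 + 1e-9], lam, dtype=np.float32)

print(well_separated, just_above, just_below, single_precision)
```

In this toy setting the distance from the input to the nearest point where the support changes plays the role that a condition number plays in general: the smaller that distance, the more precision is needed to recover the correct support.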

Comments:	24 pages, 1 figure
Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:2312.11429 [math.OC]
	(or arXiv:2312.11429v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2312.11429
