Machine learning meets false discovery rate

Marandon, Ariane; Lei, Lihua; Mary, David; Roquain, Etienne

arXiv:2208.06685v1 (stat)

[Submitted on 13 Aug 2022 (this version), latest version 25 Oct 2023 (v3)]

Abstract: Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees but often lack flexibility. Recent machine learning classification algorithms, such as those based on random forests (RF) or neural networks (NN), perform very well in practice but lack interpretability and theoretical guarantees. In this paper, we bring the two together by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends recent work in the multiple testing literature, notably that of Yang et al. (2021), to the high-dimensional setting. AdaDetect is shown both to strongly control the FDR and to have power that mimics that of the oracle in a specific sense. The interest and validity of our approach are demonstrated with theoretical results, numerical experiments on several benchmark datasets, and an application to astrophysical data. In particular, while AdaDetect can be used in combination with any classifier, it is particularly efficient on real-world datasets with RF and on images with NN.
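
As a rough illustration of what FDR-controlled novelty detection can look like, the sketch below turns novelty scores into empirical p-values against a reference sample of null scores and then applies the Benjamini-Hochberg step-up rule. This is a generic textbook-style pipeline, not the AdaDetect procedure itself: the function names `empirical_p_values` and `benjamini_hochberg`, the toy scores, and the level `alpha = 0.1` are all assumptions made for illustration. In practice the scores would come from a learned model such as an RF or NN.

```python
def empirical_p_values(calibration_scores, test_scores):
    """Empirical p-value of each test score against null calibration scores.

    Assumes a higher score means "more novel". The +1 terms keep every
    p-value strictly positive.
    """
    n = len(calibration_scores)
    return [
        (1 + sum(1 for c in calibration_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls below its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy data (hypothetical): 99 null scores on a grid, 5 test scores.
calibration = [i / 100 for i in range(1, 100)]
test = [0.50, 2.0, 0.30, 3.0, 1.5]

p_values = empirical_p_values(calibration, test)
rejected = benjamini_hochberg(p_values, alpha=0.1)
print(rejected)  # [1, 3, 4]
```

The three test points whose scores exceed every calibration score receive the smallest possible p-value, 1/100, and are declared novelties; the two points that sit inside the null range are not.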

Subjects:	Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2208.06685 [stat.ME]
	(or arXiv:2208.06685v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2208.06685

Submission history

From: Ariane Marandon
[v1] Sat, 13 Aug 2022 17:14:55 UTC (5,365 KB)
[v2] Sat, 22 Oct 2022 08:35:12 UTC (5,319 KB)
[v3] Wed, 25 Oct 2023 11:29:17 UTC (5,442 KB)
