Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

Rogers, Ryan; Roth, Aaron; Smith, Adam; Srebro, Nathan; Thakkar, Om; Woodworth, Blake

arXiv:1906.09231 (cs)

[Submitted on 21 Jun 2019 (v1), last revised 9 Mar 2020 (this version, v2)]

Abstract: We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings, often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.
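The abstract names sample splitting as a simple baseline that worst-case bounds often fail to beat. The sketch below shows only that baseline, for reference. It is not the authors' framework and not their Python library. It assumes the following:

- The number of queries k is fixed in advance.
- Every query takes values in [0, 1].
- The data are split into k disjoint chunks, and each adaptively chosen query is answered on a fresh chunk.
- Each answer carries a Hoeffding confidence interval, with a union bound over the k queries.

The names `SampleSplittingOracle`, `Query`, `beta` and `half_width` are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# Hypothetical query type: maps one data point to a value in [0, 1].
Query = Callable[[float], float]


class SampleSplittingOracle:
    """Sample-splitting baseline (illustrative sketch, not the paper's library).

    Assumes the number of queries k is fixed in advance and every query
    takes values in [0, 1]. Query i is evaluated only on chunk i, so its
    data are independent of the data behind earlier answers.
    """

    def __init__(self, data: Sequence[float], k: int, beta: float = 0.05):
        if not 1 <= k <= len(data):
            raise ValueError("need 1 <= k <= len(data)")
        self.k = k
        self.m = len(data) // k  # samples reserved for each query
        self.chunks = [list(data[i * self.m:(i + 1) * self.m]) for i in range(k)]
        self.used = 0
        # Hoeffding: P(|mean - mu| >= t) <= 2 exp(-2 m t^2). Setting the right-hand
        # side to beta / k and union-bounding over k queries gives simultaneous
        # coverage of at least 1 - beta.
        self.half_width = math.sqrt(math.log(2 * k / beta) / (2 * self.m))

    def answer(self, query: Query) -> Tuple[float, float]:
        """Return (point estimate, confidence-interval half-width) for one query."""
        if self.used >= self.k:
            raise RuntimeError("query budget exhausted")
        chunk = self.chunks[self.used]
        self.used += 1
        estimate = sum(query(x) for x in chunk) / len(chunk)
        return estimate, self.half_width


# Usage: synthetic data, two queries, the second chosen from the first answer.
random.seed(0)
data = [random.random() for _ in range(10_000)]
oracle = SampleSplittingOracle(data, k=10, beta=0.05)
est1, hw = oracle.answer(lambda x: float(x > 0.5))
threshold = 0.6 if est1 > 0.5 else 0.4  # adaptive choice
est2, _ = oracle.answer(lambda x: float(x > threshold))
print(f"{est1:.3f} +/- {hw:.3f}, {est2:.3f} +/- {hw:.3f}")
```

Each chunk answers exactly one query, so the interval stays valid however the next query depends on earlier answers. The cost is a half-width of roughly sqrt(k ln(2k/beta) / (2n)) for n total samples. That half-width grows with the number of queries k, which is the trade-off the abstract's comparison with worst-case bounds is about.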

Comments:	Accepted to appear in the proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020
Subjects:	Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:1906.09231 [cs.LG]
	(or arXiv:1906.09231v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.09231

Submission history

From: Om Thakkar
[v1] Fri, 21 Jun 2019 16:33:02 UTC (332 KB)
[v2] Mon, 9 Mar 2020 05:30:40 UTC (3,605 KB)