Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation

Lamrani, Lamia; Bongiorno, Christian; Potters, Marc


arXiv:2503.15186 (math)

[Submitted on 19 Mar 2025]


Abstract: Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications, the theoretical reasons behind it remain largely intuitive, with formal proofs currently lacking. To make an analytical treatment possible, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the estimation error when the population matrix follows a white inverse Wishart distribution, and we observe that the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connect the error to the variance of the eigenvalue distribution, although approximations are necessary. Interestingly, in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension.
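
To make the holdout idea concrete, the following is a minimal sketch of one common construction of a holdout covariance estimator: eigenvectors are taken from the training sample covariance, and each eigenvalue is replaced by the variance of the test data along the corresponding eigenvector. It is an illustration under stated assumptions, not necessarily the exact estimator analyzed in the paper. The function name `holdout_covariance`, the parameter `n_train`, and the zero-mean assumption on the data are hypothetical choices made for this example, and the split size is left as a free parameter rather than set by any formula from the paper.

```python
import numpy as np


def holdout_covariance(X, n_train):
    """Holdout estimate of a covariance matrix (illustrative sketch).

    X       : (T, N) array of observations, assumed to have zero mean.
    n_train : number of leading rows used as the training set (hypothetical
              parameter; the remaining T - n_train rows form the test set).
    """
    T, N = X.shape
    if not 0 < n_train < T:
        raise ValueError("n_train must be strictly between 0 and T")

    X_train, X_test = X[:n_train], X[n_train:]

    # Sample covariance matrices of the two subsets (zero-mean assumption).
    E_train = X_train.T @ X_train / n_train
    E_test = X_test.T @ X_test / (T - n_train)

    # Eigenvectors of the training covariance (columns of U).
    _, U = np.linalg.eigh(E_train)

    # Out-of-sample variance along each training eigenvector: u_i^T E_test u_i.
    lam = np.einsum("ji,jk,ki->i", U, E_test, U)

    # Recombine: U diag(lam) U^T.
    return (U * lam) @ U.T
```

Because the test covariance is positive semidefinite and `U` is orthogonal, the result of this sketch is symmetric, positive semidefinite, and has the same trace as the test sample covariance. Scanning `n_train` on simulated data is one way to explore how the estimation error depends on the split.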

Subjects: Statistics Theory (math.ST); Portfolio Management (q-fin.PM); Risk Management (q-fin.RM); Applications (stat.AP)
Cite as: arXiv:2503.15186 [math.ST] (or arXiv:2503.15186v1 [math.ST] for this version)
DOI: https://doi.org/10.48550/arXiv.2503.15186

Submission history

From: Lamia Lamrani
[v1] Wed, 19 Mar 2025 13:15:15 UTC (96 KB)
