A Link between Coding Theory and Cross-Validation with Applications

Pahikkala, Tapio; Movahedi, Parisa; Montoya, Ileana; Miikonen, Havu; Jambor, Ivan; Airola, Antti; Major, Laszlo


arXiv:2103.11856v1 (cs)

[Submitted on 22 Mar 2021 (this version), latest version 9 Feb 2024 (v3)]

Abstract: We study the combinatorics of cross-validation-based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, the data are randomly assigned to two classes given a fixed class proportion. In particular, we study how estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pair of data points with different class labels is held out from the training set in turn, behave under the null without any prior assumptions about the learning algorithm or the data. We show that the maximal number of distinct fixed-proportion label assignments on a sample of data for which a learning algorithm can achieve zero LPOCV error equals the maximal size of a constant weight error-correcting code whose length is the sample size, whose weight is the number of data points labeled one, and whose minimum Hamming distance between code words is four. We then introduce the concept of a light constant weight code and show similar results for nonzero LPOCV errors. We also prove both upper and lower bounds on the maximal sizes of light constant weight codes that are similar to the classical results for constant weight codes. These results pave the way towards the design of new LPOCV-based statistical tests for a learning algorithm's ability to distinguish two classes from each other, analogous to the classical Wilcoxon-Mann-Whitney U test for fixed functions. The behavior of some representative learning algorithms and data sets is simulated in an experimental case study.
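The leave-pair-out procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `lpocv_auc` and `fit_mean_difference`, the toy learner, and the convention of counting a tied pair as 0.5 are all assumptions made for illustration. Each positive-negative pair is held out in turn, the learner is refit on the remaining points, and the estimate is the fraction of held-out pairs that the refit model orders correctly.

```python
from itertools import product
from statistics import mean


def fit_mean_difference(X, y):
    """Hypothetical toy learner (an assumption, not from the paper).

    Returns a scoring function x -> w . x, where w is the difference
    between the positive-class mean and the negative-class mean.
    Needs at least one example of each class in the training data.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    dim = len(X[0])
    w = [mean(p[d] for p in pos) - mean(n[d] for n in neg) for d in range(dim)]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


def lpocv_auc(X, y, fit):
    """Leave-pair-out cross-validation estimate of AUC.

    For every (positive, negative) pair, train on all other points and
    check whether the held-out positive scores higher than the held-out
    negative. Ties count as 0.5 (an assumed convention).
    """
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    total = 0.0
    for i, j in product(pos_idx, neg_idx):
        train = [k for k in range(len(y)) if k not in (i, j)]
        score = fit([X[k] for k in train], [y[k] for k in train])
        s_pos, s_neg = score(X[i]), score(X[j])
        if s_pos > s_neg:
            total += 1.0
        elif s_pos == s_neg:
            total += 0.5
    return total / (len(pos_idx) * len(neg_idx))


# Six one-dimensional points, three per class, clearly separated.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]
print(lpocv_auc(X, y, fit_mean_difference))  # 1.0: no held-out pair is misordered
```

With this toy learner each class needs at least two examples, so that the training set still contains both classes after a pair is removed. Repeating the computation over many random label assignments with the same class proportion would give an empirical picture of how the estimate behaves under the exchangeable-label null that the abstract studies.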

Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT); Combinatorics (math.CO)
Cite as:	arXiv:2103.11856 [cs.LG]
	(or arXiv:2103.11856v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.11856

Submission history

From: Tapio Pahikkala
[v1] Mon, 22 Mar 2021 13:57:45 UTC (131 KB)
[v2] Thu, 25 Jan 2024 08:55:05 UTC (129 KB)
[v3] Fri, 9 Feb 2024 09:48:46 UTC (129 KB)
