Empirical investigation of multi-source cross-validation in clinical ECG classification

Leinonen, Tuija; Wong, David; Vasankari, Antti; Wahab, Ali; Nadarajah, Ramesh; Kaisti, Matti; Airola, Antti

doi:10.1016/j.compbiomed.2024.109271

arXiv:2403.15012 (cs)

[Submitted on 22 Mar 2024 (v1), last revised 23 Oct 2024 (this version, v2)]

Abstract: Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic when compared to the accuracy obtained from deploying models to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs.
In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We consider the task of electrocardiogram-based cardiovascular disease classification, combining and harmonizing the openly available PhysioNet CinC Challenge 2021 and the Shandong Provincial Hospital datasets for our study.
Our results show that K-fold cross-validation, both on single-source and multi-source data, systematically overestimates prediction performance when the end goal is to generalize to new sources. Leave-source-out cross-validation provides more reliable performance estimates, having close to zero bias though larger variability. The evaluation highlights the dangers of obtaining misleading cross-validation results on medical data and demonstrates how these issues can be mitigated when multi-source data are available.
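
The difference between the two splitting designs compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `k_fold_splits` and `leave_source_out_splits`, the source labels, and the toy data are all hypothetical, and details such as stratification or patient-level grouping are left out. It only shows that random K-fold splitting can place recordings from the same source in both the training and test sets, whereas leave-source-out holds out every recording from one source at a time.

```python
import random
from collections import defaultdict


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs from a random K-fold partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train_idx), sorted(test_idx)


def leave_source_out_splits(sources):
    """Yield (held_out_source, train_idx, test_idx), one source held out per split."""
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)
    for held_out in sorted(by_source):
        test_idx = by_source[held_out]
        train_idx = [i for i, s in enumerate(sources) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: 12 recordings from three made-up sources.
sources = ["hospital_a"] * 5 + ["hospital_b"] * 4 + ["hospital_c"] * 3

for train_idx, test_idx in k_fold_splits(len(sources), k=3):
    # Sources that appear on both sides of a random split.
    shared = {sources[i] for i in train_idx} & {sources[i] for i in test_idx}
    print("K-fold: sources in both train and test:", sorted(shared))

for held_out, train_idx, test_idx in leave_source_out_splits(sources):
    print("Leave-source-out: held out", held_out, "with", len(test_idx), "test recordings")
```

In practice, scikit-learn's `KFold` and `LeaveOneGroupOut` splitters in `sklearn.model_selection` provide the same two designs; the standard-library version above is used only to keep the sketch dependency-free.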

Comments:	19 pages, 5 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2403.15012 [cs.LG]
	(or arXiv:2403.15012v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.15012
Journal reference:	Computers in Biology and Medicine, 183, 109271 (2024)
Related DOI:	https://doi.org/10.1016/j.compbiomed.2024.109271

Submission history

From: Antti Airola
[v1] Fri, 22 Mar 2024 07:56:31 UTC (596 KB)
[v2] Wed, 23 Oct 2024 06:27:26 UTC (1,782 KB)
