Confidence intervals for performance estimates in 3D medical image segmentation

El Jurdi, R.; Varoquaux, G.; Colliot, O.

arXiv:2307.10926v1 (eess)

[Submitted on 20 Jul 2023 (this version), latest version 23 Mar 2025 (v3)]

Title: Confidence intervals for performance estimates in 3D medical image segmentation

Authors: R. El Jurdi, G. Varoquaux, O. Colliot

Abstract: Medical segmentation models are evaluated empirically. Because such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard deviation across the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied in this respect, and it differs in the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry out experiments on 3D image segmentation using the standard nnU-Net framework, two datasets from the Medical Segmentation Decathlon challenge, and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spreads of the performance metric. Importantly, we show that the test set size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
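The relation between spread, test set size, and interval width described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a 95% normal-approximation interval for the parametric case, a percentile bootstrap for the resampling case, and that "1% wide" means a total interval width of 0.01 on a score in [0, 1]. The function names and the bootstrap settings are hypothetical choices made for illustration.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed level)


def parametric_ci(scores, z=Z_95):
    """Normal-approximation CI for the mean of per-image scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean - z * sem, mean + z * sem


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (one common variant, assumed here)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the test set with replacement and record each resampled mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def required_test_size(std, width, z=Z_95):
    """Smallest n for which the parametric CI has at most the given total width.

    Solves 2 * z * std / sqrt(n) <= width for n.
    """
    return math.ceil((2 * z * std / width) ** 2)
```

Under these assumptions, `required_test_size(0.03, 0.01)` returns 139, which falls within the 100-200 range quoted in the abstract for a standard deviation around 3%. With a hypothetical spread of 10%, `required_test_size(0.10, 0.01)` returns 1537, in line with the statement that harder tasks may require over 1000 samples.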

Comments:	10 pages
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2307.10926 [eess.IV]
	(or arXiv:2307.10926v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2307.10926

Submission history

From: Rosana El Jurdi
[v1] Thu, 20 Jul 2023 14:52:45 UTC (992 KB)
[v2] Fri, 21 Jul 2023 09:47:01 UTC (992 KB)
[v3] Sun, 23 Mar 2025 18:22:12 UTC (357 KB)
