A Consequentialist Critique of Binary Classification Evaluation Practices

Flores, Gerardo; Schiff, Abigail; Smith, Alyssa H.; Fukuyama, Julia A; Wilson, Ashia C.

arXiv:2504.04528 (cs)

[Submitted on 6 Apr 2025]

Abstract: ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, and we provide a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by Assel et al. (2017) regarding the clinical utility of proper scoring rules.
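
As an illustration of the "mixture of thresholds" reading mentioned in the abstract, the following is a minimal sketch in plain NumPy. It does not use or reproduce the briertools API; the function names and the toy data are hypothetical. It computes the Brier score and log loss, and numerically checks the standard identity that the Brier score equals twice the cost-weighted misclassification loss averaged over thresholds drawn uniformly from [0, 1], assuming a false positive costs c and a false negative costs 1 - c at threshold c.

```python
import numpy as np


def brier_score(y_true, p_pred):
    """Mean squared difference between forecast probability and outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def threshold_averaged_cost(y_true, p_pred, n_thresholds=10_000):
    """Average cost of thresholded decisions over a uniform grid of thresholds.

    Assumption (illustrative, not taken from the paper's code): at threshold c
    we act when p > c, a false positive costs c, and a false negative costs 1 - c.
    """
    y = np.asarray(y_true, dtype=float)[:, None]
    p = np.asarray(p_pred, dtype=float)[:, None]
    # Midpoints of n equal-width bins on [0, 1], shaped for broadcasting.
    c = ((np.arange(n_thresholds) + 0.5) / n_thresholds)[None, :]
    act = p > c
    cost = np.where(act, c * (1 - y), (1 - c) * y)
    return float(cost.mean())


# Hypothetical toy data, for illustration only.
y_toy = [1, 0, 1, 0]
p_toy = [0.9, 0.2, 0.6, 0.4]

bs = brier_score(y_toy, p_toy)
ll = log_loss(y_toy, p_toy)
avg_cost = threshold_averaged_cost(y_toy, p_toy)
print(f"Brier score: {bs:.4f}")
print(f"Log loss: {ll:.4f}")
print(f"2 x threshold-averaged cost: {2 * avg_cost:.4f}")
```

Under these assumptions the first and third printed values should agree up to the discretization error of the threshold grid. A non-uniform weighting over thresholds would correspond to a different proper scoring rule.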

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2504.04528 [cs.LG]
	(or arXiv:2504.04528v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.04528

Submission history

From: Gerardo Flores
[v1] Sun, 6 Apr 2025 15:58:01 UTC (271 KB)
