Standardness Fogs Meaning: A Position Regarding the Informed Usage of Standard Datasets

Cech, Tim; Wegen, Ole; Atzberger, Daniel; Richter, Rico; Scheibel, Willy; Döllner, Jürgen

Computer Science > Machine Learning

arXiv:2406.13552 (cs)

[Submitted on 19 Jun 2024]

Abstract: Standard datasets are frequently used to train and evaluate Machine Learning models. However, the assumed standardness of these datasets leads to a lack of in-depth discussion of how well their labels match the categories derived for the respective use case. In other words, the standardness of the datasets seems to fog coherency and applicability, thus impeding trust in Machine Learning models. We propose to adopt Grounded Theory and Hypotheses Testing through Visualization as methods to evaluate the match between use case, derived categories, and labels of standard datasets. To showcase the approach, we apply it to the 20 Newsgroups dataset and the MNIST dataset. For the 20 Newsgroups dataset, we demonstrate that the labels are imprecise. Therefore, we argue that neither can a Machine Learning model learn a meaningful abstraction of the derived categories, nor can one draw conclusions from its achieving high accuracy. For the MNIST dataset, we demonstrate how the labels can be confirmed to be well defined. We conclude that a concept of standardness for a dataset implies a match between use case, derived categories, and class labels, as in the case of the MNIST dataset. We argue that this match is necessary to learn a meaningful abstraction and, thus, to improve trust in the Machine Learning model.

Subjects:	Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2406.13552 [cs.LG]
	(or arXiv:2406.13552v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.13552

Submission history

From: Willy Scheibel
[v1] Wed, 19 Jun 2024 13:39:05 UTC (7,460 KB)