The Accuracy Cost of Weakness: A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time

Martinsson, John; Mogren, Olof; Virtanen, Tuomas; Sandsten, Maria

arXiv:2502.09363 (cs)

[Submitted on 13 Feb 2025]

Abstract: Accurate labels are critical for deriving robust machine learning models. Labels are used to train supervised learning models and to evaluate most machine learning paradigms. In this paper, we model the accuracy and cost of a common weak labeling process where annotators assign presence or absence labels to fixed-length data segments for a given event class. The annotator labels a segment as "present" if it sufficiently covers an event from that class, e.g., a birdsong sound event in audio data. We analyze how the segment length affects the label accuracy and the required number of annotations, and compare this fixed-length labeling approach with an oracle method that uses the true event activations to construct the segments. Furthermore, we quantify the gap between these methods and verify that in most realistic scenarios the oracle method is better than the fixed-length labeling method in both accuracy and cost. Our findings provide a theoretical justification for adaptive weak labeling strategies that mimic the oracle process, and a foundation for optimizing weak labeling processes in sequence labeling tasks.
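The labeling process described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's formal model: the presence criterion (a segment is "present" if it overlaps at least a fraction `gamma` of some event), the accuracy measure (fraction of the timeline whose label agrees with the true event activity), and all function names are hypothetical choices made for illustration. Events are assumed to be non-overlapping `(start, end)` intervals.

```python
import math


def fixed_segment_labels(events, total_length, segment_length, gamma=0.5):
    """Label consecutive fixed-length segments as present (True) or absent (False).

    Assumed criterion: a segment is 'present' if it overlaps at least a
    fraction `gamma` of the duration of some event.
    """
    labels = []
    n_segments = math.ceil(total_length / segment_length)
    for i in range(n_segments):
        start = i * segment_length
        end = min(start + segment_length, total_length)
        present = False
        for ev_start, ev_end in events:
            overlap = max(0.0, min(end, ev_end) - max(start, ev_start))
            if overlap > 0 and overlap >= gamma * (ev_end - ev_start):
                present = True
                break
        labels.append((start, end, present))
    return labels


def oracle_labels(events, total_length):
    """Build segments directly from the true event boundaries (oracle sketch)."""
    labels, cursor = [], 0.0
    for ev_start, ev_end in sorted(events):
        if ev_start > cursor:
            labels.append((cursor, ev_start, False))
        labels.append((ev_start, ev_end, True))
        cursor = ev_end
    if cursor < total_length:
        labels.append((cursor, total_length, False))
    return labels


def label_accuracy(events, labels, total_length):
    """Fraction of the timeline where the segment label matches event activity."""
    correct = 0.0
    for start, end, present in labels:
        # Total time within this segment during which an event is active.
        active = sum(
            max(0.0, min(end, ev_end) - max(start, ev_start))
            for ev_start, ev_end in events
        )
        correct += active if present else (end - start) - active
    return correct / total_length


# Illustrative input: one 2-second event in a 10-second recording.
events = [(2.0, 4.0)]
fixed = fixed_segment_labels(events, total_length=10.0, segment_length=2.5)
oracle = oracle_labels(events, total_length=10.0)

fixed_accuracy = label_accuracy(events, fixed, 10.0)
oracle_accuracy = label_accuracy(events, oracle, 10.0)
print(len(fixed), fixed_accuracy)    # number of annotations, accuracy
print(len(oracle), oracle_accuracy)
```

In this toy input the fixed-length scheme needs four annotations and mislabels the parts of the timeline where a segment boundary cuts through the event, while the oracle segmentation needs three annotations and matches the event exactly. Varying `segment_length` and `gamma` gives a rough feel for the accuracy and cost trade-off the paper analyzes.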

Comments:	Submitted to TMLR
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2502.09363 [cs.LG]
	(or arXiv:2502.09363v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.09363

Submission history

From: John Martinsson
[v1] Thu, 13 Feb 2025 14:31:49 UTC (1,396 KB)
