Efficient Annotator Reliability Assessment with EffiARA

Cook, Owen; Vasilakes, Jake; Roberts, Ian; Song, Xingyi

Computer Science > Computation and Language

arXiv:2504.00589 (cs)

[Submitted on 1 Apr 2025 (v1), last revised 3 Apr 2025 (this version, v2)]



Abstract: Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level has become increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as of the dataset as a whole. The framework's efficacy is supported by two previous studies: one improved classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increased overall agreement among annotators by identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at this https URL, and the webtool is publicly accessible at this https URL.
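
The abstract mentions annotator-reliability-based soft label aggregation and sample weighting without giving a formula. The sketch below illustrates one common form of the idea, assuming each annotator has a single non-negative reliability score and each sample has one class index per annotator. The function names `soft_label` and `sample_weight` and the input format are hypothetical; they are not the EffiARA package's API, and the aggregation rule used in the cited studies may differ.

```python
def soft_label(annotations, reliability, num_classes):
    """Reliability-weighted soft label for one sample (hypothetical helper).

    annotations: dict mapping annotator id -> chosen class index (assumed format)
    reliability: dict mapping annotator id -> non-negative reliability score
    """
    weights = [0.0] * num_classes
    # Each annotator's vote counts in proportion to their reliability score.
    for annotator, label in annotations.items():
        weights[label] += reliability[annotator]
    total = sum(weights)
    if total == 0:
        # Assumed fallback: uniform distribution if every annotator scores zero.
        return [1.0 / num_classes] * num_classes
    # Normalise so the soft label is a probability distribution over classes.
    return [w / total for w in weights]


def sample_weight(annotations, reliability):
    """Hypothetical sample weight: mean reliability of the sample's annotators."""
    scores = [reliability[a] for a in annotations]
    return sum(scores) / len(scores)
```

For example, with reliabilities of 0.9, 0.6 and 0.3, where the two more reliable annotators choose class 1 and the least reliable chooses class 0, `soft_label` gives roughly [0.17, 0.83] and `sample_weight` gives 0.6, so samples labelled by more reliable annotators would contribute more to a weighted training loss.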

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2504.00589 [cs.CL]
(or arXiv:2504.00589v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2504.00589

Submission history

From: Owen Cook
[v1] Tue, 1 Apr 2025 09:48:09 UTC (679 KB)
[v2] Thu, 3 Apr 2025 22:24:47 UTC (679 KB)