DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition

Zhang, Qi; Pan, Huitong; Chen, Zhijia; Latecki, Longin Jan; Caragea, Cornelia; Dragut, Eduard

arXiv:2504.04616 (cs)

[Submitted on 6 Apr 2025]

Abstract: Distantly Supervised Named Entity Recognition (DS-NER) has attracted attention due to its scalability and its ability to generate labeled data automatically. However, distant annotation introduces many mislabeled instances, which limits the performance of models trained on it. Most existing work attempts to solve this problem by developing intricate models that learn from the noisy labels. An alternative approach is to clean the labeled data, thus increasing the quality of the distant labels. This approach has received little attention for NER. In this paper, we propose a training dynamics-based label cleaning approach, which leverages the behavior of a model as training progresses to characterize the distantly annotated samples. We also introduce an automatic threshold estimation strategy to locate the errors in distant labels. Extensive experimental results demonstrate that: (1) models trained on our cleaned DS-NER datasets, which were refined by directly removing identified erroneous annotations, achieve significant improvements in F1-score, ranging from 3.18% to 8.95%; and (2) our method outperforms numerous advanced DS-NER approaches across four datasets.
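The abstract does not say which training-dynamics statistics or which threshold rule the method uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the per-epoch predicted probabilities for each sample have been recorded, that the statistic is the mean probability assigned to the distant label across epochs, and that the threshold is passed in by hand rather than estimated automatically. The function names (`mean_confidence`, `remove_suspects`) and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_confidence(probs_per_epoch, labels):
    """Average probability given to each sample's distant label over epochs.

    probs_per_epoch: array-like of shape (epochs, samples, classes)
    labels: array-like of shape (samples,) holding distant label ids
    """
    probs = np.asarray(probs_per_epoch, dtype=float)
    labels = np.asarray(labels)
    sample_idx = np.arange(probs.shape[1])
    # Pick the probability of the distant label at every epoch: (epochs, samples)
    label_probs = probs[:, sample_idx, labels]
    return label_probs.mean(axis=0)


def remove_suspects(samples, labels, probs_per_epoch, threshold):
    """Drop samples whose mean confidence falls below `threshold`.

    Assumption: a fixed, manually chosen threshold stands in for the
    automatic threshold estimation mentioned in the abstract.
    """
    confidence = mean_confidence(probs_per_epoch, labels)
    keep = confidence >= threshold
    kept = [s for s, k in zip(samples, keep) if k]
    return kept, confidence


# Toy run: 2 epochs, 3 samples, 2 classes.
probs = [
    [[0.90, 0.10], [0.60, 0.40], [0.20, 0.80]],
    [[0.95, 0.05], [0.70, 0.30], [0.10, 0.90]],
]
kept, confidence = remove_suspects(["a", "b", "c"], [0, 1, 1], probs, threshold=0.5)
print(kept, confidence)
```

In this toy run the second sample is consistently assigned a low probability for its distant label, so it is the one removed. A real pipeline would operate on entity spans or tokens and would need the threshold strategy described in the paper itself.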

Comments:	Accepted to NAACL2025-Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2504.04616 [cs.CL]
	(or arXiv:2504.04616v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.04616

Submission history

From: Qi Zhang
[v1] Sun, 6 Apr 2025 20:54:42 UTC (963 KB)
