EnTDA: Entity-to-Text based Data Augmentation Approach for Named Entity Recognition Tasks

Hu, Xuming; Jiang, Yong; Liu, Aiwei; Huang, Zhongqiang; Xie, Pengjun; Huang, Fei; Wen, Lijie; Yu, Philip S.

arXiv:2210.10343v1 (cs)

[Submitted on 19 Oct 2022 (this version), latest version 26 May 2023 (v2)]

Abstract: Data augmentation techniques have been used to improve the generalization capability of models in named entity recognition (NER) tasks. Existing augmentation methods either manipulate the words in the original text, which requires hand-crafted in-domain knowledge, or leverage generative models, which rely on a dependency order among entities. To alleviate this excessive reliance on the dependency order among entities, we develop EnTDA, an entity-to-text (rather than text-to-entity) data augmentation method that decouples the dependencies between entities by adding, deleting, replacing, and swapping entities, and we use the augmented data to bootstrap the generalization ability of the NER model. Furthermore, we introduce a diversity beam search to increase the diversity of the augmented data. Experiments on thirteen NER datasets across three tasks (flat NER, nested NER, and discontinuous NER) and two settings (full-data NER and low-resource NER) show that EnTDA consistently outperforms the baselines.
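
The four entity-list operations named in the abstract (add, delete, replace, swap) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(text, type)` tuple representation, the candidate `pool`, and the choice to replace an entity only with one of the same type are all assumptions made for illustration. The entity-to-text generation step and the diversity beam search are not shown.

```python
import random

# Hypothetical representation: an entity is a (surface_text, entity_type) tuple.

def add_entity(entities, pool, rng):
    """Return a copy with one unseen entity from `pool` inserted at a random position."""
    candidates = [e for e in pool if e not in entities]
    out = list(entities)
    if candidates:
        out.insert(rng.randrange(len(out) + 1), rng.choice(candidates))
    return out

def delete_entity(entities, rng):
    """Return a copy with one random entity removed (at least one entity is kept)."""
    out = list(entities)
    if len(out) > 1:
        del out[rng.randrange(len(out))]
    return out

def replace_entity(entities, pool, rng):
    """Return a copy with one random entity replaced by an unseen entity.

    Assumption: the replacement shares the entity type of the one it replaces.
    """
    out = list(entities)
    if not out:
        return out
    i = rng.randrange(len(out))
    candidates = [e for e in pool if e[1] == out[i][1] and e not in out]
    if candidates:
        out[i] = rng.choice(candidates)
    return out

def swap_entities(entities, rng):
    """Return a copy with the positions of two random entities exchanged."""
    out = list(entities)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    entities = [("Paris", "LOC"), ("Alice", "PER")]
    pool = [("London", "LOC"), ("Bob", "PER"), ("Paris", "LOC")]
    print(add_entity(entities, pool, rng))
    print(delete_entity(entities, rng))
    print(replace_entity(entities, pool, rng))
    print(swap_entities(entities, rng))
```

Each function returns a new list and leaves its input unchanged, so several augmented entity lists can be derived from the same original. In an entity-to-text pipeline, each augmented list would then be passed to a text generator to produce a new training sentence.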

Comments:	13 pages, 4 figures, 9 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.10343 [cs.CL]
	(or arXiv:2210.10343v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.10343

Submission history

From: Xuming Hu
[v1] Wed, 19 Oct 2022 07:24:40 UTC (1,461 KB)
[v2] Fri, 26 May 2023 16:14:43 UTC (1,695 KB)
