Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks

Hu, Xuming; Jiang, Yong; Liu, Aiwei; Huang, Zhongqiang; Xie, Pengjun; Huang, Fei; Wen, Lijie; Yu, Philip S.

arXiv:2210.10343 (cs)

[Submitted on 19 Oct 2022 (v1), last revised 26 May 2023 (this version, v2)]

Abstract: Data augmentation techniques have been used to alleviate the scarcity of labeled data in various NER tasks (flat, nested, and discontinuous NER). Existing augmentation techniques either manipulate words in the original text in ways that break its semantic coherence, or exploit generative models that do not preserve the entities of the original text, which impedes the use of augmentation on nested and discontinuous NER tasks. In this work, we propose EnTDA, a novel Entity-to-Text based data augmentation technique that adds, deletes, replaces, or swaps entities in the entity list of an original text and then uses the augmented entity lists to generate semantically coherent, entity-preserving texts for various NER tasks. Furthermore, we introduce a diversity beam search to increase diversity during text generation. Experiments on thirteen NER datasets across three tasks (flat, nested, and discontinuous NER) and two settings (full-data and low-resource) show that EnTDA brings larger performance improvements than the baseline augmentation techniques.
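The four entity-list operations named in the abstract (add, delete, replace, swap) can be illustrated with a small sketch. This is not the authors' implementation: the `(text, type)` tuple representation, the `entity_pool` of candidate surface forms per type, and the function name `augment_entity_list` are all hypothetical. The later steps, generating a text from the augmented list and the diversity beam search, are not shown.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: an entity is a (surface_text, entity_type) pair.
Entity = Tuple[str, str]
OPS = ("add", "delete", "replace", "swap")


def augment_entity_list(
    entities: List[Entity],
    entity_pool: Dict[str, List[str]],
    op: str,
    rng: random.Random,
) -> List[Entity]:
    """Apply one list-level operation to a copy of `entities` (sketch only).

    `entity_pool` maps an entity type to candidate surface forms of that type,
    e.g. collected from the training set (an assumption of this sketch).
    """
    if op not in OPS:
        raise ValueError(f"unknown operation: {op!r}")
    out = list(entities)  # never mutate the caller's list

    if op == "add":
        # Pick a type that has candidates, then insert one of its forms at a random position.
        types = sorted(t for t, forms in entity_pool.items() if forms)
        if types:
            etype = rng.choice(types)
            out.insert(rng.randrange(len(out) + 1), (rng.choice(entity_pool[etype]), etype))
    elif op == "delete" and len(out) > 1:
        # Keep at least one entity so the generator still has something to realize.
        del out[rng.randrange(len(out))]
    elif op == "replace" and out:
        # Replace one entity with a different surface form of the same type.
        i = rng.randrange(len(out))
        text, etype = out[i]
        candidates = [c for c in entity_pool.get(etype, []) if c != text]
        if candidates:
            out[i] = (rng.choice(candidates), etype)
    elif op == "swap" and len(out) > 1:
        # Exchange the positions of two distinct entities.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    original = [("EU", "ORG"), ("German", "MISC"), ("British", "MISC")]
    pool = {"ORG": ["EU", "UN"], "MISC": ["German", "British", "French"]}
    for op in OPS:
        print(op, augment_entity_list(original, pool, op, rng))
```

Restricting replacements to entities of the same type and refusing to delete the last entity are design choices of this sketch, made so that every augmented list remains a plausible input for a downstream text generator.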

Comments:	Accepted to ACL 2023 (Findings), Long Paper, 14 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.10343 [cs.CL]
	(or arXiv:2210.10343v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.10343

Submission history

From: Xuming Hu
[v1] Wed, 19 Oct 2022 07:24:40 UTC (1,461 KB)
[v2] Fri, 26 May 2023 16:14:43 UTC (1,695 KB)