Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Shi, Zhengxiang; Lipani, Aldo

arXiv:2306.07664 (cs)

[Submitted on 13 Jun 2023]

Abstract: In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our findings highlight the potential of DA as a powerful tool for bolstering LMs' performance.
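
As a rough illustration of the idea in the abstract, the sketch below shows how back-translation could be used to build an unlabeled corpus for continued pre-training before fine-tuning. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `back_translate`, `build_pretraining_corpus`, `toy_to_pivot`, and `toy_from_pivot` are hypothetical, and the toy dictionaries stand in for a real translation model, which the abstract does not specify.

```python
from typing import Callable, Iterable, List

# Hypothetical type: any function mapping one sentence to its translation.
Translator = Callable[[str], str]


def back_translate(texts: Iterable[str],
                   to_pivot: Translator,
                   from_pivot: Translator) -> List[str]:
    """Round-trip each text through a pivot language to obtain a paraphrase."""
    return [from_pivot(to_pivot(t)) for t in texts]


def build_pretraining_corpus(texts: List[str],
                             to_pivot: Translator,
                             from_pivot: Translator) -> List[str]:
    """Combine original and back-translated texts into one unlabeled corpus.

    Duplicates (case-insensitive) are dropped and first-seen order is kept.
    """
    augmented = back_translate(texts, to_pivot, from_pivot)
    seen = set()
    corpus = []
    for t in texts + augmented:
        cleaned = t.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            corpus.append(cleaned)
    return corpus


# Toy word-level "translators" for demonstration only (assumption: a real
# setup would call a machine translation model here instead).
TOY_EN_DE = {"great": "toll", "movie": "film"}
TOY_DE_EN = {"toll": "terrific", "film": "film"}


def toy_to_pivot(sentence: str) -> str:
    return " ".join(TOY_EN_DE.get(w, w) for w in sentence.split())


def toy_from_pivot(sentence: str) -> str:
    return " ".join(TOY_DE_EN.get(w, w) for w in sentence.split())


if __name__ == "__main__":
    task_texts = ["a great movie", "boring plot"]
    for line in build_pretraining_corpus(task_texts, toy_to_pivot, toy_from_pivot):
        print(line)
```

In such a setup, the resulting corpus would feed a continued pre-training step (for example, masked language modelling) on the LM, after which the model is fine-tuned on the labeled task data as usual.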

Comments:	Accepted at ESANN 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2306.07664 [cs.CL]
	(or arXiv:2306.07664v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.07664

Submission history

From: Zhengxiang Shi
[v1] Tue, 13 Jun 2023 10:14:58 UTC (181 KB)
