Performance of Data Augmentation Methods for Brazilian Portuguese Text Classification

Amadeus, Marcellus; Branco, Paulo

arXiv:2304.02785 (cs)

[Submitted on 5 Apr 2023]

Abstract: Improving machine learning performance while increasing model generalization has long been a goal of AI researchers. Data augmentation techniques are often used to this end, but most evaluations of them use English corpora. In this work, we apply several existing data augmentation methods to text classification problems on Brazilian Portuguese corpora and analyze their performance. Our analysis shows putative improvements from some of these techniques; however, it also suggests that language bias and the scarcity of non-English text data warrant further exploration.
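
As a rough illustration of what token-level text augmentation can look like, the sketch below perturbs a Portuguese sentence with random swaps and random deletions. The abstract does not name the methods that were evaluated, so this is an assumption chosen for illustration only, not the authors' pipeline. The function names (`random_swap`, `random_deletion`, `augment`), the parameter defaults, and the example sentence are all hypothetical.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_copies=2, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_copies perturbed variants of a whitespace-tokenized sentence."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        perturbed = random_swap(tokens, n_swaps, rng)
        perturbed = random_deletion(perturbed, p_delete, rng)
        variants.append(" ".join(perturbed))
    return variants


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the paper's corpora.
    for variant in augment("o atendimento foi rápido e muito atencioso"):
        print(variant)
```

Each variant would keep the label of its source sentence and be added to the training set. Whitespace tokenization is used here only to keep the sketch self-contained; a real experiment would likely use a tokenizer suited to Portuguese.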

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2304.02785 [cs.CL]
	(or arXiv:2304.02785v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.02785

Submission history

From: Marcellus Amadeus
[v1] Wed, 5 Apr 2023 23:13:37 UTC (875 KB)
