Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation

Lee, Jongga; Yim, Jaeseung; Park, Seohee; Lim, Changwon

Computer Science > Computation and Language

arXiv:2403.00825 (cs)

[Submitted on 27 Feb 2024]


Abstract: Text classification is the task of assigning a document to a predefined class. However, acquiring enough labeled documents, or labeling them, is expensive. In this paper, we study the effects of regularization methods on various classification models when only a small amount of labeled data is available. We compare a simple word embedding-based model, which is simple but effective, with complex models (CNN and BiLSTM). In supervised learning, adversarial training can further regularize the model. When an unlabeled dataset is available, we can regularize the model using semi-supervised learning methods such as the Pi model and virtual adversarial training. We evaluate the regularization effects on four text classification datasets (AG news, DBpedia, Yahoo! Answers, Yelp Polarity), using only 0.1% to 0.5% of the original labeled training documents. The simple model performs relatively well in fully supervised learning. With the help of adversarial training and semi-supervised learning, however, both simple and complex models can be regularized, and the complex models then show better results. Although the simple model is robust to overfitting, a complex model with well-designed prior beliefs can also be robust to overfitting.
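The abstract names the regularizers without showing how they act on a model. The following is a minimal pure-Python sketch, not the authors' code, of three of the ingredients under stated assumptions: the "simple word embedding-based model" is taken to be average pooling of word vectors followed by a linear softmax layer; adversarial training is shown as perturbing the embedding sequence by r = ε g/‖g‖₂, where g is the gradient of the loss with respect to the embeddings; and the Pi model is approximated by a consistency penalty between two predictions under Gaussian noise on the embeddings (the noise model is an assumption). All function names, the toy weights, and the ε and σ values are hypothetical.

```python
import math
import random


def swem_aver(embeddings):
    """Average-pool a document's word vectors (a list of equal-length lists)."""
    n = len(embeddings)
    return [sum(vec[d] for vec in embeddings) / n for d in range(len(embeddings[0]))]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(embeddings, weights, bias):
    """Class probabilities from a linear softmax layer on the averaged embedding.
    weights[k][d] connects embedding dimension d to class k (hypothetical layout)."""
    h = swem_aver(embeddings)
    logits = [sum(w * x for w, x in zip(w_k, h)) + b_k for w_k, b_k in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(probs, label):
    return -math.log(probs[label])


def adversarial_loss(embeddings, label, weights, bias, epsilon=1.0):
    """Loss on embeddings perturbed by r = epsilon * g / ||g||_2, where g is the
    gradient of the cross-entropy with respect to the whole embedding sequence."""
    probs = predict(embeddings, weights, bias)
    n, dim = len(embeddings), len(embeddings[0])
    # Softmax + cross-entropy: dL/dlogits = probs - one_hot(label).
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Average pooling gives every word vector the same gradient (W^T delta) / n.
    g_word = [sum(delta[k] * weights[k][d] for k in range(len(weights))) / n
              for d in range(dim)]
    # L2 norm over the full sequence, i.e. over n identical gradient rows.
    norm = math.sqrt(n * sum(x * x for x in g_word))
    if norm == 0.0:
        return cross_entropy(probs, label)  # no ascent direction to follow
    r = [epsilon * x / norm for x in g_word]
    perturbed = [[e + r_d for e, r_d in zip(vec, r)] for vec in embeddings]
    return cross_entropy(predict(perturbed, weights, bias), label)


def pi_model_consistency(embeddings, weights, bias, sigma=0.1, rng=random):
    """Pi-model-style penalty for an unlabeled document: mean squared difference
    between two predictions, each made under independent Gaussian embedding noise."""
    def noisy():
        return [[e + rng.gauss(0.0, sigma) for e in vec] for vec in embeddings]
    p1 = predict(noisy(), weights, bias)
    p2 = predict(noisy(), weights, bias)
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)


# Toy usage: three 2-d word vectors and a 2-class classifier (illustrative values).
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [-0.5, 0.5]]
b = [0.0, 0.0]
clean = cross_entropy(predict(doc, W, b), 0)
adv = adversarial_loss(doc, 0, W, b, epsilon=0.5)  # larger than `clean` for this input
unlabeled_penalty = pi_model_consistency(doc, W, b, rng=random.Random(0))
```

During training, terms like these would typically be summed (for example the clean loss plus the adversarial loss on labeled documents, plus a weighted consistency penalty on unlabeled ones) and minimized with respect to the classifier parameters. The closed-form embedding gradient works only because this toy model is linear after averaging; a CNN or BiLSTM would need automatic differentiation. Virtual adversarial training, also named in the abstract, is not sketched here.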

Comments:	13 pages, 2 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.00825 [cs.CL]
	(or arXiv:2403.00825v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.00825

Submission history

From: Changwon Lim
[v1] Tue, 27 Feb 2024 07:26:16 UTC (58 KB)
