A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing

Wang, Yu

arXiv:2403.02504v1 (cs)

[Submitted on 4 Mar 2024 (this version), latest version 2 Aug 2024 (v3)]

Authors: Yu Wang

Abstract: The pretrain-finetune paradigm represents a transformative approach in natural language processing (NLP). The paradigm is distinguished by its use of large pretrained language models, which can be finetuned efficiently for downstream tasks even when training data are limited. This efficiency is especially beneficial for research in the social sciences, where the number of annotated samples is often small. Our tutorial offers a comprehensive introduction to the pretrain-finetune paradigm. We first explain the fundamental concepts of pretraining and finetuning and then present practical exercises based on real-world applications. We demonstrate the paradigm on several tasks, including multi-class classification and regression. By emphasizing the paradigm's efficacy and ease of use, the tutorial aims to encourage its broader adoption; to this end, we provide open access to all our code and datasets. The tutorial is particularly valuable for quantitative researchers in psychology, offering them an insightful guide to this innovative approach.
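As a rough illustration of how one pretrained text representation can serve both task types named in the abstract, the sketch below attaches a linear output head to a fixed feature vector and scores it with cross-entropy for multi-class classification or squared error for regression. This is a stdlib-only simplification, not the tutorial's code: the feature values, weights, and the function names `linear_head`, `softmax`, `cross_entropy`, and `squared_error` are hypothetical, and in full finetuning the gradients of these losses would also update the pretrained model's own weights, which this sketch omits.

```python
import math
from typing import List


def linear_head(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    # A linear output layer: one output per row of `weights`.
    # Use num_classes rows for classification and a single row for regression.
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    # Multi-class classification loss: negative log-probability of the true class.
    return -math.log(softmax(logits)[label])


def squared_error(prediction: float, target: float) -> float:
    # Regression loss for a single scalar output.
    return (prediction - target) ** 2


# Hypothetical 3-dimensional feature vector standing in for a pretrained
# model's text representation (real models typically use hundreds of dimensions).
features = [0.5, -1.0, 2.0]

# Hypothetical classification head with 3 classes (identity weights for illustration).
logits = linear_head(features, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
print("class probabilities:", softmax(logits))
print("cross-entropy for label 2:", cross_entropy(logits, 2))

# Hypothetical regression head with a single output.
prediction = linear_head(features, [[1.0, 1.0, 1.0]], [0.0])[0]
print("regression prediction:", prediction)  # 1.5
print("squared error vs. 2.0:", squared_error(prediction, 2.0))  # 0.25
```

The only structural difference between the two tasks here is the number of head outputs and the loss applied to them; the shared representation is what pretraining supplies.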

Comments:	16 pages, 6 figures, 2 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.02504 [cs.CL]
	(or arXiv:2403.02504v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.02504

Submission history

From: Yu Wang
[v1] Mon, 4 Mar 2024 21:51:11 UTC (1,091 KB)
[v2] Fri, 19 Jul 2024 07:47:18 UTC (1,341 KB)
[v3] Fri, 2 Aug 2024 04:44:29 UTC (1,343 KB)
