The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models

Yoshida, Lui

doi:10.1007/978-3-031-64315-6_5

arXiv:2411.18924 (cs)

[Submitted on 28 Nov 2024]

Abstract: This study investigates the impact of example selection on the performance of automated essay scoring (AES) using few-shot prompting with GPT models. We evaluate the effects of the choice and order of examples in few-shot prompting on several versions of GPT-3.5 and GPT-4 models. Our experiments involve 119 prompts with different examples, and we calculate the quadratic weighted kappa (QWK) to measure the agreement between GPT and human rater scores. Regression analysis is used to quantitatively assess biases introduced by example selection. The results show that the impact of example selection on QWK varies across models, with GPT-3.5 being more influenced by examples than GPT-4. We also find evidence of majority label bias, which is a tendency to favor the majority label among the examples, and recency bias, which is a tendency to favor the label of the most recent example, in GPT-generated essay scores and QWK, with these biases being more pronounced in GPT-3.5. Notably, careful example selection enables GPT-3.5 models to outperform some GPT-4 models. However, among the GPT models, the June 2023 version of GPT-4, which is not the latest model, exhibits the highest stability and performance. Our findings provide insights into the importance of example selection in few-shot prompting for AES, especially in GPT-3.5 models, and highlight the need for individual performance evaluations of each model, even for minor versions.
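
For readers unfamiliar with the agreement metric named in the abstract, the following is a minimal sketch of how quadratic weighted kappa can be computed from two lists of integer scores. It uses the standard textbook definition of QWK and is not taken from the paper; the function name, its parameters, and the toy scores are all illustrative assumptions, and the score range must be supplied by the caller.

```python
def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Standard QWK between two integer score lists (illustrative helper,
    not the paper's code). Scores must lie in [min_score, max_score]."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of distinct score categories
    if n < 2:
        raise ValueError("need at least two score categories")

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            expected = hist_human[i] * hist_model[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave one identical constant score: kappa is undefined.
        raise ValueError("QWK is undefined when both raters use a single score")
    return 1.0 - numerator / denominator


# Toy data (made up for illustration): identical scores versus reversed scores.
perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_scores = quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. The quadratic weight penalizes a two-point miss four times as heavily as a one-point miss.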

Comments:	Accepted in AIED2024. This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in Communications in Computer and Information Science, vol 2150, and is available online at this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2411.18924 [cs.CL]
	(or arXiv:2411.18924v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.18924
Related DOI:	https://doi.org/10.1007/978-3-031-64315-6_5

Submission history

From: Lui Yoshida
[v1] Thu, 28 Nov 2024 05:24:51 UTC (358 KB)
