Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading

Kortemeyer, Gerd

arXiv:2309.09338 (cs)

[Submitted on 17 Sep 2023]

Abstract: Automated Short Answer Grading (ASAG) has been an active area of machine-learning research for over a decade. It promises to let educators grade and give feedback on free-form responses in large-enrollment courses despite the limited availability of human graders. Over the years, carefully trained models have achieved increasingly high levels of performance. More recently, pre-trained Large Language Models (LLMs) have emerged as a commodity, and an intriguing question is how a general-purpose tool without additional training compares to specialized models. We studied the performance of GPT-4 on the standard benchmark 2-way and 3-way datasets SciEntsBank and Beetle. In addition to the standard task of grading the alignment of the student answer with a reference answer, we also investigated withholding the reference answer. We found that, overall, the performance of the pre-trained general-purpose GPT-4 LLM is comparable to that of hand-engineered models, but worse than that of pre-trained LLMs that had specialized training.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.09338 [cs.CL]
	(or arXiv:2309.09338v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.09338

Submission history

From: Gerd Kortemeyer
[v1] Sun, 17 Sep 2023 18:04:34 UTC (11 KB)
