Can Large Language Models Capture Video Game Engagement?

Melhart, David; Barthet, Matthew; Yannakakis, Georgios N.

arXiv:2502.04379 (cs)

[Submitted on 5 Feb 2025]

Abstract: Can out-of-the-box pretrained Large Language Models (LLMs) successfully detect human affect when observing a video? To address this question, we comprehensively evaluate, for the first time, the capacity of popular LLMs to annotate and successfully predict continuous affect annotations of videos when prompted with a sequence of text and video frames in a multimodal fashion. In particular, we test LLMs' ability to correctly label changes in in-game engagement in 80 minutes of annotated videogame footage from 20 first-person shooter games of the GameVibe corpus. We run over 2,400 experiments to investigate the impact of LLM architecture, model size, input modality, prompting strategy, and ground truth processing method on engagement prediction. Our findings suggest that while LLMs rightfully claim human-like performance across multiple domains, they generally fall behind in capturing continuous experience annotations provided by humans. We examine some of the underlying causes for the relatively poor overall performance, highlight the cases where LLMs exceed expectations, and draw a roadmap for the further exploration of automated emotion labelling via LLMs.

Comments:	This work has been submitted to the IEEE for possible publication
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2502.04379 [cs.CV]
	(or arXiv:2502.04379v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.04379

Submission history

From: David Melhart
[v1] Wed, 5 Feb 2025 17:14:47 UTC (16,198 KB)
