Beyond Vision: How Large Language Models Interpret Facial Expressions from Valence-Arousal Values

Mehra, Vaibhav; Laban, Guy; Gunes, Hatice

Computer Science > Computer Vision and Pattern Recognition

arXiv:2502.06875 (cs)

[Submitted on 8 Feb 2025]

Abstract: Large Language Models (LLMs) primarily operate through text-based inputs and outputs, yet human emotion is communicated through both verbal and non-verbal cues, including facial expressions. Although Vision-Language Models (VLMs) can analyze facial expressions from images, they are resource-intensive and may depend more on linguistic priors than on visual understanding. To address this, this study investigates whether LLMs can infer affective meaning from dimensions of facial expressions, namely Valence and Arousal (VA) values, which are structured numerical representations, rather than from raw visual input. VA values were extracted using FaceChannel from images of facial expressions and provided to LLMs in two tasks: (1) categorizing facial expressions into basic emotions (on the IIMI dataset) and complex emotions (on the Emotic dataset), and (2) generating semantic descriptions of facial expressions (on the Emotic dataset). Results from the categorization task indicate that LLMs struggle to classify VA values into discrete emotion categories, particularly for emotions beyond basic polarities (e.g., happiness, sadness). However, in the semantic description task, LLMs produced textual descriptions that align closely with human-generated interpretations, demonstrating a stronger capacity for free-text affective inference of facial expressions.
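The categorization setup described in the abstract, where numeric VA values are handed to an LLM as text, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `VAReading` type, the `[-1, 1]` value range, the label list, the quadrant wording, and the prompt text are hypothetical and are not taken from the paper, and no actual LLM or FaceChannel call is made.

```python
from dataclasses import dataclass

# Hypothetical label set for illustration only; the abstract does not list
# the exact categories used in the study.
BASIC_EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust", "neutral"]


@dataclass(frozen=True)
class VAReading:
    """One valence-arousal pair, assumed here to lie in [-1, 1]."""
    valence: float
    arousal: float

    def __post_init__(self):
        # Reject values outside the assumed range.
        for name, value in (("valence", self.valence), ("arousal", self.arousal)):
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")


def quadrant(reading: VAReading) -> str:
    """Coarse polarity description of a VA pair (a simple quadrant split)."""
    v = "positive" if reading.valence >= 0 else "negative"
    a = "high" if reading.arousal >= 0 else "low"
    return f"{v} valence, {a} arousal"


def build_prompt(reading: VAReading, labels=BASIC_EMOTIONS) -> str:
    """Render a VA pair as a text prompt asking for one emotion label."""
    return (
        "A facial expression was measured as "
        f"valence={reading.valence:+.2f}, arousal={reading.arousal:+.2f} "
        "(both on a scale from -1 to 1). "
        f"Choose the single best-matching emotion from: {', '.join(labels)}."
    )


if __name__ == "__main__":
    reading = VAReading(valence=0.7, arousal=0.5)
    print(quadrant(reading))
    print(build_prompt(reading))
```

The quadrant helper only captures coarse polarity, which is the level at which the abstract reports that classification is more tractable; finer-grained categories would depend on how the model interprets the prompt.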

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2502.06875 [cs.CV]
	(or arXiv:2502.06875v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.06875

Submission history

From: Guy Laban
[v1] Sat, 8 Feb 2025 09:54:03 UTC (7,315 KB)
