Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Li, Jingxuan; Yang, Yuning; Yang, Shengqi; Zhang, Linfan; Wu, Ying Nian

Computer Science > Computation and Language

arXiv:2411.11479 (cs)

[Submitted on 18 Nov 2024 (v1), last revised 19 Dec 2024 (this version, v2)]

Title:Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Authors:Jingxuan Li, Yuning Yang, Shengqi Yang, Linfan Zhang, Ying Nian Wu

View PDF HTML (experimental)

Abstract:The recent progress in Vision-Language Models (VLMs) has broadened the scope of multimodal applications. However, evaluations often remain limited to functional tasks, neglecting abstract dimensions such as personality traits and human values. To address this gap, we introduce Value-Spectrum, a novel Visual Question Answering (VQA) benchmark aimed at assessing VLMs based on Schwartz's value dimensions that capture core values guiding people's preferences and actions. We designed a VLM agent pipeline to simulate video browsing and constructed a vector database comprising over 50,000 short videos from TikTok, YouTube Shorts, and Instagram Reels. These videos span multiple months and cover diverse topics, including family, health, hobbies, society, technology, etc. Benchmarking on Value-Spectrum highlights notable variations in how VLMs handle value-oriented content. Beyond identifying VLMs' intrinsic preferences, we also explored the ability of VLM agents to adopt specific personas when explicitly prompted, revealing insights into the adaptability of the model in role-playing scenarios. These findings highlight the potential of Value-Spectrum as a comprehensive evaluation set for tracking VLM alignments in value-based tasks and abilities to simulate diverse personas.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2411.11479 [cs.CL]
	(or arXiv:2411.11479v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.11479

Submission history

From: Jingxuan Li [view email]
[v1] Mon, 18 Nov 2024 11:31:10 UTC (11,228 KB)
[v2] Thu, 19 Dec 2024 21:14:07 UTC (44,094 KB)

Computer Science > Computation and Language

Title:Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators