Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos

Luo, Jiamin; Wang, Jingjing; Ma, Junxiao; Jin, Yujie; Li, Shoushan; Zhou, Guodong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2503.00049 (cs)

[Submitted on 26 Feb 2025]

Title:Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos

Authors:Jiamin Luo, Jingjing Wang, Junxiao Ma, Yujie Jin, Shoushan Li, Guodong Zhou

View PDF HTML (experimental)

Abstract:Prior studies on Visual Sentiment Understanding (VSU) primarily rely on the explicit scene information (e.g., facial expression) to judge visual sentiments, which largely ignore implicit scene information (e.g., human action, objection relation and visual background), while such information is critical for precisely discovering visual sentiments. Motivated by this, this paper proposes a new Omni-scene driven visual Sentiment Identifying, Locating and Attributing in videos (Omni-SILA) task, aiming to interactively and precisely identify, locate and attribute visual sentiments through both explicit and implicit scene information. Furthermore, this paper believes that this Omni-SILA task faces two key challenges: modeling scene and highlighting implicit scene beyond explicit. To this end, this paper proposes an Implicit-enhanced Causal MoE (ICM) approach for addressing the Omni-SILA task. Specifically, a Scene-Balanced MoE (SBM) and an Implicit-Enhanced Causal (IEC) blocks are tailored to model scene information and highlight the implicit scene information beyond explicit, respectively. Extensive experimental results on our constructed explicit and implicit Omni-SILA datasets demonstrate the great advantage of the proposed ICM approach over advanced Video-LLMs.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.00049 [cs.CV]
	(or arXiv:2503.00049v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.00049

Submission history

From: Jiamin Luo [view email]
[v1] Wed, 26 Feb 2025 12:05:07 UTC (680 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators