Dynamic Multimodal Sentiment Analysis: Leveraging Cross-Modal Attention for Enabled Classification

Lee, Hui; Suniljit, Singh; Ong, Yong Siang

arXiv:2501.08085 (cs)

[Submitted on 14 Jan 2025]

Abstract: This paper explores the development of a multimodal sentiment analysis model that integrates text, audio, and visual data to enhance sentiment classification. The goal is to improve emotion detection by capturing the complex interactions between these modalities, enabling more accurate and nuanced sentiment interpretation. The study evaluates three feature fusion strategies (late-stage fusion, early-stage fusion, and multi-headed attention) within a transformer-based architecture. Experiments were conducted on the CMU-MOSEI dataset, which includes synchronized text, audio, and visual inputs labeled with sentiment scores. Results show that early-stage fusion significantly outperforms late-stage fusion, achieving an accuracy of 71.87%, while the multi-headed attention approach offers only a marginal further improvement, reaching 72.39%. The findings suggest that integrating modalities early in the process enhances sentiment classification, while attention mechanisms may have limited impact within the current framework. Future work will focus on refining feature fusion techniques, incorporating temporal data, and exploring dynamic feature weighting to further improve model performance.
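The three fusion strategies named in the abstract can be contrasted in a minimal sketch. This is not the authors' implementation: the helper names (`early_fusion`, `late_fusion`, `attention_fusion`), the shared feature dimension, the three-class output, and the random untrained weights are all assumptions made for illustration, and the attention variant omits learned query/key/value projections for brevity.

```python
import math
import random

def linear(x, w, b):
    """Affine map; w holds one weight row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_layer(rng, n_in, n_out):
    """Hypothetical untrained classifier head: small random weights, zero bias."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def early_fusion(feats, layer):
    """Concatenate all modality features, then classify once."""
    joint = [v for f in feats for v in f]
    return softmax(linear(joint, *layer))

def late_fusion(feats, layers):
    """Classify each modality separately, then average class probabilities."""
    probs = [softmax(linear(f, *layer)) for f, layer in zip(feats, layers)]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]

def attention_fusion(feats, n_heads, layer):
    """Multi-head scaled dot-product attention across modalities
    (each modality is one token), then mean-pool and classify."""
    d = len(feats[0])
    hd = d // n_heads  # per-head width; assumes d is divisible by n_heads
    out = [[] for _ in feats]
    for h in range(n_heads):
        sl = [f[h * hd:(h + 1) * hd] for f in feats]
        for i, q in enumerate(sl):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(hd) for k in sl]
            w = softmax(scores)
            out[i].extend(sum(wj * v[t] for wj, v in zip(w, sl)) for t in range(hd))
    pooled = [sum(o[t] for o in out) / len(out) for t in range(d)]
    return softmax(linear(pooled, *layer))

rng = random.Random(0)
d, n_classes = 8, 3  # illustrative sizes, not taken from the paper
feats = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]  # text, audio, visual

p_early = early_fusion(feats, rand_layer(rng, 3 * d, n_classes))
p_late = late_fusion(feats, [rand_layer(rng, d, n_classes) for _ in range(3)])
p_attn = attention_fusion(feats, 2, rand_layer(rng, d, n_classes))
```

Each call returns a probability vector over the hypothetical sentiment classes. The structural difference is where the modalities meet: before the classifier (early), after it (late), or through attention weights computed between modality vectors.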

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2501.08085 [cs.CL]
	(or arXiv:2501.08085v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.08085

Submission history

From: Hui Lee
[v1] Tue, 14 Jan 2025 12:54:19 UTC (205 KB)
