CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Lin, Hongzhan; Chen, Zixin; Luo, Ziyang; Cheng, Mingfei; Ma, Jing; Chen, Guang

Computer Science > Computation and Language

arXiv:2405.00390v1 (cs)

[Submitted on 1 May 2024 (this version), latest version 20 May 2024 (v2)]

Title:CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Authors:Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen

View PDF

Abstract:Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherently in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.

Comments:	25 pages, 7 figures, and 18 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2405.00390 [cs.CL]
	(or arXiv:2405.00390v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.00390

Submission history

From: Lin Hongzhan [view email]
[v1] Wed, 1 May 2024 08:44:44 UTC (17,709 KB)
[v2] Mon, 20 May 2024 07:25:27 UTC (17,708 KB)

Computer Science > Computation and Language

Title:CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators