ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Sun, Zhihao; Jiang, Haoran; Chen, Haoran; Cao, Yixin; Qiu, Xipeng; Wu, Zuxuan; Jiang, Yu-Gang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.19466 (cs)

[Submitted on 29 Nov 2024]

Title: ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Authors: Zhihao Sun, Haoran Jiang, Haoran Chen, Yixin Cao, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang

Abstract: Multimodal large language models (M-LLMs) have unlocked new possibilities for a wide range of multimodal tasks. However, their potential for image manipulation detection (IMD) remains unexplored. When applied directly to IMD, M-LLMs often produce reasoning text that suffers from hallucination and overthinking. To address this, we propose ForgerySleuth, which leverages M-LLMs to perform comprehensive clue fusion and generates segmentation outputs indicating the specific regions that have been tampered with. We also construct the ForgeryAnalysis dataset using a Chain-of-Clues prompt; the dataset includes analysis and reasoning text that upgrades the IMD task. In addition, we introduce a data engine to build a larger-scale dataset for the pre-training phase. Extensive experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in generalization, robustness, and explainability.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2411.19466 [cs.CV]
	(or arXiv:2411.19466v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.19466

Submission history

From: Zhihao Sun
[v1] Fri, 29 Nov 2024 04:35:18 UTC (5,601 KB)
