TextSleuth: Towards Explainable Tampered Text Detection

Qu, Chenfan; Liu, Jian; Chen, Haoxing; Yu, Baihan; Liu, Jingjing; Wang, Weiqiang; Jin, Lianwen

arXiv:2412.14816 (cs)

[Submitted on 19 Dec 2024 (v1), last revised 15 Jan 2025 (this version, v3)]

Abstract: Tampered text detection has recently attracted increasing attention because of its essential role in information security. Although existing methods can detect tampered text regions, the basis for their predictions remains unclear, which makes the predictions unreliable. To address this problem, we propose to explain the basis of tampered text detection in natural language using large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD, which contains both pixel-level annotations of tampered text regions and natural language annotations describing the anomalies of the tampered text. Multiple methods are employed to improve the quality of the proposed data. For example, elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o, and a fused mask prompt is proposed to reduce confusion when querying GPT4o for these descriptions. To automatically filter out low-quality annotations, we also prompt GPT4o to recognize the tampered text before describing the anomaly, and discard responses with low OCR accuracy. To further improve explainable tampered text detection, we propose a simple yet effective model called TextSleuth, which achieves improved fine-grained perception and cross-domain generalization by focusing on the suspected region, using a two-stage analysis paradigm and an auxiliary grounding prompt. Extensive experiments on both the ETTD dataset and a public dataset verify the effectiveness of the proposed methods. In-depth analysis is also provided to inspire further research. Our dataset and code will be open-sourced.
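
The annotation-filtering step mentioned in the abstract (recognize the tampered text first, then discard responses with low OCR accuracy) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the `Annotation` fields, the character-level similarity from Python's `difflib`, and the `min_accuracy` threshold of 0.8 are all hypothetical choices, since the abstract does not specify the metric or the cutoff.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Annotation:
    # All field names are hypothetical; they are not taken from the paper.
    true_text: str        # tampered text known from the tampering step
    recognized_text: str  # text the model reported reading before describing the anomaly
    description: str      # generated natural language anomaly description


def ocr_accuracy(true_text: str, recognized_text: str) -> float:
    """Character-level similarity in [0, 1].

    A stand-in for whatever OCR accuracy measure the authors use.
    """
    a = true_text.strip().lower()
    b = recognized_text.strip().lower()
    if not a and not b:
        return 1.0  # two empty strings are treated as a perfect match
    return SequenceMatcher(None, a, b).ratio()


def filter_annotations(annotations, min_accuracy=0.8):
    """Keep annotations whose recognized text is close enough to the true text.

    min_accuracy is an assumed threshold chosen for illustration.
    """
    return [
        ann
        for ann in annotations
        if ocr_accuracy(ann.true_text, ann.recognized_text) >= min_accuracy
    ]


if __name__ == "__main__":
    samples = [
        Annotation("Total: 500", "Total: 500", "Digits are sharper than nearby text."),
        Annotation("Total: 500", "Invoice", "Text looks slightly blurred."),
    ]
    kept = filter_annotations(samples)
    print(f"kept {len(kept)} of {len(samples)} annotations")
```

The idea behind such a filter is that a response which misreads the tampered text was probably not looking at the right region, so its anomaly description is less likely to be trustworthy.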

Comments:	The first work for explainable tampered text detection
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.14816 [cs.CV]
	(or arXiv:2412.14816v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.14816

Submission history

From: Chenfan Qu
[v1] Thu, 19 Dec 2024 13:10:03 UTC (828 KB)
[v2] Sat, 21 Dec 2024 08:53:10 UTC (828 KB)
[v3] Wed, 15 Jan 2025 16:54:36 UTC (2,504 KB)
