Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

Park, Junsung; Lee, Jungbeom; Song, Jongyoon; Yu, Sangwon; Jung, Dahuin; Yoon, Sungroh

arXiv:2501.10913 (cs)

[Submitted on 19 Jan 2025]

Abstract: While CLIP has significantly advanced multimodal understanding by bridging vision and language, its inability to grasp negation, such as failing to distinguish "parking" from "no parking", poses substantial challenges. By analyzing the data used to pre-train the public CLIP model, we posit that this limitation stems from a lack of negation-inclusive data. To address this, we introduce data generation pipelines that employ a large language model (LLM) and a multimodal LLM to produce negation-inclusive captions. By fine-tuning CLIP on data generated by our pipelines, we develop NegationCLIP, which enhances negation awareness while preserving generality. Moreover, to enable a comprehensive evaluation of negation understanding, we propose NegRefCOCOg, a benchmark tailored to test the ability of vision-language models (VLMs) to interpret negation across diverse expressions and positions within a sentence. Experiments on various CLIP architectures validate the effectiveness of our data generation pipelines in enhancing CLIP's ability to perceive negation accurately. Additionally, NegationCLIP's enhanced negation awareness has practical applications across various multimodal tasks, as demonstrated by performance gains in text-to-image generation and referring image segmentation.
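The abstract attributes the limitation to a lack of negation-inclusive captions in the pre-training data. As a rough illustration of how such a lack might be measured, the sketch below counts the fraction of captions that contain a negation cue. The cue list, the function names `has_negation` and `negation_ratio`, and the sample captions are all assumptions made for this example; the abstract does not specify the criteria the authors used.

```python
import re

# Hypothetical list of negation cues (an assumption for illustration only;
# the abstract does not state which cues the authors looked for).
NEGATION_CUES = ("no", "not", "without", "never", "none", "nobody", "nothing")

# Match any cue as a whole word, or a contracted "n't" at the end of a word.
_CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(NEGATION_CUES) + r")\b|n't\b",
    flags=re.IGNORECASE,
)


def has_negation(caption: str) -> bool:
    """Return True if the caption contains at least one negation cue."""
    return _CUE_PATTERN.search(caption) is not None


def negation_ratio(captions) -> float:
    """Return the fraction of captions that contain a negation cue."""
    captions = list(captions)
    if not captions:
        return 0.0
    return sum(has_negation(c) for c in captions) / len(captions)


if __name__ == "__main__":
    # Made-up sample captions, not taken from any real dataset.
    sample = [
        "a street with a parking sign",
        "a street with no parking sign",
        "a dog without a leash",
        "a notebook on a desk",
    ]
    print(f"negation ratio: {negation_ratio(sample):.2f}")
```

A keyword match like this is only a coarse proxy: it misses implicit negation and can flag phrases where the cue word does not negate anything, so any ratio it reports should be read as an estimate.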

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2501.10913 [cs.CV]
	(or arXiv:2501.10913v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.10913

Submission history

From: Junsung Park
[v1] Sun, 19 Jan 2025 01:17:05 UTC (48,119 KB)
