Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting

Wu, Jiarui; Liu, Zhuo; He, Hangfeng

arXiv:2502.08317 (cs)

[Submitted on 12 Feb 2025]

Abstract: Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs), leading them to generate incorrect predictions about object positions and spatial configurations within an image. To address this issue, we propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations. Specifically, we introduce two types of constraints: (1) a bidirectional constraint, which ensures consistency in pairwise object relations, and (2) a transitivity constraint, which enforces relational dependence across multiple objects. By incorporating these constraints, LVLMs can produce more spatially coherent and consistent outputs. We evaluate our method on three widely used spatial relation datasets, demonstrating performance improvements over existing approaches. Additionally, a systematic analysis of various bidirectional relation analysis choices and transitivity reference selections highlights the broader potential of our method for incorporating constraints to mitigate spatial relation hallucinations.
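
To make the two constraints concrete, the sketch below checks a set of predicted relations for bidirectional and transitive consistency. It is only an illustration of the logical properties named in the abstract, not the authors' prompting framework: the relation vocabulary, the `INVERSE` table, and both function names are assumptions introduced here.

```python
# Illustrative sketch only. The relation vocabulary and helper names are
# assumptions for this example; they are not taken from the paper.
INVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}


def bidirectional_consistent(rel_ab: str, rel_ba: str) -> bool:
    """Return True if the predicted relation B->A is the inverse of A->B."""
    return rel_ab in INVERSE and INVERSE[rel_ab] == rel_ba


def transitive_consistent(rel_ab: str, rel_bc: str, rel_ac: str) -> bool:
    """Check A->C against what A->B and B->C imply.

    If A->B and B->C are the same known relation, A->C must match it.
    Otherwise transitivity implies nothing here, so the triple passes.
    """
    if rel_ab == rel_bc and rel_ab in INVERSE:
        return rel_ac == rel_ab
    return True


if __name__ == "__main__":
    # "cup left of plate" and "plate right of cup" agree with each other.
    print(bidirectional_consistent("left of", "right of"))
    # cup left of plate, plate left of fork, but cup right of fork: a conflict.
    print(transitive_consistent("left of", "left of", "right of"))
```

In a prompting setting, checks like these could flag answers worth re-querying; how the paper actually injects the constraints into prompts is not specified in the abstract.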

Comments:	19 pages, accepted to NAACL Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.08317 [cs.CL]
	(or arXiv:2502.08317v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.08317

Submission history

From: Jiarui Wu
[v1] Wed, 12 Feb 2025 11:32:19 UTC (6,285 KB)
