GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models

Xing, Shangyu; Xiang, Changhao; Han, Yuteng; Yue, Yifan; Wu, Zhen; Liu, Xinyu; Wu, Zhangtai; Zhao, Fei; Dai, Xinyu

arXiv:2412.21036 (cs)

[Submitted on 30 Dec 2024]

Abstract: Multimodal large language models (MLLMs) have achieved significant advancements in integrating visual and linguistic understanding. While existing benchmarks evaluate these models in context-rich, real-life scenarios, they often overlook fundamental perceptual skills essential for environments deviating from everyday realism. In particular, geometric perception, the ability to interpret spatial relationships and abstract visual patterns, remains underexplored. To address this limitation, we introduce GePBench, a novel benchmark designed to assess the geometric perception capabilities of MLLMs. Results from extensive evaluations reveal that current state-of-the-art MLLMs exhibit significant deficiencies in such tasks. Additionally, we demonstrate that models trained with data sourced from GePBench show notable improvements on a wide range of downstream tasks, underscoring the importance of geometric perception as a foundation for advanced multimodal applications. Our code and datasets will be publicly available.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.21036 [cs.CL]
	(or arXiv:2412.21036v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.21036

Submission history

From: Shangyu Xing
[v1] Mon, 30 Dec 2024 16:01:43 UTC (507 KB)
