DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models

Lin, Jieru; Huang, Danqing; Zhao, Tiejun; Zhan, Dechen; Lin, Chin-Yew

arXiv:2404.14801 (cs)

[Submitted on 23 Apr 2024]

Abstract: A well-executed graphic design typically achieves harmony at two levels, from the fine-grained design elements (color, font and layout) to the overall design. This complexity makes graphic design challenging to comprehend, because it requires the ability both to recognize the design elements and to understand the design as a whole. With the rapid development of Multimodal Large Language Models (MLLMs), we establish DesignProbe, a benchmark to investigate the capability of MLLMs in design. Our benchmark includes eight tasks in total, across both the fine-grained element level and the overall design level. At the design element level, we consider both attribute recognition and semantic understanding tasks. At the overall design level, we include style and metaphor. We test nine MLLMs and use GPT-4 as the evaluator. Further experiments indicate that refining prompts can enhance the performance of MLLMs. We first have different LLMs rewrite the prompts and find that performance increases for models whose prompts were self-refined by their own LLMs. We then add extra task knowledge in two different ways (text descriptions and image examples), finding that adding images boosts performance much more than adding text.

Comments:	work in progress
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.14801 [cs.CV]
	(or arXiv:2404.14801v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.14801

Submission history

From: Jieru Lin
[v1] Tue, 23 Apr 2024 07:31:19 UTC (2,731 KB)
