AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

Patnaik, Sohan; Jain, Rishabh; Krishnamurthy, Balaji; Sarkar, Mausoom

Computer Science > Computer Vision and Pattern Recognition

arXiv:2503.00591 (cs)

[Submitted on 1 Mar 2025]

Title:AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

Authors:Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy, Mausoom Sarkar

View PDF HTML (experimental)

Abstract:Visual layouts are essential in graphic design fields such as advertising, posters, and web interfaces. The application of generative models for content-aware layout generation has recently gained traction. However, these models fail to understand the contextual aesthetic requirements of layout design and do not align with human-like preferences, primarily treating it as a prediction task without considering the final rendered output. To overcome these problems, we offer Aesthetic-Aware Preference Alignment(AAPA), a novel technique to train a Multi-modal Large Language Model (MLLM) for layout prediction that uses MLLM's aesthetic preferences for Direct Preference Optimization over graphic layouts. We propose a data filtering protocol utilizing our layout-quality heuristics for AAPA to ensure training happens on high-quality layouts. Additionally, we introduce a novel evaluation metric that uses another MLLM to compute the win rate of the generated layout against the ground-truth layout based on aesthetics criteria. We also demonstrate the applicability of AAPA for MLLMs of varying scales (1B to 8B parameters) and LLM families (Qwen, Phi, InternLM). By conducting thorough qualitative and quantitative analyses, we verify the efficacy of our approach on two challenging benchmarks - Crello and Webui, showcasing 17%, and 16 improvement over current State-of-The-Art methods, thereby highlighting the potential of MLLMs in aesthetic-aware layout generation.

Comments:	Accepted for publication in CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.00591 [cs.CV]
	(or arXiv:2503.00591v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.00591

Submission history

From: Rishabh Jain [view email]
[v1] Sat, 1 Mar 2025 19:05:02 UTC (34,395 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators