Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models

Mu, Yongyu; Li, Hengyu; Wang, Junxin; Zhou, Xiaoxuan; Wang, Chenglong; Luo, Yingfeng; He, Qiaozhi; Xiao, Tong; Chen, Guocheng; Zhu, Jingbo

arXiv:2501.07086 (cs)

[Submitted on 13 Jan 2025]

Abstract: Previous work on augmenting large multimodal models (LMMs) for text-to-image (T2I) generation has focused on enriching the input space of in-context learning (ICL). This includes providing a few demonstrations and optimizing image descriptions to be more detailed and logical. However, as demand for more complex and flexible image descriptions grows, enhancing comprehension of input text within the ICL paradigm remains a critical yet underexplored area. In this work, we extend this line of research by constructing parallel multilingual prompts aimed at harnessing the multilingual capabilities of LMMs. More specifically, we translate the input text into several languages and provide the models with both the original text and the translations. Experiments on two LMMs across three benchmarks show that our method, PMT2I, achieves superior performance in general, compositional, and fine-grained assessments, especially in human preference alignment. Additionally, with its advantage of generating more diverse images, PMT2I significantly outperforms baseline prompts when incorporated with reranking methods. Our code and parallel multilingual data can be found at this https URL.
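
The prompt construction described in the abstract (original text plus its translations, given to the model together) might look roughly like the sketch below. This is an illustration only, not the authors' released code: the function name `build_parallel_prompt`, the `Language: text` line layout, the closing instruction sentence, and the example translations are all assumptions made here, and the translation step itself is left out.

```python
from typing import Mapping


def build_parallel_prompt(
    text: str,
    translations: Mapping[str, str],
    source_lang: str = "English",
) -> str:
    """Combine an image description with its translations into one prompt.

    Hypothetical template: one "Language: text" line per version,
    followed by a generic instruction. The paper's actual prompt
    format is not given in the abstract.
    """
    lines = [f"{source_lang}: {text.strip()}"]
    for lang, translated in translations.items():
        translated = translated.strip()
        if translated:  # skip languages whose translation is empty
            lines.append(f"{lang}: {translated}")
    lines.append("Generate an image that matches the descriptions above.")
    return "\n".join(lines)


# Example usage with hand-written (assumed) translations.
prompt = build_parallel_prompt(
    "A red bicycle leaning against a brick wall",
    {
        "German": "Ein rotes Fahrrad, das an einer Ziegelmauer lehnt",
        "French": "Un vélo rouge appuyé contre un mur de briques",
    },
)
print(prompt)
```

The resulting string would then be passed to an LMM's image-generation interface in place of the single-language description.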

Comments:	Accepted to ICASSP 2025
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2501.07086 [cs.CL]
	(or arXiv:2501.07086v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.07086

Submission history

From: Yongyu Mu
[v1] Mon, 13 Jan 2025 06:41:23 UTC (917 KB)
