Personalized and Sequential Text-to-Image Generation

Nabati, Ofir; Tennenholtz, Guy; Hsu, ChihWei; Ryu, Moonkyung; Ramachandran, Deepak; Chow, Yinlam; Li, Xiang; Boutilier, Craig

arXiv:2412.10419 (cs)

[Submitted on 10 Dec 2024]

Abstract: We address the problem of personalized, interactive text-to-image (T2I) generation by designing a reinforcement learning (RL) agent that iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and simulated user-rater interactions to support future research in personalized, multi-turn T2I generation.

Comments:	Link to PASTA dataset: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2412.10419 [cs.CV]
	(or arXiv:2412.10419v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.10419

Submission history

From: Guy Tennenholtz
[v1] Tue, 10 Dec 2024 01:47:40 UTC (43,281 KB)
