Customizing Text-to-Image Models with a Single Image Pair

Jones, Maxwell; Wang, Sheng-Yu; Kumari, Nupur; Bau, David; Zhu, Jun-Yan

arXiv:2405.01536 (cs)

[Submitted on 2 May 2024 (v1), last revised 28 Oct 2024 (this version, v2)]

Abstract: Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
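
The mechanics summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the additive composition of two low-rank updates, the penalty on B_content^T B_style, and the form of the guidance term are all assumptions made for illustration, and every function name below is hypothetical. Plain Python lists stand in for weight matrices and noise predictions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def adapted_weight(w0, b_content, a_content, b_style=None, a_style=None):
    """Assumed composition: W = W0 + B_c A_c, plus B_s A_s when style is enabled."""
    w = add(w0, matmul(b_content, a_content))
    if b_style is not None and a_style is not None:
        w = add(w, matmul(b_style, a_style))
    return w


def orthogonality_penalty(b_content, b_style):
    """Assumed penalty: squared Frobenius norm of B_c^T B_s.

    It is zero exactly when every content column is orthogonal
    to every style column.
    """
    cross = matmul(transpose(b_content), b_style)
    return sum(v * v for row in cross for v in row)


def style_guided_noise(eps_uncond, eps_content, eps_style, cfg_scale, style_scale):
    """Assumed guidance: classifier-free guidance plus an extra style term.

    eps_uncond  : prediction without a prompt
    eps_content : prediction with the prompt, content weights only
    eps_style   : prediction with the prompt, content + style weights
    """
    return [
        u + cfg_scale * (c - u) + style_scale * (s - c)
        for u, c, s in zip(eps_uncond, eps_content, eps_style)
    ]


if __name__ == "__main__":
    # Two rank-2 factors for a 4-row weight, occupying disjoint coordinates.
    b_c = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
    b_s = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(orthogonality_penalty(b_c, b_s))
    print(style_guided_noise([0.0], [1.0], [3.0], cfg_scale=2.0, style_scale=0.5))
```

In this sketch, setting `style_scale` to zero reduces the guidance to ordinary classifier-free guidance on the content prediction, so the strength of the stylistic change becomes a single inference-time knob.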

Comments:	project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2405.01536 [cs.CV]
	(or arXiv:2405.01536v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.01536

Submission history

From: Maxwell Jones
[v1] Thu, 2 May 2024 17:59:52 UTC (37,534 KB)
[v2] Mon, 28 Oct 2024 17:02:28 UTC (45,198 KB)
