LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

Park, Jihye; Kim, Sunwoo; Kim, Soohyun; Yoo, Jaejun; Cho, Seokju; Uh, Youngjung; Kim, Seungryong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2208.14889v3 (cs)

[Submitted on 31 Aug 2022 (v1), revised 4 Mar 2023 (this version, v3), latest version 24 Apr 2023 (v4)]

Title:LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

Authors:Jihye Park, Sunwoo Kim, Soohyun Kim, Jaejun Yoo, Seokju Cho, Youngjung Uh, Seungryong Kim


Abstract: Existing techniques for image-to-image translation have commonly suffered from two critical problems: heavy reliance on per-sample domain annotation and/or an inability to handle multiple attributes per image. Recent truly unsupervised methods adopt clustering approaches to easily provide per-sample one-hot domain labels. However, they cannot account for the real-world setting in which one sample may have multiple attributes. In addition, the semantics of the clusters are not easily coupled to human understanding. To overcome these limitations, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. We leverage easy-to-obtain candidate attributes given in text for a dataset: the similarity between images and attributes indicates per-sample domain labels. This formulation naturally enables multi-hot labels, so that users can specify the target domain with a set of attributes in language. To account for cases where the initial prompts are inaccurate, we also present prompt learning. We further present a domain regularization loss that enforces translated images to be mapped to the corresponding domain. Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
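The abstract states that image-attribute similarity yields per-sample multi-hot domain labels and that a domain regularization loss pushes translated images toward their target domain, but it does not spell out the label-selection rule or the exact loss. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: the embedding vectors stand in for image and text embeddings from a vision-language model (toy 3-d values here), the top-k selection rule and the cross-entropy form of the loss are hypothetical choices, and every function name (`multi_hot_label`, `domain_regularization_loss`, etc.) is illustrative rather than taken from the LANIT code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float], temperature: float = 0.1) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hot_label(image_emb: Sequence[float],
                    attribute_embs: Sequence[Sequence[float]],
                    top_k: int = 2) -> List[int]:
    """HYPOTHETICAL pseudo-labeling rule: mark the top_k attributes most
    similar to the image as active (1), all others as inactive (0)."""
    sims = [cosine_similarity(image_emb, t) for t in attribute_embs]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    chosen = set(ranked[:top_k])
    return [1 if i in chosen else 0 for i in range(len(sims))]


def domain_regularization_loss(translated_emb: Sequence[float],
                               attribute_embs: Sequence[Sequence[float]],
                               target: Sequence[int],
                               temperature: float = 0.1) -> float:
    """HYPOTHETICAL loss form: cross-entropy between the translated image's
    softmax distribution over attributes and the normalized multi-hot target.
    It is low when the translated image is closest to the target attributes."""
    sims = [cosine_similarity(translated_emb, t) for t in attribute_embs]
    probs = softmax(sims, temperature)
    n_active = sum(target)
    return -sum(t * math.log(p) for t, p in zip(target, probs)) / n_active


if __name__ == "__main__":
    # Toy stand-ins for text embeddings of "blond hair", "smiling", "eyeglasses".
    attrs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    label = multi_hot_label([0.9, 0.8, 0.1], attrs, top_k=2)
    print(label)  # → [1, 1, 0]

    matched = domain_regularization_loss([1.0, 1.0, 0.0], attrs, label)
    mismatched = domain_regularization_loss([0.0, 0.0, 1.0], attrs, label)
    print(matched < mismatched)  # → True
```

A multi-hot target is what lets a user request a combination such as "blond hair" plus "smiling" in one translation; a one-hot clustering label could encode only one of them.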

Comments:	Accepted to CVPR 2023. Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2208.14889 [cs.CV]
	(or arXiv:2208.14889v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.14889

Submission history

From: Jihye Park
[v1] Wed, 31 Aug 2022 14:30:00 UTC (21,615 KB)
[v2] Thu, 23 Feb 2023 05:39:21 UTC (15,383 KB)
[v3] Sat, 4 Mar 2023 09:46:09 UTC (26,742 KB)
[v4] Mon, 24 Apr 2023 08:14:41 UTC (45,264 KB)
