LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

Park, Jihye; Kim, Soohyun; Kim, Sunwoo; Yoo, Jaejun; Uh, Youngjung; Kim, Seungryong

arXiv:2208.14889v1 (cs)

[Submitted on 31 Aug 2022 (this version), latest version 24 Apr 2023 (v4)]

Abstract: Existing techniques for image-to-image translation have commonly suffered from two critical problems: heavy reliance on per-sample domain annotation and/or an inability to handle multiple attributes per image. Recent methods adopt clustering approaches to easily provide per-sample annotations in an unsupervised manner. However, they cannot account for the real-world setting in which one sample may have multiple attributes. In addition, the semantics of the clusters are not easily coupled to human understanding. To overcome these limitations, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. We leverage easy-to-obtain candidate domain annotations given as text for a dataset and jointly optimize them during training. The target style is specified by aggregating multi-domain style vectors according to the multi-hot domain assignments. As the initial candidate domain texts might be inaccurate, we set the candidate domain texts to be learnable and jointly fine-tune them during training. Furthermore, we introduce a slack domain to cover samples that are not covered by the candidate domains. Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
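
As a rough illustration of the multi-hot assignment, slack domain, and style aggregation described in the abstract, the following is a minimal sketch and not the authors' implementation. The function names, the top-k selection, the score threshold, and the use of a plain average are all assumptions made for this example; the abstract does not specify how assignments are computed or how style vectors are combined.

```python
from typing import List, Sequence


def multi_hot_assign(scores: Sequence[float],
                     top_k: int = 2,
                     threshold: float = 0.5) -> List[int]:
    """Return a multi-hot vector of length len(scores) + 1.

    `scores` holds one hypothetical image-text similarity per candidate
    domain. The extra last slot stands for the slack domain and is switched
    on only when no candidate domain reaches `threshold` (assumed rule).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = [i for i in ranked[:top_k] if scores[i] >= threshold]
    assignment = [0] * (len(scores) + 1)
    if chosen:
        for i in chosen:
            assignment[i] = 1
    else:
        assignment[-1] = 1  # fall back to the slack domain
    return assignment


def aggregate_style(style_vectors: Sequence[Sequence[float]],
                    assignment: Sequence[int]) -> List[float]:
    """Average the style vectors of all active domains (assumed rule)."""
    active = [vec for vec, on in zip(style_vectors, assignment) if on]
    if not active:
        raise ValueError("assignment must activate at least one domain")
    dim = len(active[0])
    return [sum(vec[d] for vec in active) / len(active) for d in range(dim)]


# Three candidate domains plus one slack domain, each with a 2-D style vector.
styles = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0], [5.0, 5.0]]
assignment = multi_hot_assign([0.9, 0.2, 0.7])
target_style = aggregate_style(styles, assignment)
```

In this toy setup the image is assigned to the first and third candidate domains, and the target style is the mean of their two style vectors. An image whose scores all fall below the threshold would instead receive only the slack domain's style vector.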

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2208.14889 [cs.CV]
	(or arXiv:2208.14889v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.14889

Submission history

From: Jihye Park
[v1] Wed, 31 Aug 2022 14:30:00 UTC (21,615 KB)
[v2] Thu, 23 Feb 2023 05:39:21 UTC (15,383 KB)
[v3] Sat, 4 Mar 2023 09:46:09 UTC (26,742 KB)
[v4] Mon, 24 Apr 2023 08:14:41 UTC (45,264 KB)
