FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

Singh, Abhishek Kumar; Patras, Ioannis

arXiv:2404.18591 (cs)

[Submitted on 26 Apr 2024]

Abstract: The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal inputs such as text and sketches. We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data. Our evaluation, utilizing metrics like FID, CLIP Score, and KID, demonstrates that our model significantly outperforms traditional stable diffusion models. The results not only highlight the effectiveness of our model in generating fashion-appropriate outputs but also underscore the potential of diffusion models in revolutionizing fashion design workflows. This research paves the way for more interactive, personalized, and technologically enriched methodologies in fashion design and representation, bridging the gap between creative vision and practical application.
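For readers unfamiliar with FID, one of the metrics named in the abstract, the following is a minimal sketch of the Fréchet distance between Gaussians fitted to two sets of feature vectors. It is not the authors' evaluation code. In practice FID is computed on Inception-v3 features of real and generated images; here random vectors stand in for those features, and the function name `frechet_distance` and all variable names are hypothetical.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in exact
    # arithmetic for covariance matrices; clip guards against round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in features (assumption): 500 samples of 8-dimensional vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 0.5  # same covariance, mean moved by 0.5 in every dimension

print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

Comparing a set with itself should give a value near zero. Shifting every feature by 0.5 leaves the covariance unchanged, so the distance reduces to the squared mean difference, roughly 8 × 0.5² = 2. Lower values indicate closer distributions.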

Comments:	9 pages, 8 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.18591 [cs.CV]
	(or arXiv:2404.18591v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.18591

Submission history

From: Abhishek Kumar Singh
[v1] Fri, 26 Apr 2024 14:59:42 UTC (172,980 KB)
