Generative AI for Vision: A Comprehensive Study of Frameworks and Applications

Bousetouane, Fouad

arXiv:2501.18033 (cs)

[Submitted on 29 Jan 2025]

Abstract: Generative AI is transforming image synthesis, enabling the creation of high-quality, diverse, and photorealistic visuals across industries like design, media, healthcare, and autonomous systems. Advances in techniques such as image-to-image translation, text-to-image generation, domain transfer, and multimodal alignment have broadened the scope of automated visual content creation, supporting a wide spectrum of applications. These advancements are driven by models like Generative Adversarial Networks (GANs), conditional frameworks, and diffusion-based approaches such as Stable Diffusion. This work presents a structured classification of image generation techniques based on the nature of the input, organizing methods by input modalities like noisy vectors, latent representations, and conditional inputs. We explore the principles behind these models, highlight key frameworks including DALL-E, ControlNet, and DeepSeek Janus-Pro, and address challenges such as computational costs, data biases, and output alignment with user intent. By offering this input-centric perspective, this study bridges technical depth with practical insights, providing researchers and practitioners with a comprehensive resource to harness generative AI for real-world applications.

Comments:	53 pages, 18 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.18033 [cs.CV]
	(or arXiv:2501.18033v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.18033

Submission history

From: Fouad Bousetouane
[v1] Wed, 29 Jan 2025 22:42:05 UTC (17,347 KB)
