Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models

Perry, Julian; Sanders, Frank; Scott, Carter

Abstract: In this paper, we present a novel method for improving text-to-image generation by combining Large Language Models (LLMs) with diffusion models, a hybrid approach aimed at achieving both higher quality and greater efficiency in image synthesis from text descriptions. Our approach introduces a dynamic KL-weighting strategy to optimize the diffusion process and incorporates semantic understanding from pre-trained LLMs to guide generation. The proposed method significantly improves both the visual quality of generated images and their alignment with text descriptions, addressing challenges such as computational inefficiency, training instability, and limited robustness to textual variability. We evaluate our method on the COCO dataset and demonstrate superior performance over traditional GAN-based models, both quantitatively and qualitatively. Extensive experiments, including ablation studies and human evaluations, confirm that our method outperforms existing approaches in image realism, relevance to the input text, and overall aesthetic quality. Our approach also shows promise for scaling to other multimodal tasks, making it a versatile option for a wide range of generative applications.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.00826 [cs.CL]
	(or arXiv:2502.00826v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.00826
