Computer Science > Computer Vision and Pattern Recognition
[Submitted on 21 Mar 2025]
Title:D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
View PDF HTML (experimental)Abstract:In the domain of image generation, latent-based generative models occupy a dominant status; however, these models rely heavily on image tokenizer. To meet modeling requirements, autoregressive models possessing the characteristics of scalability and flexibility embrace a discrete-valued tokenizer, but face the challenge of poor image generation quality. In contrast, diffusion models take advantage of the continuous-valued tokenizer to achieve better generation quality but are subject to low efficiency and complexity. The existing hybrid models are mainly to compensate for information loss and simplify the diffusion learning process. The potential of merging discrete-valued and continuous-valued tokens in the field of image generation has not yet been explored. In this paper, we propose D2C, a novel two-stage method to enhance model generation capacity. In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator. Then in the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence. In addition, we design two kinds of fusion modules for seamless interaction. On the ImageNet-256 benchmark, extensive experiment results validate that our model achieves superior performance compared with several continuous-valued and discrete-valued generative models on the class-conditional image generation tasks.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.