PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Zhou, Teng; Zhang, Xiaoyu; Tang, Yongchuan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.15867 (cs)

[Submitted on 24 Nov 2024]

Abstract: Panoramic image generation has emerged as an important task in image generation, driven by growing demands for large-scale visuals in creative and technical applications. While diffusion models have dominated this field, they face inherent limitations, including the multilevel-coherence challenge and implementation complexity, leading to suboptimal outcomes. In this paper, we introduce PanoLlama, a novel framework that redefines panoramic image generation as a next-token prediction task. Building on the pre-trained LlamaGen architecture, we generate images in an autoregressive manner and develop an expansion strategy to handle size limitations. This method aligns with the image token structure in a crop-wise and training-free manner, resulting in high-quality panoramas with minimal seams and maximum scalability. PanoLlama demonstrates its effectiveness and versatility in our experiments, achieving the best overall performance while offering flexibility for multi-scale, multi-layout, and multi-guidance generation. It overcomes the challenges that diffusion-based methods fail to address, setting a new paradigm for panoramic image generation tasks. Code is available at this https URL.
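
The abstract does not spell out the expansion strategy, so the following is only a toy sketch of what crop-wise, training-free autoregressive expansion over an image-token grid could look like. It is not the authors' implementation. The names `make_toy_predictor` and `generate_panorama_tokens`, the window and stride parameters, and the row-by-row prefill order are all assumptions made for illustration, and a dummy predictor stands in for a pre-trained model such as LlamaGen.

```python
import random
from typing import Callable, List

Grid = List[List[int]]            # token ids, indexed [row][column]
Predictor = Callable[[List[int]], int]


def make_toy_predictor(vocab_size: int = 16, seed: int = 0) -> Predictor:
    """Hypothetical stand-in for a fixed-size autoregressive image-token model."""
    rng = random.Random(seed)

    def predict(context: List[int]) -> int:
        # A real model would sample from p(token | context); this just
        # drifts from the previous token so neighbours stay similar.
        if not context:
            return rng.randrange(vocab_size)
        return (context[-1] + rng.randrange(3)) % vocab_size

    return predict


def generate_panorama_tokens(rows: int, window_cols: int, total_cols: int,
                             stride: int, predict: Predictor) -> Grid:
    """Grow a rows x total_cols token grid with a model limited to
    rows x window_cols tokens of context (assumed scheme, no retraining)."""
    if not 0 < stride < window_cols:
        raise ValueError("stride must be in (0, window_cols)")
    if total_cols < window_cols:
        raise ValueError("total_cols must be at least window_cols")

    grid: Grid = [[] for _ in range(rows)]

    # First crop: ordinary raster-order next-token prediction.
    context: List[int] = []
    for r in range(rows):
        for _ in range(window_cols):
            tok = predict(context)
            grid[r].append(tok)
            context.append(tok)

    # Later crops: slide the window right; in each row, reuse the
    # overlapping tokens as context and predict only the new columns.
    while len(grid[0]) < total_cols:
        new_cols = min(stride, total_cols - len(grid[0]))
        keep = window_cols - new_cols
        start = len(grid[0]) - keep
        context = []
        for r in range(rows):
            context.extend(grid[r][start:start + keep])
            for _ in range(new_cols):
                tok = predict(context)
                grid[r].append(tok)
                context.append(tok)
    return grid
```

In this sketch the context passed to the predictor never exceeds one window of tokens, which is the property that would let a fixed-size model produce an arbitrarily wide grid; decoding the tokens back to pixels is left out.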

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.15867 [cs.CV]
	(or arXiv:2411.15867v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.15867

Submission history

From: Teng Zhou
[v1] Sun, 24 Nov 2024 15:06:57 UTC (27,602 KB)
