Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model

Zhang, Shengjun; Li, Jinzhao; Fei, Xin; Liu, Hao; Duan, Yueqi

arXiv:2504.02764 (cs)

[Submitted on 3 Apr 2025]

Abstract: In this paper, we propose Scene Splatter, a momentum-based paradigm for video diffusion to generate generic scenes from a single image. Existing methods, which employ video generation models to synthesize novel views, suffer from limited video length and scene inconsistency, leading to artifacts and distortions during further reconstruction. To address this issue, we construct noisy samples from original features as momentum to enhance video details and maintain scene consistency. However, for latent features whose perception field spans both known and unknown regions, such latent-level momentum restricts the generative ability of video diffusion in unknown regions. Therefore, we further introduce the aforementioned consistent video as a pixel-level momentum to a directly generated video without momentum for better recovery of unseen regions. Our cascaded momentum enables video diffusion models to generate both high-fidelity and consistent novel views. We further finetune the global Gaussian representations with enhanced frames and render new frames for momentum update in the next step. In this manner, we can iteratively recover a 3D scene, avoiding the limitation of video length. Extensive experiments demonstrate the generalization capability and superior performance of our method in high-fidelity and consistent scene generation.
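
The abstract gives no formulas, so the following is only a generic illustration of what momentum-style blending could look like, not the paper's formulation. It assumes a standard DDPM-style forward noising step to build a "noisy sample from original features" and a plain convex combination to mix a consistent signal with a freely generated one. The function names `add_noise` and `momentum_blend`, the parameter `alpha_bar`, and the blending weights are all hypothetical.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Generic DDPM-style forward noising (an assumption, not taken from the paper):
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def momentum_blend(current, reference, weight):
    """Convex combination of two equally sized signals.
    weight = 1 keeps only the reference (full momentum),
    weight = 0 keeps only the current sample (no momentum)."""
    if len(current) != len(reference):
        raise ValueError("inputs must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * r + (1.0 - weight) * c for c, r in zip(current, reference)]


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical latent-level step: noise the features of a rendered view,
    # then pull the sampler's current latent toward that noisy sample.
    rendered_latent = [0.2, -0.5, 1.0]
    sampler_latent = [0.0, 0.0, 0.0]
    noisy_reference = add_noise(rendered_latent, alpha_bar=0.9, rng=rng)
    guided_latent = momentum_blend(sampler_latent, noisy_reference, weight=0.7)

    # Hypothetical pixel-level step: mix the consistent frame into a frame
    # generated without momentum, using a smaller weight.
    consistent_frame = [0.5, 0.5, 0.5]
    free_frame = [0.1, 0.9, 0.4]
    final_frame = momentum_blend(free_frame, consistent_frame, weight=0.3)

    print(guided_latent)
    print(final_frame)
```

In this toy setup a larger weight favors consistency with the reference, while a smaller one leaves more freedom to the generated sample, which mirrors the trade-off between known and unknown regions that the abstract describes.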

Comments:	CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2504.02764 [cs.CV]
	(or arXiv:2504.02764v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.02764

Submission history

From: Shengjun Zhang
[v1] Thu, 3 Apr 2025 17:00:44 UTC (15,943 KB)
