CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control

Popov, Stefan; Raj, Amit; Krainin, Michael; Li, Yuanzhen; Freeman, William T.; Rubinstein, Michael

arXiv:2501.06006 (cs)

[Submitted on 10 Jan 2025]

Abstract: We propose a method for generating fly-through videos of a scene from a single image and a given camera trajectory. We build upon an image-to-video latent diffusion model and condition its UNet denoiser on the camera trajectory using four techniques. (1) We condition the UNet's temporal blocks on raw camera extrinsics, similar to MotionCtrl. (2) We use images containing camera rays and directions, similar to CameraCtrl. (3) We reproject the initial image to subsequent frames and use the resulting video as a condition. (4) We use 2D<=>3D transformers to introduce a global 3D representation, which implicitly conditions on the camera poses. We combine all conditions in a ControlNet-style architecture. We then propose a metric that evaluates overall video quality and the ability to preserve details with view changes, which we use to analyze the trade-offs of individual and combined conditions. Finally, we identify an optimal combination of conditions. We calibrate camera positions in our datasets for scale consistency across scenes, and we train our scene exploration model, CamCtrl3D, demonstrating state-of-the-art results.
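As an illustration of technique (2), the sketch below builds a per-pixel "ray image" for one frame: each pixel stores the origin and unit direction of its camera ray in world space. This is only a minimal sketch and not the authors' implementation. It assumes a pinhole camera with a 3x3 intrinsics matrix `K`, a 4x4 camera-to-world pose, and pixel centers at half-integer coordinates; the function name `camera_ray_image` and the 6-channel layout are hypothetical choices made for this example.

```python
import numpy as np

def camera_ray_image(K, cam_to_world, height, width):
    """Return an (H, W, 6) array: ray origin (3) and unit direction (3) per pixel.

    Hypothetical helper for illustration; assumes a pinhole camera model.
    """
    # Pixel-center coordinates in image space (u along width, v along height).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3), homogeneous

    # Back-project pixels through the inverse intrinsics to camera-space directions.
    dirs_cam = pix @ np.linalg.inv(K).T

    # Rotate directions into world space and normalize them to unit length.
    rotation = cam_to_world[:3, :3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Every ray of a pinhole camera starts at the camera center.
    origins = np.broadcast_to(cam_to_world[:3, 3], dirs_world.shape)
    return np.concatenate([origins, dirs_world], axis=-1)

# Example: a 64x48 frame whose principal point sits on the center of pixel (row 24, col 32).
K = np.array([[100.0, 0.0, 32.5],
              [0.0, 100.0, 24.5],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]  # camera center in world coordinates
rays = camera_ray_image(K, pose, height=48, width=64)
```

Computing one such image per frame of the trajectory yields a video-shaped tensor that could be supplied to a denoiser alongside the image latents; how CamCtrl3D encodes and injects these images is described in the paper itself, not here.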

Comments:	To be published in 3DV 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.06006 [cs.CV]
	(or arXiv:2501.06006v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.06006

Submission history

From: Stefan Popov
[v1] Fri, 10 Jan 2025 14:37:32 UTC (37,974 KB)
