Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Yuan, Xin; Baek, Jinoo; Xu, Keyang; Tov, Omer; Fei, Hongliang

arXiv:2401.10404 (cs)

[Submitted on 18 Jan 2024]

Abstract: We propose an efficient diffusion-based tuning approach for text-to-video super-resolution (SR) that leverages the capacity already learned by a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this, we design an efficient architecture by inflating the weights of the text-to-image SR model into our video generation framework. Additionally, we incorporate a temporal adapter to ensure temporal coherence across video frames. We investigate different tuning approaches based on our inflated architecture and report the trade-offs between computational cost and super-resolution quality. Quantitative and qualitative evaluation on the Shutterstock video dataset demonstrates that our approach can perform text-to-video SR generation with good visual quality and temporal consistency. To evaluate temporal coherence, we also present visualizations in video format at this https URL.
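
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of weight inflation with a temporal adapter, not the authors' architecture. It assumes a toy per-pixel channel mix as a stand-in for a pretrained image layer, a video tensor laid out as (batch, time, height, width, channels), and a residual temporal adapter whose mixing matrix starts at zero. All function names, shapes, and the zero initialization are hypothetical choices made for illustration.

```python
import numpy as np


def spatial_layer(frames, weight):
    """Toy stand-in for a pretrained 2D image layer (a 1x1 channel mix).

    frames: (N, H, W, C_in), weight: (C_in, C_out).
    """
    return frames @ weight


def inflated_spatial_layer(video, weight):
    """Reuse the image layer on video by folding time into the batch axis.

    video: (B, T, H, W, C_in). The image weights are shared by every frame,
    so no new spatial parameters are introduced.
    """
    b, t, h, w, c = video.shape
    out = spatial_layer(video.reshape(b * t, h, w, c), weight)
    return out.reshape(b, t, h, w, -1)


def temporal_adapter(video, temporal_weight):
    """Residual mixing across the time axis (hypothetical adapter form).

    temporal_weight: (T, T). With an all-zero matrix the adapter is the
    identity, so the inflated model initially matches the image model.
    """
    mixed = np.einsum("st,bthwc->bshwc", temporal_weight, video)
    return video + mixed


rng = np.random.default_rng(0)
video = rng.normal(size=(2, 4, 8, 8, 3))      # B=2, T=4, 8x8 frames, 3 channels
image_weight = rng.normal(size=(3, 5))        # pretend these are pretrained
features = inflated_spatial_layer(video, image_weight)
adapted = temporal_adapter(features, np.zeros((4, 4)))
```

In this sketch only `temporal_weight` would need to be trained if the image weights were frozen, which is one way to picture the kind of cost versus quality trade-off between tuning approaches that the abstract mentions.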

Comments:	WACV'24 workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.10404 [cs.CV]
	(or arXiv:2401.10404v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.10404

Submission history

From: Xin Yuan
[v1] Thu, 18 Jan 2024 22:25:16 UTC (4,255 KB)
