Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

Feng, Sicong; Yang, Jielong; Peng, Li

Computer Science > Computer Vision and Pattern Recognition

arXiv:2503.18386 (cs)

[Submitted on 24 Mar 2025]

Abstract: Recent advances in diffusion models have brought new vitality to visual content creation. However, current text-to-video generation models still face significant challenges: high training costs, large data requirements, and difficulty keeping the motion of the foreground object consistent with the given text. To address these challenges, we propose mask-guided video generation, which controls video generation through mask motion sequences and requires only limited training data. Our model extends existing architectures with foreground masks for precise text-position matching and motion trajectory control. Mask motion sequences guide the generation process so that foreground objects remain consistent throughout the sequence. In addition, a first-frame sharing strategy and an autoregressive extension approach yield more stable and longer videos. Extensive qualitative and quantitative experiments show that this approach excels in various video generation tasks, such as video editing and artistic video generation, outperforming previous methods in consistency and quality. Our generated results can be viewed in the supplementary materials.
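The abstract gives no implementation details. As a rough illustration only, the following sketch shows one hypothetical way to build the kind of conditioning input it describes: a mask motion sequence in which a rectangular foreground mask moves along a straight-line trajectory, plus a simple overlapping-window schedule of the sort an autoregressive extension might use to produce videos longer than one generation window. The names `box_mask`, `mask_motion_sequence`, and `autoregressive_segments`, the rectangular mask shape, the linear trajectory, and the one-frame overlap are all assumptions for illustration; they are not the authors' method.

```python
from typing import List, Tuple

# Hypothetical representation: an H x W binary mask, 1 = foreground pixel.
Mask = List[List[int]]


def box_mask(height: int, width: int, top: int, left: int,
             box_h: int, box_w: int) -> Mask:
    """Return a binary mask containing a rectangle of ones, clipped to the frame."""
    mask = [[0] * width for _ in range(height)]
    for y in range(max(top, 0), min(top + box_h, height)):
        for x in range(max(left, 0), min(left + box_w, width)):
            mask[y][x] = 1
    return mask


def mask_motion_sequence(height: int, width: int, box_h: int, box_w: int,
                         start: Tuple[int, int], end: Tuple[int, int],
                         num_frames: int) -> List[Mask]:
    """Assumed trajectory model: linearly interpolate the box's top-left
    corner from `start` to `end` (as (row, col)) over `num_frames` frames."""
    masks = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1) if num_frames > 1 else 0.0
        top = round(start[0] + alpha * (end[0] - start[0]))
        left = round(start[1] + alpha * (end[1] - start[1]))
        masks.append(box_mask(height, width, top, left, box_h, box_w))
    return masks


def autoregressive_segments(total_frames: int,
                            segment_len: int) -> List[Tuple[int, int]]:
    """Split frame indices [0, total_frames) into half-open windows of at most
    `segment_len` frames. Each later window starts on the last frame of the
    previous one (a one-frame overlap, assumed here) so that frame could serve
    as conditioning for the next generation step."""
    if segment_len < 2:
        raise ValueError("segment_len must be at least 2 to make progress")
    if total_frames < 1:
        raise ValueError("total_frames must be positive")
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, total_frames)
        segments.append((start, end))
        if end >= total_frames:
            break
        start = end - 1  # reuse the previous window's final frame
    return segments
```

For example, `autoregressive_segments(16, 8)` yields `[(0, 8), (7, 15), (14, 16)]`: each later window begins on the final frame of the previous one, which a generator could reuse as its conditioning frame.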

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.18386 [cs.CV]
	(or arXiv:2503.18386v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.18386

Submission history

From: Feng Sicong
[v1] Mon, 24 Mar 2025 06:53:08 UTC (41,645 KB)
