Learning to Animate Images from A Few Videos to Portray Delicate Human Actions

Li, Haoxin; Yu, Yingchen; Wu, Qilong; Zhang, Hanwang; Li, Boyang; Bai, Song

arXiv:2503.00276 (cs)

[Submitted on 1 Mar 2025]

Abstract: Despite recent progress, video generative models still struggle to animate human actions from static images, particularly for uncommon actions with limited training data. In this paper, we investigate the task of learning to animate human actions from a small number of videos -- 16 or fewer -- which is highly valuable in real-world applications like video and movie production. Learning generalizable motion patterns from so few examples while ensuring smooth transitions from the initial reference image is exceedingly challenging. We propose FLASH (Few-shot Learning to Animate and Steer Humans), which improves motion generalization by aligning motion features and inter-frame correspondence relations between videos that share the same motion but have different appearances. This approach reduces overfitting to visual appearances in the limited training data and enhances the generalization of learned motion patterns. Additionally, FLASH extends the decoder with additional layers to compensate for details lost in the latent space, fostering smooth transitions from the initial reference image. Experiments demonstrate that FLASH effectively animates images with unseen human or scene appearances into specified actions while maintaining smooth transitions from the reference image.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.00276 [cs.CV]
	(or arXiv:2503.00276v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.00276

Submission history

From: Haoxin Li
[v1] Sat, 1 Mar 2025 01:09:45 UTC (8,871 KB)
