CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

Bi, Xiuli; Lu, Jian; Liu, Bo; Cun, Xiaodong; Zhang, Yong; Li, Weisheng; Xiao, Bin

arXiv:2412.15646 (cs)

[Submitted on 20 Dec 2024 (v1), last revised 23 Dec 2024 (this version, v2)]

Abstract: Benefiting from large-scale pre-training on text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from text descriptions. In addition, given some reference images or videos, parameter-efficient fine-tuning methods such as LoRA can generate high-quality customized concepts, e.g., a specific subject or the motion from a reference video. However, combining multiple concepts trained on different references into a single network produces obvious artifacts. To this end, we propose CustomTTT, which makes it easy to jointly customize the appearance and the motion of the given video. In detail, we first analyze the influence of the prompt in the current video diffusion model and find that LoRAs are needed only in specific layers for appearance and motion customization. Moreover, since each LoRA is trained individually, we propose a novel test-time training technique that updates the parameters after combination, using the individually trained customized models. We conduct detailed experiments to verify the effectiveness of the proposed method. Our method outperforms several state-of-the-art works in both qualitative and quantitative evaluations.
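To make the combination step mentioned in the abstract concrete, here is a minimal sketch of merging two separately trained LoRA updates into one frozen weight matrix. It assumes the usual additive LoRA form, W + scale * (B @ A); the function names, matrix shapes, and values are hypothetical and are not taken from the paper or its code. It shows only the naive merge, which the abstract says produces artifacts, and not the test-time training step that CustomTTT applies afterwards.

```python
# Illustrative only: names, shapes, and values are assumptions, not the paper's code.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def merge_loras(w, loras):
    """Return W + sum_i s_i * (B_i @ A_i) for a list of (B, A, s) triples."""
    merged = [row[:] for row in w]  # copy so the frozen base weight is untouched
    for b, a, s in loras:
        merged = add(merged, scale(matmul(b, a), s))
    return merged

# Frozen 2x2 base weight of one hypothetical layer.
W = [[1.0, 0.0], [0.0, 1.0]]
# Rank-1 "appearance" LoRA: B is 2x1, A is 1x2, scale 1.0.
appearance = ([[1.0], [0.0]], [[0.5, 0.5]], 1.0)
# Rank-1 "motion" LoRA, trained separately from the appearance LoRA.
motion = ([[0.0], [1.0]], [[0.25, -0.25]], 1.0)

W_merged = merge_loras(W, [appearance, motion])
print(W_merged)
```

Because the two low-rank updates are trained independently, nothing in this plain sum accounts for how they interact, which is the gap the abstract's test-time training is meant to address.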

Comments:	Accepted in AAAI 2025. Project Page: this https URL Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.15646 [cs.CV]
	(or arXiv:2412.15646v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.15646

Submission history

From: Lu Jian
[v1] Fri, 20 Dec 2024 08:05:13 UTC (4,974 KB)
[v2] Mon, 23 Dec 2024 06:52:45 UTC (3,345 KB)
