Improving Video Generation with Human Feedback

Liu, Jie; Liu, Gongye; Liang, Jiajun; Yuan, Ziyang; Liu, Xiaokun; Zheng, Mingwu; Wu, Xiele; Wang, Qiulin; Qin, Wenyu; Xia, Menghan; Wang, Xintao; Liu, Xiaohong; Yang, Fei; Wan, Pengfei; Zhang, Di; Gai, Kun; Yang, Yujiu; Ouyang, Wanli


arXiv:2501.13918 (cs)

[Submitted on 23 Jan 2025]



Abstract: Video generation has achieved significant advances through rectified flow techniques, but issues such as unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices affect its efficacy as a reward signal. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward-weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO outperforms both Flow-RWR and standard supervised fine-tuning. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: this https URL.
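
For orientation, the "maximizing reward with KL regularization" perspective mentioned in the abstract is commonly written in the following textbook form. This is a sketch in assumed notation, not the paper's own formulation: x denotes a generated video, y a prompt, r a reward model, p_ref a frozen reference model, and β > 0 a regularization strength. None of these symbols appear in the abstract itself.

```latex
% KL-regularized reward maximization (standard form; notation assumed)
\max_{\theta}\;
\mathbb{E}_{y}\Big[
  \mathbb{E}_{x \sim p_\theta(\cdot \mid y)}\big[\, r(x, y) \,\big]
  \;-\; \beta\, D_{\mathrm{KL}}\big( p_\theta(\cdot \mid y) \,\big\|\, p_{\mathrm{ref}}(\cdot \mid y) \big)
\Big]

% Well-known closed-form maximizer of the objective above,
% with Z(y) the normalizing constant for each prompt
p^{*}(x \mid y) \;=\; \frac{1}{Z(y)}\; p_{\mathrm{ref}}(x \mid y)\,
\exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big)
```

In the general literature, preference-optimization and reward-weighted-regression methods are usually derived from this closed-form solution; how the paper adapts that derivation to flow-based models is described in the paper itself, not in the abstract.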

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2501.13918 [cs.CV]
	(or arXiv:2501.13918v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.13918

Submission history

From: Jie Liu
[v1] Thu, 23 Jan 2025 18:55:41 UTC (4,260 KB)
