VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Huang, Ziqi; Zhang, Fan; Xu, Xiaojie; He, Yinan; Yu, Jiashuo; Dong, Ziyue; Ma, Qianli; Chanpaisit, Nattapol; Si, Chenyang; Jiang, Yuming; Wang, Yaohui; Chen, Xinyuan; Chen, Ying-Cong; Wang, Limin; Lin, Dahua; Qiao, Yu; Liu, Ziwei

arXiv:2411.13503 (cs)

[Submitted on 20 Nov 2024]


Abstract: Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) existing metrics do not fully align with human perception; 2) an ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmark's alignment with human perception for each evaluation dimension. 3) Valuable Insights: We look into current models' abilities across various evaluation dimensions and content types. We also investigate the gaps between video and image generation models. 4) Versatile Benchmarking: VBench++ supports evaluating both text-to-video and image-to-video generation. We introduce a high-quality Image Suite with an adaptive aspect ratio to enable fair evaluations across different image-to-video generation settings. Beyond assessing technical quality, VBench++ evaluates the trustworthiness of video generative models, providing a more holistic view of model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and continually add new video generation models to our leaderboard to drive the field of video generation forward.

Comments:	Leaderboard: this https URL; Code: this https URL; Project page: this https URL. Extension of arXiv:2311.17982. arXiv admin note: substantial text overlap with arXiv:2311.17982
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.13503 [cs.CV]
	(or arXiv:2411.13503v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.13503

Submission history

From: Ziqi Huang
[v1] Wed, 20 Nov 2024 17:54:41 UTC (10,755 KB)
