Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Hu, Kairui; Wu, Penghao; Pu, Fanyi; Xiao, Wang; Zhang, Yuanhan; Yue, Xiang; Li, Bo; Liu, Ziwei

arXiv:2501.13826 (cs)

[Submitted on 23 Jan 2025]

Abstract: Humans acquire knowledge through three cognitive stages: perceiving information, comprehending knowledge, and adapting knowledge to solve novel problems. Videos serve as an effective medium for this learning process, facilitating a progression through these cognitive stages. However, existing video benchmarks fail to systematically evaluate the knowledge acquisition capabilities in Large Multimodal Models (LMMs). To address this gap, we introduce Video-MMMU, a multi-modal, multi-disciplinary benchmark designed to assess LMMs' ability to acquire and utilize knowledge from videos. Video-MMMU features a curated collection of 300 expert-level videos and 900 human-annotated questions across six disciplines, evaluating knowledge acquisition through stage-aligned question-answer pairs: Perception, Comprehension, and Adaptation. A proposed knowledge gain metric, Δknowledge, quantifies improvement in performance after video viewing. Evaluation of LMMs reveals a steep decline in performance as cognitive demands increase and highlights a significant gap between human and model knowledge acquisition, underscoring the need for methods to enhance LMMs' capability to learn and adapt from videos.
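
The abstract describes the knowledge gain metric only as the improvement in performance after video viewing and does not give a formula. As a minimal sketch, assuming the metric is a normalized gain (the accuracy improvement divided by the headroom left before watching), it could be computed as follows. The function name `knowledge_gain`, its parameters, and the normalization itself are illustrative assumptions, not details taken from this page.

```python
def knowledge_gain(acc_before: float, acc_after: float) -> float:
    """Hypothetical normalized knowledge gain, in percent.

    ASSUMPTION: the gain is the accuracy improvement divided by the
    remaining headroom (100 - acc_before). The abstract only says the
    metric quantifies improvement after video viewing.
    Both accuracies are percentages in the range [0, 100].
    """
    for name, value in (("acc_before", acc_before), ("acc_after", acc_after)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {value}")

    headroom = 100.0 - acc_before
    if headroom == 0.0:
        # A model that is already perfect has no room to improve,
        # so the normalized gain is undefined.
        raise ValueError("acc_before is 100; normalized gain is undefined")

    return (acc_after - acc_before) * 100.0 / headroom


if __name__ == "__main__":
    # Made-up accuracies for illustration only, not results from the paper.
    print(knowledge_gain(40.0, 55.0))
    print(knowledge_gain(50.0, 40.0))
```

Under this assumed definition, a negative value means the model answered fewer questions correctly after watching the video than before, and normalizing by headroom keeps models with different starting accuracies comparable.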

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2501.13826 [cs.CV]
	(or arXiv:2501.13826v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.13826

Submission history

From: Xiang Yue
[v1] Thu, 23 Jan 2025 16:51:47 UTC (15,122 KB)
