Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Han, Mingfei; Yang, Linjie; Chang, Xiaojun; Yao, Lina; Wang, Heng

arXiv:2312.10300 (cs)

[Submitted on 16 Dec 2023 (v1), last revised 5 Feb 2025 (this version, v3)]

Abstract: A short video clip may contain a progression of multiple events and an interesting story line. A human needs to capture the event in every shot and associate the events with one another to understand the story behind it. In this work, we present Shot2Story, a new multi-shot video understanding benchmark with detailed shot-level captions, comprehensive video summaries, and question-answering pairs. To facilitate better semantic understanding of videos, we provide captions for both visual signals and human narrations. We design several distinct tasks, including single-shot video captioning, multi-shot video summarization, and multi-shot video question answering. Preliminary experiments show that generating a long and comprehensive summary for multi-shot videos remains challenging. Nevertheless, the generated imperfect summaries can already achieve competitive performance on existing video understanding tasks such as video question answering, promoting an under-explored setting of video understanding with detailed summaries.

Comments:	ICLR 2025. Extended annotation with 43K multi-shot videos in total. this https URL for updates and more information
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.10300 [cs.CV]
	(or arXiv:2312.10300v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.10300

Submission history

From: Mingfei Han
[v1] Sat, 16 Dec 2023 03:17:30 UTC (5,330 KB)
[v2] Tue, 19 Dec 2023 02:04:18 UTC (5,330 KB)
[v3] Wed, 5 Feb 2025 09:57:59 UTC (12,073 KB)
