VideoRAG: Retrieval-Augmented Generation over Video Corpus

Jeong, Soyeong; Kim, Kangsan; Baek, Jinheon; Hwang, Sung Ju


arXiv:2501.05874 (cs)

[Submitted on 10 Jan 2025]

Abstract: Retrieval-Augmented Generation (RAG) is a powerful strategy for addressing the tendency of foundation models to generate factually incorrect outputs: it retrieves external knowledge relevant to a query and incorporates it into the generation process. However, existing RAG approaches have primarily focused on textual information, with some recent advancements beginning to consider images, and they largely overlook videos, a rich source of multimodal knowledge capable of representing events, processes, and contextual details more effectively than any other modality. While a few recent studies explore the integration of videos into the response generation process, they either predefine query-associated videos without retrieving them according to the query, or convert videos into textual descriptions without harnessing their multimodal richness. To tackle these limitations, we introduce VideoRAG, a novel framework that not only dynamically retrieves videos based on their relevance to queries but also utilizes both the visual and textual information of videos in output generation. To operationalize this, our method builds on recent advances in Large Video Language Models (LVLMs), which enable the direct processing of video content, both to represent it for retrieval and to seamlessly integrate the retrieved videos with queries. We experimentally validate the effectiveness of VideoRAG, showing that it is superior to relevant baselines.
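The retrieve-then-generate flow described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the query and video embeddings are assumed to come from some LVLM-based encoder (replaced here by toy vectors), relevance is assumed to be cosine similarity, and the weight `alpha` that mixes visual and textual similarity, along with the helper names `retrieve_videos` and `build_generation_input`, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_videos(
    query_emb: Vector,
    visual_embs: Dict[str, Vector],
    text_embs: Dict[str, Vector],
    k: int = 1,
    alpha: float = 0.5,  # hypothetical weight on the visual channel
) -> List[Tuple[str, float]]:
    """Rank videos by a weighted mix of visual and textual similarity (assumed fusion)."""
    scores = []
    for vid, v_emb in visual_embs.items():
        t_emb = text_embs.get(vid)
        text_score = cosine(query_emb, t_emb) if t_emb is not None else 0.0
        score = alpha * cosine(query_emb, v_emb) + (1.0 - alpha) * text_score
        scores.append((vid, score))
    # Highest combined score first; keep only the top-k videos.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def build_generation_input(query, retrieved, frames, subtitles):
    """Pack the query with each retrieved video's frames and text (hypothetical format)."""
    return {
        "query": query,
        "videos": [
            {"id": vid, "frames": frames[vid], "subtitles": subtitles.get(vid, "")}
            for vid, _ in retrieved
        ],
    }


# Toy usage: two videos, 2-D embeddings; "cook" ranks first for this query.
visual = {"cook": [1.0, 0.0], "repair": [0.0, 1.0]}
text = {"cook": [0.9, 0.1], "repair": [0.1, 0.9]}
top = retrieve_videos([1.0, 0.0], visual, text, k=1)
prompt = build_generation_input(
    "How do I stir a sauce?", top, {"cook": ["frame0"], "repair": []}, {"cook": "stir slowly"}
)
```

Scoring the visual and textual channels separately and then mixing them keeps both sources of evidence available at retrieval time, instead of collapsing each video into a text description first, which is the limitation the abstract attributes to earlier work.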

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2501.05874 [cs.CV]
	(or arXiv:2501.05874v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.05874

Submission history

From: Soyeong Jeong
[v1] Fri, 10 Jan 2025 11:17:15 UTC (1,622 KB)