Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

Um, Sung Jin; Kim, Dongjin; Lee, Sangmin; Kim, Jung Uk

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.02504 (cs)

[Submitted on 5 Jan 2025]

Title:Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

Authors:Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

View PDF HTML (experimental)

Abstract:The goal of video moment retrieval and highlight detection is to identify specific segments and highlights based on a given text query. With the rapid growth of video content and the overlap between these tasks, recent works have addressed both simultaneously. However, they still struggle to fully capture the overall video context, making it challenging to determine which words are most relevant. In this paper, we present a novel Video Context-aware Keyword Attention module that overcomes this limitation by capturing keyword variation within the context of the entire video. To achieve this, we introduce a video context clustering module that provides concise representations of the overall video context, thereby enhancing the understanding of keyword dynamics. Furthermore, we propose a keyword weight detection module with keyword-aware contrastive learning that incorporates keyword information to enhance fine-grained alignment between visual and textual features. Extensive experiments on the QVHighlights, TVSum, and Charades-STA benchmarks demonstrate that our proposed method significantly improves performance in moment retrieval and highlight detection tasks compared to existing approaches. Our code is available at: this https URL

Comments:	Accepted at AAAI 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.02504 [cs.CV]
	(or arXiv:2501.02504v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.02504

Submission history

From: Sung Jin Um [view email]
[v1] Sun, 5 Jan 2025 11:01:27 UTC (4,036 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators