Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

Liu, Daizong; Qu, Xiaoye; Liu, Xiao-Yang; Dong, Jianfeng; Zhou, Pan; Xu, Zichuan

arXiv:2008.01403 (cs)

[Submitted on 4 Aug 2020 (v1), last revised 13 Aug 2020 (this version, v2)]

Abstract: Query-based moment localization is a new task that localizes the best-matched segment in an untrimmed video according to a given sentence query. This task requires thoroughly mining both visual and linguistic information. To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts the task as a process of iterative message passing over a joint graph. Specifically, the joint graph consists of a Cross-Modal interaction Graph (CMG) and a Self-Modal relation Graph (SMG), in which frames and words are represented as nodes and the relations between cross- and self-modal node pairs are described by an attention mechanism. Through parametric message passing, CMG highlights relevant instances across the video and the sentence, and SMG then models the pairwise relations inside each modality to correlate frames (or words). With multiple layers of such a joint graph, CSMGAN can effectively capture high-order interactions between the two modalities, enabling more precise localization. In addition, to better comprehend the contextual details in the query, we develop a hierarchical sentence encoder that enhances query understanding. Extensive experiments on four public datasets demonstrate the effectiveness of the proposed model, and CSMGAN significantly outperforms state-of-the-art methods.
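The joint-graph message passing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes fully connected graphs, unparameterized scaled dot-product attention as the edge weighting (the paper's learned, parametric projections are omitted), and residual additive node updates. The function names (`attend`, `joint_graph_layer`, `csmgan_encoder_sketch`) are hypothetical, and the hierarchical sentence encoder is not modeled.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix) -> Matrix:
    """Aggregate one attention-weighted message per query node from `keys`.

    Assumption: edge weights are plain scaled dot products; the paper's
    learned projections are not modeled here.
    """
    d = len(keys[0])
    messages = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        messages.append([sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)])
    return messages


def add(a: Matrix, b: Matrix) -> Matrix:
    # Residual update (an assumption): node state plus incoming message.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def joint_graph_layer(frames: Matrix, words: Matrix) -> Tuple[Matrix, Matrix]:
    # Cross-modal step (CMG-like): frames gather messages from all words and
    # words gather messages from all frames, both using the pre-update states.
    new_frames = add(frames, attend(frames, words))
    new_words = add(words, attend(words, frames))
    # Self-modal step (SMG-like): each node attends to every node of its own
    # modality, including itself.
    new_frames = add(new_frames, attend(new_frames, new_frames))
    new_words = add(new_words, attend(new_words, new_words))
    return new_frames, new_words


def csmgan_encoder_sketch(frames: Matrix, words: Matrix, num_layers: int = 2) -> Tuple[Matrix, Matrix]:
    # Stacking layers lets information propagate along multi-hop paths,
    # e.g. frame -> word -> word -> frame, i.e. higher-order interactions.
    for _ in range(num_layers):
        frames, words = joint_graph_layer(frames, words)
    return frames, words


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frame nodes, d = 2
    words = [[0.5, 0.5], [1.0, -1.0]]              # 2 word nodes, d = 2
    f, w = csmgan_encoder_sketch(frames, words, num_layers=2)
    print(len(f), len(w), len(f[0]))  # prints: 3 2 2
```

Each layer preserves the number of nodes and the feature dimension, so the refined frame features could in principle feed a downstream localization head; that head, and any training objective, are outside the scope of this sketch.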

Comments:	Accepted by ACM MM 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:2008.01403 [cs.CV]
	(or arXiv:2008.01403v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2008.01403

Submission history

From: Daizong Liu [view email]
[v1] Tue, 4 Aug 2020 08:25:24 UTC (10,081 KB)
[v2] Thu, 13 Aug 2020 01:56:06 UTC (11,765 KB)