Semantic Role Aware Correlation Transformer for Text to Video Retrieval

Satar, Burak; Zhu, Hongyuan; Bresson, Xavier; Lim, Joo Hwee

doi:10.1109/ICIP42928.2021.9506267

arXiv:2206.12849 (cs)

[Submitted on 26 Jun 2022]


Abstract: With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content with a language query has become critical. Most approaches aim to learn a joint embedding space for plain textual and visual content without adequately exploiting their intra-modality structures and inter-modality correlations. This paper proposes a novel transformer that explicitly disentangles the text and video into three semantic roles (objects, spatial contexts, and temporal contexts) and uses an attention scheme to learn the intra- and inter-role correlations among them, in order to discover discriminative features for matching at different levels. Preliminary results on the popular YouCook2 dataset indicate that our approach surpasses a current state-of-the-art method by a large margin in all metrics. It also outperforms two SOTA methods on two metrics.
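The idea of matching text and video role by role can be illustrated with a toy sketch. This is not the paper's architecture: it assumes each modality has already been reduced to three role vectors (objects, spatial context, temporal context), uses plain scaled dot-product attention with no learned weights to mix the roles, and averages per-role cosine similarities. All function names and the example vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def correlate_roles(roles):
    """Mix role vectors with scaled dot-product self-attention.

    Assumption: no learned query/key/value projections; each role
    attends directly to all roles of the same modality.
    """
    d = len(roles[0])
    mixed = []
    for q in roles:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in roles])
        mixed.append([sum(w * v[i] for w, v in zip(weights, roles))
                      for i in range(d)])
    return mixed

def match_score(text_roles, video_roles):
    """Average cosine similarity between corresponding mixed roles."""
    t = correlate_roles(text_roles)
    v = correlate_roles(video_roles)
    return sum(cosine(a, b) for a, b in zip(t, v)) / len(t)

# Hypothetical 2-d role vectors: [objects, spatial, temporal].
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
videos = {
    "clip_a": [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8]],
    "clip_b": [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.5]],
}
ranking = sorted(videos, key=lambda name: match_score(query, videos[name]),
                 reverse=True)
```

Here `ranking` orders the candidate clips by their role-level similarity to the query. A real system would learn the role encoders and attention weights from data.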

Comments:	Camera-ready for ICIP 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2206.12849 [cs.CV]
	(or arXiv:2206.12849v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.12849
Journal reference:	IEEE International Conference on Image Processing (ICIP), 2021, pp. 1334-1338
Related DOI:	https://doi.org/10.1109/ICIP42928.2021.9506267

Submission history

From: Burak Satar
[v1] Sun, 26 Jun 2022 11:28:03 UTC (337 KB)
