Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism

Li, Haoyuan; Zhao, Zhou; Zhang, Zhu; Lin, Zhijie

Abstract:Video moment retrieval is to identify the target moment according to the given sentence in an untrimmed video. Due to temporal boundary annotations of the video are extremely time-consuming to acquire, modeling in the weakly-supervised setting is increasingly focused, where we only have access to the video-sentence pairs during training. Most existing weakly-supervised methods adopt a MIL-based framework to develop inter-sample confrontment, but neglect the intra-sample confrontment between moments with similar semantics. Therefore, these methods fail to distinguish the correct moment from plausible negative moments. Further, the previous attention models in cross-modal interaction tend to focus on a few dominant words exorbitantly, ignoring the comprehensive video-sentence correspondence. In this paper, we propose a novel Regularized Two-Branch Proposal Network with Erasing Mechanism to consider the inter-sample and intra-sample confrontments simultaneously. Concretely, we first devise a language-aware visual filter to generate both enhanced and suppressed video streams. Then, we design the sharable two-branch proposal module to generate positive and plausible negative proposals from the enhanced and suppressed branch respectively, contributing to sufficient confrontment. Besides, we introduce an attention-guided dynamic erasing mechanism in enhanced branch to discover the complementary video-sentence relation. Moreover, we apply two types of proposal regularization to stabilize the training process and improve model performance. The extensive experiments on ActivityCaption, Charades-STA and DiDeMo datasets show the effectiveness of our method.

Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2311.13946 [cs.MM]
	(or arXiv:2311.13946v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2311.13946

Computer Science > Multimedia

Title:Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators