Polysemy Deciphering Network for Robust Human-Object Interaction Detection

Zhong, Xubin; Ding, Changxing; Qu, Xian; Tao, Dacheng

Computer Science > Computer Vision and Pattern Recognition

arXiv:2008.02918v2 (cs)

[Submitted on 7 Aug 2020 (v1), revised 23 Mar 2021 (this version, v2), latest version 24 Mar 2021 (v3)]

Abstract: Human-Object Interaction (HOI) detection is important to human-centric scene understanding tasks. Existing works tend to assume that the same verb has similar visual characteristics in different HOI categories, an approach that ignores the diverse semantic meanings of the verb. To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net) that decodes the visual polysemy of verbs for HOI detection in three distinct ways. First, we refine features for HOI detection to be polysemy-aware through the use of two novel modules: namely, Language Prior-guided Channel Attention (LPCA) and Language Prior-based Feature Augmentation (LPFA). LPCA highlights important elements in human and object appearance features for each HOI category to be identified; moreover, LPFA augments human pose and spatial features for HOI detection using language priors, enabling the verb classifiers to receive language hints that reduce intra-class variation for the same verb. Second, we introduce a novel Polysemy-Aware Modal Fusion module (PAMF), which guides PD-Net to make decisions based on the feature types deemed more important according to the language priors. Third, we propose to alleviate the verb polysemy problem by sharing verb classifiers among semantically similar HOI categories. Furthermore, to expedite research on the verb polysemy problem, we build a new benchmark dataset named HOI-VerbPolysemy (HOI-VP), which includes common verbs (predicates) that have diverse semantic meanings in the real world. Finally, through deciphering the visual polysemy of verbs, our approach is demonstrated to outperform state-of-the-art methods by significant margins on the HICO-DET, V-COCO, and HOI-VP databases. Code and data in this paper will be released at this https URL.
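The abstract describes two language-prior mechanisms: LPCA highlights elements of appearance features per HOI category, and PAMF weights feature types according to the language priors. The following is a minimal sketch of those two general ideas under stated assumptions, not the authors' implementation. It assumes the language prior is a plain vector (for example a word embedding of the verb-object pair), that a single hypothetical linear projection maps it to per-channel sigmoid gates (for LPCA-style attention) or to softmax weights over modalities (for PAMF-style fusion), and that the modality split is appearance, pose, and spatial. The function names and the gate/softmax choices are illustrative only; the paper's actual modules may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    # weights has one row per output unit; each row has len(x) entries.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def prior_guided_channel_attention(feature, prior, weights, bias):
    """Hypothetical LPCA-style gating: rescale each channel of an appearance
    feature by a sigmoid gate predicted from a language-prior vector."""
    gates = [sigmoid(z) for z in linear(weights, bias, prior)]
    return [f * g for f, g in zip(feature, gates)]


def prior_weighted_fusion(modality_scores, prior, weights, bias):
    """Hypothetical PAMF-style fusion: combine per-modality verb scores with
    softmax weights predicted from the same language prior."""
    alphas = softmax(linear(weights, bias, prior))
    fused = sum(a * s for a, s in zip(alphas, modality_scores))
    return fused, alphas


if __name__ == "__main__":
    # Toy values only; in practice the prior would be an embedding of the
    # HOI category and the projections would be learned.
    prior = [0.5, -1.0, 0.25]
    feature = [1.0, 2.0, 3.0, 4.0]
    gate_w = [[0.2, 0.1, 0.0], [0.0, -0.3, 0.4], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
    gate_b = [0.0, 0.0, 0.0, 0.0]
    print(prior_guided_channel_attention(feature, prior, gate_w, gate_b))

    # Three assumed modalities: appearance, pose, spatial.
    scores = [0.7, 0.4, 0.2]
    fuse_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    fuse_b = [0.0, 0.0, 0.0]
    print(prior_weighted_fusion(scores, prior, fuse_w, fuse_b))
```

In this sketch the sigmoid gates scale each channel independently, while the softmax makes the modality weights compete, which is one plausible way for a language prior to decide which feature type dominates a given HOI category's decision.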

Comments:	The IJCV version, significantly extended from our ECCV 2020 conference paper
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2008.02918 [cs.CV]
	(or arXiv:2008.02918v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2008.02918

Submission history

From: Changxing Ding
[v1] Fri, 7 Aug 2020 00:49:27 UTC (5,899 KB)
[v2] Tue, 23 Mar 2021 11:56:07 UTC (7,511 KB)
[v3] Wed, 24 Mar 2021 01:13:06 UTC (7,527 KB)