Single-Stage Visual Relationship Learning using Conditional Queries

Desai, Alakh; Wu, Tz-Ying; Tripathi, Subarna; Vasconcelos, Nuno

Abstract:Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimizations. To address this, recent research attempts to train single-stage models that are computationally efficient. With the advent of DETR, a set based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously. In this paper, we propose Transformers with conditional queries for SGG, namely, TraCQ with a new formulation for SGG that avoids the multi-task learning problem and the combinatorial entity pair distribution. We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well, which leads to 20% fewer parameters compared to state-of-the-art single-stage models. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset, yet is capable of end-to-end training and faster inference.

Comments:	Accepted to NeurIPS 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2306.05689 [cs.CV]
	(or arXiv:2306.05689v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.05689

Computer Science > Computer Vision and Pattern Recognition

Title:Single-Stage Visual Relationship Learning using Conditional Queries

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators