AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Yue, Yuanwen; Mahadevan, Sabarinath; Schult, Jonas; Engelmann, Francis; Leibe, Bastian; Schindler, Konrad; Kontogianni, Theodora

arXiv:2306.00977v1 (cs)

[Submitted on 1 Jun 2023 (this version), latest version 10 Apr 2024 (v4)]

Abstract: During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. From a machine learning perspective, the goal is to design the model and the feedback mechanism in a way that minimizes the required user input. The current best practice segments objects one at a time, and asks the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks to indicate regions wrongly assigned to the object (foreground). Sequentially visiting objects is wasteful, since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects; moreover, direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. We encode the point cloud into a latent feature representation, view user clicks as queries, and employ cross-attention to represent contextual relations between different click locations as well as between clicks and the 3D point cloud features. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different point cloud datasets, AGILE3D sets a new state of the art. Moreover, we verify its practicality in real-world setups with a real user study.
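To make the click-as-query idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-point features have already been produced by some encoder, initialises each click query from the feature of the clicked point, applies a single cross-attention step from the query to all points, and lets regions compete through a per-point argmax. The click-to-click attention mentioned in the abstract is omitted, and the names `segment_from_clicks`, `point_feats`, and `clicks` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def segment_from_clicks(point_feats, clicks, num_objects):
    """Assign every point to a region given user clicks (illustrative only).

    point_feats: list of N feature vectors, assumed precomputed by an encoder.
    clicks: list of (point_index, region_id); region 0 is the background,
            regions 1..num_objects are objects.
    Returns a list with one region id per point.
    """
    n = len(point_feats)
    dim = len(point_feats[0])
    scale = math.sqrt(dim)
    # One row of mask logits per region; regions without clicks stay at -inf.
    region_logits = [[-math.inf] * n for _ in range(num_objects + 1)]
    for point_index, region_id in clicks:
        # Assumption: the click query starts as the clicked point's feature.
        query = list(point_feats[point_index])
        # Cross-attention: the query attends to all point features.
        weights = softmax([dot(query, p) / scale for p in point_feats])
        context = [sum(w * p[k] for w, p in zip(weights, point_feats))
                   for k in range(dim)]
        query = [q + c for q, c in zip(query, context)]  # residual update
        # Mask logits are query-point similarities; clicks of the same
        # region are merged with an element-wise maximum.
        for i, p in enumerate(point_feats):
            region_logits[region_id][i] = max(region_logits[region_id][i],
                                              dot(query, p))
    # Regions compete directly: each point goes to the highest-scoring one.
    return [max(range(num_objects + 1), key=lambda r: region_logits[r][i])
            for i in range(n)]

# Toy features: points 0-2 form one cluster, points 3-5 another.
toy_feats = [[3.0, 0.0], [3.0, 0.3], [2.7, 0.0],
             [0.0, 3.0], [0.3, 3.0], [0.0, 2.7]]
print(segment_from_clicks(toy_feats, [(0, 1), (3, 2)], num_objects=2))
# prints [1, 1, 1, 2, 2, 2]
```

Because `point_feats` is computed once and only this function is re-run when a click is added, the sketch mirrors the abstract's point that new clicks require only a lightweight decoding step. The shared argmax is what lets a click for one object implicitly act as negative evidence for its neighbours.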

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2306.00977 [cs.CV]
	(or arXiv:2306.00977v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.00977

Submission history

From: Yuanwen Yue
[v1] Thu, 1 Jun 2023 17:59:10 UTC (32,784 KB)
[v2] Mon, 9 Oct 2023 17:51:12 UTC (40,672 KB)
[v3] Thu, 18 Jan 2024 18:59:17 UTC (47,008 KB)
[v4] Wed, 10 Apr 2024 10:56:00 UTC (47,018 KB)
