Bootstrapping Referring Multi-Object Tracking

Zhang, Yani; Wu, Dongming; Han, Wencheng; Dong, Xingping

arXiv:2406.05039 (cs)

[Submitted on 7 Jun 2024]

Abstract: Referring multi-object tracking (RMOT) aims to detect and track multiple objects that match a human instruction given as a natural language expression. Existing RMOT benchmarks are usually built from manual annotations combined with static rules, which limits their diversity and their scope of application. In this work, our key idea is to bootstrap the task of referring multi-object tracking by introducing as many discriminative language words as possible. Specifically, we first develop Refer-KITTI into a large-scale dataset named Refer-KITTI-V2. It starts with 2,719 manual annotations that address the issue of class imbalance and introduce more keywords, bringing it closer to real-world scenarios than Refer-KITTI. These are further expanded to a total of 9,758 annotations by prompting large language models, which yields 617 different words, more than previous RMOT benchmarks. In addition, the end-to-end framework in RMOT is bootstrapped by a simple yet elegant temporal advancement strategy, which achieves better performance than previous approaches. The source code and dataset are available at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2406.05039 [cs.CV]
	(or arXiv:2406.05039v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.05039

Submission history

From: Yani Zhang
[v1] Fri, 7 Jun 2024 16:02:10 UTC (19,306 KB)
