Computer Science > Computer Vision and Pattern Recognition
[Submitted on 17 Jan 2025]
Title:Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking
View PDF HTML (experimental)Abstract:In the realm of multi-object tracking, the challenge of accurately capturing the spatial and temporal relationships between objects in video sequences remains a significant hurdle. This is further complicated by frequent occurrences of mutual occlusions among objects, which can lead to tracking errors and reduced performance in existing methods. Motivated by these challenges, we propose a novel adaptive key frame mining strategy that addresses the limitations of current tracking approaches. Specifically, we introduce a Key Frame Extraction (KFE) module that leverages reinforcement learning to adaptively segment videos, thereby guiding the tracker to exploit the intrinsic logic of the video content. This approach allows us to capture structured spatial relationships between different objects as well as the temporal relationships of objects across frames. To tackle the issue of object occlusions, we have developed an Intra-Frame Feature Fusion (IFF) module. Unlike traditional graph-based methods that primarily focus on inter-frame feature fusion, our IFF module uses a Graph Convolutional Network (GCN) to facilitate information exchange between the target and surrounding objects within a frame. This innovation significantly enhances target distinguishability and mitigates tracking loss and appearance similarity due to occlusions. By combining the strengths of both long and short trajectories and considering the spatial relationships between objects, our proposed tracker achieves impressive results on the MOT17 dataset, i.e., 68.6 HOTA, 81.0 IDF1, 66.6 AssA, and 893 IDS, proving its effectiveness and accuracy.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.