Computer Science > Computer Vision and Pattern Recognition
[Submitted on 30 Aug 2024]
Title:Cross Fusion RGB-T Tracking with Bi-directional Adapter
View PDF HTML (experimental)Abstract:Many state-of-the-art RGB-T trackers have achieved remarkable results through modality fusion. However, these trackers often either overlook temporal information or fail to fully utilize it, resulting in an ineffective balance between multi-modal and temporal information. To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation of multiple modalities in tracking while dynamically fusing temporal information. The effectiveness of CFBT relies on three newly designed cross spatio-temporal information fusion modules: Cross Spatio-Temporal Augmentation Fusion (CSTAF), Cross Spatio-Temporal Complementarity Fusion (CSTCF), and Dual-Stream Spatio-Temporal Adapter (DSTA). CSTAF employs a cross-attention mechanism to enhance the feature representation of the template comprehensively. CSTCF utilizes complementary information between different branches to enhance target features and suppress background features. DSTA adopts the adapter concept to adaptively fuse complementary information from multiple branches within the transformer layer, using the RGB modality as a medium. These ingenious fusions of multiple perspectives introduce only less than 0.3\% of the total modal parameters, but they indeed enable an efficient balance between multi-modal and temporal information. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.