UFM: Unified Feature Matching Pre-training with Multi-Modal Image Assistants

Di, Yide; Liao, Yun; Zhou, Hao; Zhu, Kaijun; Duan, Qing; Liu, Junhui; Lu, Mingyu

arXiv:2503.21820 (cs)

[Submitted on 26 Mar 2025]

Abstract: Image feature matching, a foundational task in computer vision, remains challenging for multimodal image applications and often requires intricate training on specific datasets. In this paper, we introduce a Unified Feature Matching pre-trained model (UFM) designed to address feature matching across a wide range of image modalities. We present Multimodal Image Assistant (MIA) transformers, finely tunable structures adept at handling diverse feature matching problems. UFM handles feature matching both within a single modality and across different modalities. Additionally, we propose a data augmentation algorithm and a staged pre-training strategy to address data scarcity in specific modalities and imbalance among modal datasets. Experimental results demonstrate that UFM excels in generalization and performance across various feature matching tasks. The code will be released at: this https URL.

Comments:	34 pages, 13 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2503.21820 [cs.CV]
	(or arXiv:2503.21820v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.21820

Submission history

From: Yun Liao
[v1] Wed, 26 Mar 2025 06:20:52 UTC (31,574 KB)
