OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

Tang, Luyao; Yuan, Yuxuan; Chen, Chaoqi; Zhang, Zeyu; Huang, Yue; Zhang, Kun

arXiv:2503.18695 (cs)

[Submitted on 24 Mar 2025]

Abstract: Although foundation models (FMs) claim to be powerful, their generalization ability decreases significantly when faced with distribution shifts, weak supervision, or malicious attacks in the open world. Meanwhile, most domain generalization and adversarial fine-tuning methods are task-related or model-specific, ignoring universality in practical applications and transferability between FMs. This paper addresses the problem of generalizing FMs to out-of-domain data. We propose a novel framework, the Object-Concept-Relation Triad (OCRT), that enables FMs to extract sparse, high-level concepts and intricate relational structures from raw visual inputs. The key idea is to bind objects in visual scenes to a set of object-centric representations through unsupervised decoupling and iterative refinement. Specifically, we project the object-centric representations onto a semantic concept space that the model can readily interpret, and estimate their importance to filter out irrelevant elements. Then, a concept-based graph with a flexible degree is constructed to incorporate the set of concepts and their corresponding importance, enabling the extraction of high-order factors from informative concepts and facilitating relational reasoning among these concepts. Extensive experiments demonstrate that OCRT can substantially boost the generalizability and robustness of SAM and CLIP across multiple downstream tasks.
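To make the pipeline in the abstract concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the abstract gives no formulas, so every function name, the dot-product softmax projection, the peak-probability importance score, the cosine-similarity edge rule, and both thresholds are assumptions chosen for illustration. The sketch also starts from object-centric vectors ("slots") that are already given, and so skips the unsupervised decoupling and iterative refinement step.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def project_to_concepts(slots, concept_bank):
    """Soft-assign each slot to a hypothetical bank of concept vectors."""
    return [softmax([dot(s, c) for c in concept_bank]) for s in slots]

def importance(assignments):
    """Assumed proxy: a slot matters if it maps confidently to one concept."""
    return [max(p) for p in assignments]

def build_graph(nodes, keep, sim_threshold):
    """Link kept nodes whose cosine similarity exceeds the threshold.
    Node degree is not fixed; it depends on how many neighbors qualify."""
    edges = {i: [] for i in keep}
    for i in keep:
        for j in keep:
            if i != j and cosine(nodes[i], nodes[j]) > sim_threshold:
                edges[i].append(j)
    return edges

def message_pass(nodes, edges, weights):
    """One round of importance-weighted neighbor averaging."""
    out = {}
    for i, nbrs in edges.items():
        if not nbrs:
            out[i] = list(nodes[i])  # isolated node is left unchanged
            continue
        total = sum(weights[j] for j in nbrs)
        agg = [sum(weights[j] * nodes[j][d] for j in nbrs) / total
               for d in range(len(nodes[i]))]
        out[i] = [0.5 * a + 0.5 * b for a, b in zip(nodes[i], agg)]
    return out

# Toy data (made up): four 2-D slots and two concept vectors.
slots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
concept_bank = [[4.0, 0.0], [0.0, 4.0]]

assignments = project_to_concepts(slots, concept_bank)
scores = importance(assignments)
keep = [i for i, s in enumerate(scores) if s > 0.6]  # assumed cutoff
edges = build_graph(assignments, keep, sim_threshold=0.9)
refined = message_pass(assignments, edges, scores)
```

On this toy input, the ambiguous fourth slot is filtered out by the importance cutoff, the first two slots are linked because they map to the same concept, and the third slot stays isolated, which is one simple reading of a graph with a "flexible degree".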

Comments:	Accepted by CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2503.18695 [cs.CV]
	(or arXiv:2503.18695v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.18695

Submission history

From: Luyao Tang
[v1] Mon, 24 Mar 2025 14:04:17 UTC (5,785 KB)
