Multi-Task Domain Adaptation for Language Grounding with 3D Objects

Sun, Penglei; Song, Yaoxian; Pan, Xinglin; Dong, Peijie; Yang, Xiaofei; Wang, Qiang; Li, Zhixu; Li, Tiefeng; Chu, Xiaowen

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.02846v2 (cs)

[Submitted on 3 Jul 2024 (v1), last revised 5 Jul 2024 (this version, v2)]


Authors: Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu


Abstract: Existing work on object-level language grounding with 3D objects mostly focuses on improving performance by using off-the-shelf pre-trained models to capture features, for example through viewpoint selection or geometric priors. However, these methods do not explore cross-modal representations for language-vision alignment in a cross-domain setting. To address this problem, we propose a novel method, Domain Adaptation for Language Grounding (DA4LG) with 3D objects. Specifically, DA4LG consists of a visual adapter module combined with multi-task learning, which achieves vision-language alignment through comprehensive multimodal feature representation. Experimental results demonstrate that DA4LG performs competitively on both visual and non-visual language descriptions, regardless of how complete the observation is. DA4LG achieves state-of-the-art performance on the language grounding benchmark SNARE, with accuracies of 83.8% in the single-view setting and 86.8% in the multi-view setting. Simulation experiments further show that DA4LG is practical and generalizes well compared with existing methods. Our project is available at this https URL.
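
The abstract describes the visual adapter and multi-task learning only at a high level. As a hedged illustration of the general idea, the sketch below shows a common residual bottleneck adapter pattern applied to frozen visual features, a cosine score as an assumed alignment signal, and a weighted sum of per-task losses. The class and function names, dimensions, zero initialisation, task names, and loss weights are all assumptions for illustration; this is not the DA4LG architecture from the paper.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: out = x + up(relu(down(x))).

    Only `down` and `up` would be trained; the backbone that produced `x`
    is assumed to stay frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection: dim -> bottleneck.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero-initialised up-projection (bottleneck -> dim), so the adapter
        # starts out as the identity map on the frozen features.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        delta = matvec(self.up, relu(matvec(self.down, x)))
        return [xi + di for xi, di in zip(x, delta)]


def cosine(a, b):
    """Cosine similarity, used here as an assumed vision-language alignment score."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multi_task_loss(task_losses, weights):
    """Weighted sum of per-task losses; task names and weights are hypothetical."""
    return sum(weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    adapter = BottleneckAdapter(dim=4, bottleneck=2)
    visual = adapter([1.0, 0.0, 2.0, -1.0])   # frozen visual feature -> adapted
    text = [1.0, 0.0, 2.0, -1.0]              # assumed text embedding
    losses = {
        "alignment": 1.0 - cosine(visual, text),
        "auxiliary": 0.25,                    # placeholder value for a second task
    }
    print(visual)
    print(multi_task_loss(losses, {"alignment": 1.0, "auxiliary": 0.5}))
```

Zero-initialising the up-projection is one common choice for adapters: training starts from the unmodified pre-trained features, and the adapter only learns a residual correction on top of them.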

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.02846 [cs.CV]
	(or arXiv:2407.02846v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.02846

Submission history

From: Penglei Sun
[v1] Wed, 3 Jul 2024 06:47:58 UTC (1,038 KB)
[v2] Fri, 5 Jul 2024 08:10:49 UTC (1,039 KB)