BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

Lan, Zihan; Mao, Weixin; Li, Haosheng; Wang, Le; Wang, Tiancai; Fan, Haoqiang; Yoshie, Osamu

Computer Science > Robotics

arXiv:2502.11161 (cs)

[Submitted on 16 Feb 2025 (v1), last revised 19 Feb 2025 (this version, v2)]

Abstract: In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, this introduces redundant visual information and incurs higher computational cost, leading to ineffective manipulation. A fine-grained manipulation task tends to involve multiple stages, and the view that contributes most varies from stage to stage. In this paper, we propose a plug-and-play best-feature-aware (BFA) fusion strategy for multi-view manipulation tasks that is adaptable to various policies. Built upon the visual backbone of the policy network, we design a lightweight network that predicts an importance score for each view. The multi-view features are reweighted by these predicted scores, fused, and fed into the end-to-end policy network, enabling seamless integration. Notably, our method demonstrates outstanding performance on fine-grained manipulation tasks. Experimental results show that our approach outperforms multiple baselines by 22-46% in success rate across different tasks. Our work provides new insights and inspiration for tackling key challenges in fine-grained manipulation.
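
The abstract describes the fusion step only at a high level. Below is a minimal sketch of one way it could work, not the authors' implementation. A hypothetical shared linear scorer (`score_views`, with assumed parameters `weights` and `bias`) maps each view's backbone feature to a logit. A softmax across views turns the logits into importance scores. The reweighted features are then summed, which is an assumed fusion rule because the abstract does not say how fusion is done. An ACT-style concatenation baseline (`concat_views`) is included for comparison. Feature vectors are plain Python lists so the example has no dependencies.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_views(view_feats: List[Vector], weights: Vector, bias: float) -> Vector:
    """Hypothetical lightweight scorer: one linear projection shared across
    views produces a logit per view; softmax across views yields scores."""
    logits = [sum(w * f for w, f in zip(weights, feat)) + bias for feat in view_feats]
    return softmax(logits)


def fuse_views(view_feats: List[Vector], scores: Vector) -> Vector:
    """Reweight each view's feature by its importance score and sum them.
    Summation is an assumed fusion rule, not taken from the paper."""
    dim = len(view_feats[0])
    fused = [0.0] * dim
    for feat, s in zip(view_feats, scores):
        for i, f in enumerate(feat):
            fused[i] += s * f
    return fused


def concat_views(view_feats: List[Vector]) -> Vector:
    """ACT-style baseline: treat all views equally and concatenate them."""
    return [f for feat in view_feats for f in feat]


if __name__ == "__main__":
    # Three hypothetical camera views, each with a 4-d backbone feature.
    feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
    scores = score_views(feats, weights=[1.0, 1.0, 1.0, 0.0], bias=0.0)
    print("importance scores:", [round(s, 3) for s in scores])
    print("fused feature:", fuse_views(feats, scores))
    print("concatenated length:", len(concat_views(feats)))
```

With this assumed sum-based fusion, the fused feature keeps the per-view dimension (4 here) regardless of how many cameras are used. Concatenation instead grows linearly with the number of views (12 here). Concatenating the reweighted features would preserve per-view identity, but the input would still grow in the same way.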

Comments:	8 pages, 4 figures
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.11161 [cs.RO]
	(or arXiv:2502.11161v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2502.11161

Submission history

From: Haosheng Li
[v1] Sun, 16 Feb 2025 15:26:21 UTC (16,536 KB)
[v2] Wed, 19 Feb 2025 07:10:06 UTC (16,536 KB)