FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Fang, Irving; Shi, Kairui; He, Xujin; Tan, Siqi; Wang, Yifan; Zhao, Hanwen; Huang, Hung-Jui; Yuan, Wenzhen; Feng, Chen; Zhang, Jing

arXiv:2410.08282 (cs)

[Submitted on 10 Oct 2024]

Abstract: Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as its core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning, and local geometric constraints. This results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previous state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.
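The abstract names object visual hull pruning as one stage of the optimization but does not describe it. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below keeps 3D points (for example, Gaussian centers) whose projections fall inside the object silhouette in every view. The function name `prune_by_visual_hull`, the 3x4 projection matrices, and the binary masks are all assumptions made for this example; points behind a camera or outside the image are treated as outside the silhouette.

```python
import numpy as np


def prune_by_visual_hull(points, projections, masks):
    """Return a boolean array: True where a point lies inside every silhouette.

    points:      (N, 3) array of 3D positions in world coordinates.
    projections: list of (3, 4) camera projection matrices (hypothetical input).
    masks:       list of (H, W) binary object masks, one per projection.
    """
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    keep = np.ones(len(points), dtype=bool)

    for P, mask in zip(projections, masks):
        uvw = homogeneous @ np.asarray(P, dtype=float).T  # (N, 3)
        depth = uvw[:, 2]
        in_front = depth > 1e-9
        # Avoid dividing by zero or negative depth; those points are rejected anyway.
        safe_depth = np.where(in_front, depth, 1.0)
        u = np.floor(uvw[:, 0] / safe_depth).astype(int)  # pixel column
        v = np.floor(uvw[:, 1] / safe_depth).astype(int)  # pixel row

        height, width = mask.shape
        in_image = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)

        inside = np.zeros(len(points), dtype=bool)
        inside[in_image] = np.asarray(mask)[v[in_image], u[in_image]].astype(bool)

        # A point survives only if it is inside the silhouette in all views.
        keep &= inside

    return keep
```

With only a few views the resulting hull is a coarse over-estimate of the object, which is presumably why the abstract pairs this step with local geometric constraints from touch.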

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
ACM classes:	I.4.5; I.4.8
Cite as:	arXiv:2410.08282 [cs.RO]
	(or arXiv:2410.08282v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2410.08282

Submission history

From: Irving Fang
[v1] Thu, 10 Oct 2024 18:07:07 UTC (1,877 KB)