Embodiment: Self-Supervised Depth Estimation Based on Camera Models

Zhang, Jinchang; Reddy, Praveen Kumar; Wong, Xue-Iuan; Aloimonos, Yiannis; Lu, Guoyu

arXiv:2408.01565 (cs)

[Submitted on 2 Aug 2024 (v1), last revised 29 Aug 2024 (this version, v2)]

Abstract: Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, supervised learning requires expensive ground-truth labeling, whereas self-supervised methods have great potential because they incur no labeling cost. However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance. Scale is also a major issue for monocular unsupervised depth estimation, which commonly still needs a ground-truth scale from GPS, LiDAR, or existing maps for correction. In the deep learning era, existing methods primarily rely on exploring image relationships to train unsupervised neural networks, while the physical properties of the camera itself, such as its intrinsics and extrinsics, are often overlooked. These physical properties are not just mathematical parameters; they are embodiments of the camera's interaction with the physical world. By embedding these physical properties into the deep learning model, we can calculate depth priors for ground regions and regions connected to the ground based on physical principles, providing free supervision signals without the need for additional sensors. This approach is easy to implement and, by embedding the camera's physical properties into the model, enhances the effects of all unsupervised methods, thereby achieving an embodied understanding of the real world.
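
The abstract does not give the exact formulation, so the following is only a minimal sketch of how a ground-region depth prior could follow from camera parameters. It assumes a pinhole camera with no roll, a perfectly flat ground plane, a known camera height, and a known downward pitch. The function name `ground_depth_by_row` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math

def ground_depth_by_row(image_height, fy, cy, cam_height, pitch=0.0):
    """Depth along the optical axis of the flat ground seen at each image row.

    Assumptions (illustrative, not from the paper):
      - pinhole camera, y axis pointing down, z axis along the optical axis
      - no roll, so the ground depth depends only on the pixel row v
      - fy, cy: vertical focal length and principal point, in pixels
      - cam_height: camera height above the ground, in meters
      - pitch: downward tilt of the optical axis, in radians
    Rows whose viewing ray never hits the ground (at or above the horizon)
    get None.
    """
    depths = []
    for v in range(image_height):
        # Normalized vertical ray coordinate for this row.
        y_norm = (v - cy) / fy
        # Dot product of the ray (x, y_norm, 1) with the ground normal
        # (0, cos(pitch), sin(pitch)) expressed in the camera frame.
        denom = math.cos(pitch) * y_norm + math.sin(pitch)
        # The ray meets the plane n . X = cam_height at depth cam_height / denom.
        depths.append(cam_height / denom if denom > 1e-9 else None)
    return depths

if __name__ == "__main__":
    # Toy 4-row image: fy = 2 px, principal point at row 1, camera 1.5 m high.
    print(ground_depth_by_row(4, fy=2.0, cy=1.0, cam_height=1.5))
```

Because the camera height is metric, a prior of this kind carries absolute scale. That is one plausible reading of how camera intrinsics and extrinsics could supply supervision without GPS or LiDAR.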

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.01565 [cs.CV]
	(or arXiv:2408.01565v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.01565

Submission history

From: Jinchang Zhang
[v1] Fri, 2 Aug 2024 20:40:19 UTC (9,011 KB)
[v2] Thu, 29 Aug 2024 01:32:17 UTC (9,012 KB)
