Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction

Yin, Wei; Liu, Yifan; Shen, Chunhua

Computer Science > Computer Vision and Pattern Recognition

arXiv:2103.04216 (cs)

[Submitted on 7 Mar 2021 (v1), last revised 27 Jun 2021 (this version, v5)]

Abstract: Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress in terms of evaluation metrics such as the pixel-wise relative error, most of them neglect geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, namely, virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we significantly improve the accuracy and robustness of monocular depth estimation. Notably, the virtual normal loss not only improves the performance of learning metric depth, but also disentangles the scale information and enriches the model with better shape information. Therefore, when absolute metric depth training data is not available, we can use the virtual normal to learn a robust affine-invariant depth generated on diverse scenes. In experiments, we show state-of-the-art results for learning metric depth on NYU Depth-V2 and KITTI. From the high-quality predicted depth, we are able to recover good 3D structures of the scene, such as the point cloud and surface normals, directly, eliminating the need for the additional models that were previously required. To demonstrate the excellent generalizability of learning affine-invariant depth on diverse data with the virtual normal loss, we construct a large-scale and diverse dataset for training affine-invariant depth, termed the Diverse Scene Depth dataset (DiverseDepth), and test on five datasets in the zero-shot test setting. Code is available at: this https URL
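
The virtual normal constraint described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the sampling rules, the distance used between normals, or any hyperparameters, so the pinhole back-projection, the L1 difference, the `min_area` filter for near-collinear triplets, and the names `backproject`, `virtual_normals`, and `virtual_normal_loss` are all assumptions made for illustration.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H, W, 3) point cloud (pinhole model, assumed)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def virtual_normals(points, idx):
    """Unit normals of the planes through point triplets.

    points: (H, W, 3) point cloud; idx: (N, 3) flat pixel indices.
    Returns the (N, 3) unit normals and the (N,) cross-product magnitudes.
    """
    flat = points.reshape(-1, 3)
    a, b, c = flat[idx[:, 0]], flat[idx[:, 1]], flat[idx[:, 2]]
    n = np.cross(b - a, c - a)
    mag = np.linalg.norm(n, axis=1, keepdims=True)
    return n / np.maximum(mag, 1e-12), mag[:, 0]


def virtual_normal_loss(pred_depth, gt_depth, intrinsics,
                        num_groups=1000, min_area=1e-6, seed=0):
    """Mean L1 difference between predicted and ground-truth virtual normals.

    num_groups, min_area and the L1 distance are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    h, w = gt_depth.shape
    # Sample the same random triplets of pixels for both depth maps.
    idx = rng.integers(0, h * w, size=(num_groups, 3))
    n_gt, area_gt = virtual_normals(backproject(gt_depth, *intrinsics), idx)
    n_pred, _ = virtual_normals(backproject(pred_depth, *intrinsics), idx)
    # Drop triplets that are (nearly) collinear in the ground truth,
    # since their normal direction is not well defined.
    valid = area_gt > min_area
    if not valid.any():
        return 0.0
    return float(np.abs(n_gt[valid] - n_pred[valid]).sum(axis=1).mean())


# Hypothetical usage with synthetic data and made-up intrinsics (fx, fy, cx, cy).
rng = np.random.default_rng(1)
gt = 1.0 + rng.random((8, 10))
intrinsics = (10.0, 10.0, 5.0, 4.0)
loss_same = virtual_normal_loss(gt, gt, intrinsics)
loss_scaled = virtual_normal_loss(2.0 * gt, gt, intrinsics)
loss_noisy = virtual_normal_loss(gt + 0.5 * rng.random(gt.shape), gt, intrinsics)
```

In this sketch a prediction that differs from the ground truth only by a global scale yields the same unit normals, which is one way to read the abstract's remark that the loss disentangles scale from shape; a depth shift, by contrast, does change the back-projected geometry.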

Comments:	Fixed typos. Extended version of arXiv:1907.12209 Int. Conf. Comp. Vis. (ICCV) 2019. Code is available at: this https URL. arXiv admin note: substantial text overlap with arXiv:2002.00569
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.04216 [cs.CV]
	(or arXiv:2103.04216v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.04216

Submission history

From: Chunhua Shen
[v1] Sun, 7 Mar 2021 00:08:21 UTC (9,067 KB)
[v2] Tue, 9 Mar 2021 12:34:46 UTC (9,069 KB)
[v3] Tue, 13 Apr 2021 05:59:19 UTC (8,672 KB)
[v4] Sat, 17 Apr 2021 07:23:41 UTC (8,147 KB)
[v5] Sun, 27 Jun 2021 02:26:59 UTC (9,922 KB)
