Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry

Chidlovskii, Boris; Antsfeld, Leonid

arXiv:2406.11019 (cs)

[Submitted on 16 Jun 2024]

Abstract: For the task of simultaneous monocular depth and visual odometry estimation, we propose learning self-supervised transformer-based models in two steps. The first step is generic pretraining to learn 3D geometry using the cross-view completion objective (CroCo); the second is self-supervised finetuning on non-annotated videos. We show that our self-supervised models can reach state-of-the-art performance 'without bells and whistles' using standard components such as visual transformers, dense prediction transformers and adapters. We demonstrate the effectiveness of the proposed method by evaluating it on six benchmark datasets, both static and dynamic, indoor and outdoor, with synthetic and real images. On all datasets, our method outperforms state-of-the-art methods, in particular on the depth prediction task.

Comments:	8 pages, to appear in ICRA'24
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.11019 [cs.CV]
	(or arXiv:2406.11019v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.11019

Submission history

From: Boris Chidlovskii
[v1] Sun, 16 Jun 2024 17:24:20 UTC (9,170 KB)
