REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning

Shao, Liangjing; Chen, Benshuang; Zhao, Shuting; Chen, Xinrong

arXiv:2501.18124 (cs)

[Submitted on 30 Jan 2025 (v1), last revised 2 Feb 2025 (this version, v2)]

Abstract: Real-time ego-motion tracking of an endoscope is an important task for efficient navigation and robotic automation of endoscopy. This paper proposes a novel framework for real-time endoscope ego-motion tracking. First, a multi-modal visual feature learning network is proposed for relative pose prediction, in which the motion feature from the optical flow, the scene features, and the joint feature from two adjacent observations are all extracted for prediction. Because the concatenated image carries more correlation information in its channel dimension, a novel attention-based feature extractor is designed to integrate multi-dimensional information from the concatenation of two consecutive frames. To extract a more complete representation from the fused features, a novel pose decoder is proposed to predict the pose transformation from the concatenated feature map at the end of the framework. Finally, the absolute pose of the endoscope is calculated from the relative poses. Experiments are conducted on three datasets of various endoscopic scenes, and the results demonstrate that the proposed method outperforms state-of-the-art methods. In addition, the inference speed of the proposed method is over 30 frames per second, which meets the real-time requirement. The project page is here: this http URL
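The last step of the abstract, computing the absolute pose from relative poses, can be illustrated with a minimal sketch. This is not the authors' implementation: the 4x4 homogeneous-matrix representation, the composition convention (each relative pose maps frame k coordinates into frame k-1 coordinates), and the helper names `make_pose`, `rot_z`, and `accumulate_poses` are all assumptions made for illustration.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def accumulate_poses(relative_poses, initial_pose=None):
    """Chain relative poses T_{k-1,k} into absolute poses T_{0,k}.

    Assumed convention (not stated in the abstract):
    T_{0,k} = T_{0,k-1} @ T_{k-1,k}.
    Returns a list whose first entry is the initial pose.
    """
    if initial_pose is None:
        current = np.eye(4)
    else:
        current = np.asarray(initial_pose, dtype=float)
    absolute = [current]
    for rel in relative_poses:
        current = current @ rel  # right-multiply the newest relative motion
        absolute.append(current)
    return absolute


# Hypothetical input: two identical steps, each a 90-degree turn about z
# combined with a unit translation along the local x axis.
step = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
trajectory = accumulate_poses([step, step])
```

In a tracking loop, the relative poses would come from the pose-prediction network, one per pair of adjacent frames. Because every absolute pose is a product of all earlier predictions, errors in individual relative poses accumulate along the trajectory.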

Comments:	Accepted by ICRA 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.18124 [cs.CV]
	(or arXiv:2501.18124v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.18124

Submission history

From: Liangjing Shao
[v1] Thu, 30 Jan 2025 03:58:41 UTC (5,268 KB)
[v2] Sun, 2 Feb 2025 14:32:01 UTC (5,268 KB)
