Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction

Zhao, Xiaoqi; Pang, Youwei; Zhang, Lihe; Lu, Huchuan

doi:10.1109/TIP.2022.3222641


arXiv:2203.04895 (cs)

[Submitted on 9 Mar 2022 (v1), last revised 8 Nov 2022 (this version, v2)]



Abstract: Depth maps offer color independence, illumination invariance and location discrimination, so they can provide important supplemental information for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely deployed, while general depth sensors produce noisy and sparse depth information, which introduces irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks. In this way, the depth information can be completed and purified. Moreover, we introduce a multi-modal filtered transformer (MFT) module, which is equipped with three modality-specific filters to generate the transformer-enhanced feature for each modality. The proposed model works in a depth-free style during the testing phase. Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time. In addition, the resulting depth map can help existing RGB-D SOD methods obtain significant performance gains. The source code will be publicly available at this https URL.

Comments:	Accepted by IEEE TIP
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.04895 [cs.CV]
	(or arXiv:2203.04895v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.04895
Related DOI:	https://doi.org/10.1109/TIP.2022.3222641

Submission history

From: Xiaoqi Zhao
[v1] Wed, 9 Mar 2022 17:20:18 UTC (22,415 KB)
[v2] Tue, 8 Nov 2022 02:39:23 UTC (47,801 KB)
