SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Cao, Jiale; Anwer, Rao Muhammad; Cholakkal, Hisham; Khan, Fahad Shahbaz; Pang, Yanwei; Shao, Ling

arXiv:2007.14772 (cs)

[Submitted on 29 Jul 2020]

Abstract: Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, our SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate our SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at this https URL.
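
The abstract describes the SP module only at a high level: each sub-region of a detected box gets its own set of spatial coefficients. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a YOLACT-style linear combination of shared basis maps, a 2 x 2 split of the box, and a sigmoid on the result; none of these details are stated in the abstract, and the names `assemble_mask`, `basis`, `coeffs` and `grid` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(basis, coeffs, box, grid=2):
    """Combine shared basis maps with per-sub-region coefficients inside a box.

    basis  : list of B maps, each H x W (nested lists of floats)
    coeffs : dict mapping (row, col) sub-region index -> list of B coefficients
    box    : (x0, y0, x1, y1), half-open pixel bounds of the detected box
    grid   : the box is split into grid x grid sub-regions (assumed default)
    """
    x0, y0, x1, y1 = box
    height, width = len(basis[0]), len(basis[0][0])
    mask = [[0.0] * width for _ in range(height)]  # pixels outside the box stay 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            # Find the sub-region of the box that contains this pixel.
            r = min((y - y0) * grid // (y1 - y0), grid - 1)
            c = min((x - x0) * grid // (x1 - x0), grid - 1)
            # Linear combination using that sub-region's own coefficients.
            logit = sum(w * b[y][x] for w, b in zip(coeffs[(r, c)], basis))
            mask[y][x] = sigmoid(logit)
    return mask


# Toy input: two 4 x 4 basis maps and a box covering the whole image.
basis = [
    [[1.0] * 4 for _ in range(4)],                      # constant map
    [[float(x) for x in range(4)] for _ in range(4)],   # horizontal ramp
]
coeffs = {
    (0, 0): [4.0, 0.0],    # top-left: pushed towards foreground
    (0, 1): [-4.0, 0.0],   # top-right: pushed towards background
    (1, 0): [0.0, 0.0],
    (1, 1): [0.0, 0.0],
}
mask = assemble_mask(basis, coeffs, box=(0, 0, 4, 4))
```

In this toy case the two top quadrants share the same basis maps but receive opposite coefficients, so they end up with opposite mask values. With a single coefficient vector for the whole box, that kind of spatial distinction inside one detection would not be expressible in this simplified setup.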

Comments:	ECCV2020; Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2007.14772 [cs.CV]
	(or arXiv:2007.14772v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.14772

Submission history

From: Jiale Cao
[v1] Wed, 29 Jul 2020 12:21:00 UTC (2,949 KB)
