Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation

Zhang, Yabo; Wang, Zihao; Liew, Jun Hao; Huang, Jingjia; Zhu, Manyu; Feng, Jiashi; Zuo, Wangmeng

Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.01114 (cs)

[Submitted on 3 Apr 2023]

Abstract: In this work, we investigate performing semantic segmentation solely through training on image-sentence pairs. Due to the lack of dense annotations, existing text-supervised methods can only learn to group an image into semantic regions via pixel-insensitive feedback. As a result, their grouped results are coarse and often contain small spurious regions, limiting the upper-bound performance of segmentation. On the other hand, we observe that grouped results from self-supervised models are more semantically consistent and can break this bottleneck of existing methods. Motivated by this, we propose to associate self-supervised spatially-consistent grouping with text-supervised semantic segmentation. Since the grouped results are part-like, we further adapt a text-supervised model from image-level to region-level recognition with two core designs. First, we encourage fine-grained alignment with a one-way noun-to-region contrastive loss, which reduces mismatched noun-region pairs. Second, we adopt a contextually aware masking strategy to enable simultaneous recognition of all grouped regions. Coupled with spatially-consistent grouping and region-adapted recognition, our method achieves 59.2% mIoU on Pascal VOC and 32.4% mIoU on Pascal Context, significantly surpassing state-of-the-art methods.
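
The abstract does not give the loss formula, so the following is only a minimal sketch of one plausible reading of a one-way noun-to-region contrastive loss. It assumes that noun embeddings come from a text encoder, that region embeddings are pooled from the grouped regions, and that each noun is scored against an image by its best-matching region. The function name `noun_to_region_loss`, the temperature `tau`, and the fixed number of regions per image are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def noun_to_region_loss(nouns, regions, tau=0.07):
    """Hypothetical one-way noun-to-region InfoNCE loss (illustrative sketch only).

    nouns:   list of length B; nouns[i] is an (N_i, D) array of noun embeddings
             extracted from caption i.
    regions: (B, R, D) array; regions[j] holds R pooled region embeddings for
             image j (a fixed R per image is an assumption made for simplicity).
    tau:     softmax temperature (assumed value).
    """
    regions = l2_normalize(regions)
    losses = []
    for i, noun_emb in enumerate(nouns):
        noun_emb = l2_normalize(noun_emb)                     # (N_i, D)
        # Cosine similarity of every noun to every region of every image.
        sim = np.einsum("nd,brd->nbr", noun_emb, regions)     # (N_i, B, R)
        # Assumption: a noun is scored against an image by its best-matching region.
        logits = sim.max(axis=2) / tau                        # (N_i, B)
        # Log-softmax over images, shifted for numerical stability.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        # The paired image i is the positive for every noun of caption i.
        losses.append(-log_prob[:, i])
    # Only the noun -> region direction is contrasted; there is no region -> noun
    # term, so regions that no noun describes receive no loss of their own.
    return float(np.concatenate(losses).mean())


# Usage: nouns that lie close to regions of their own image give a low loss.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8, 16))
nouns = [regions[i, :2] + 0.01 * rng.normal(size=(2, 16)) for i in range(4)]
print(noun_to_region_loss(nouns, regions))
```

Because the softmax for each noun runs only over images (the noun-to-region direction), regions that no caption word describes, such as background parts, are never pulled toward a noun; under this reading, that is one way a one-way loss could reduce mismatched noun-region pairs.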
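
The reported numbers are mean intersection-over-union (mIoU). A minimal sketch of the standard computation follows. It assumes integer label maps, an ignore label of 255 as used in Pascal VOC annotations, and a confusion matrix accumulated over the whole evaluation set. The paper's exact evaluation protocol (for example, how background is predicted) is not specified in the abstract, and the helper names here are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    # ignore_index=255 is assumed, following the Pascal VOC annotation convention.
    valid = gt != ignore_index
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    # Rows are ground-truth classes, columns are predicted classes.
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    tp = np.diag(conf).astype(np.float64)
    # Union = predicted pixels + ground-truth pixels - intersection.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
    # Classes absent from both prediction and ground truth (union 0) are skipped.
    return float(np.nanmean(iou))


# Usage: the bottom-right pixel is marked "ignore" and does not count.
gt = np.array([[0, 0], [1, 255]])
pred = np.array([[0, 1], [1, 1]])
print(mean_iou(confusion_matrix(pred, gt, num_classes=2)))  # 0.5
```

Summing the per-image confusion matrices before calling `mean_iou` yields the dataset-level score, which is not the same as averaging per-image IoUs.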

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.01114 [cs.CV]
	(or arXiv:2304.01114v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.01114

Submission history

From: Yabo Zhang
[v1] Mon, 3 Apr 2023 16:24:39 UTC (8,478 KB)
