SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook

Ding, Chenjing; Wang, Chiyu; Liu, Boshi; Guo, Xi; Tang, Weixuan; Wu, Wei

Abstract:Vector quantization (VQ) is a method for deterministically learning features through discrete codebook representations. Recent works have utilized visual tokenizers to discretize visual regions for self-supervised representation learning. However, a notable limitation of these tokenizers is lack of semantics, as they are derived solely from the pretext task of reconstructing raw image pixels in an auto-encoder paradigm. Additionally, issues like imbalanced codebook distribution and codebook collapse can adversely impact performance due to inefficient codebook utilization. To address these challenges, We introduce SGC-VQGAN through Semantic Online Clustering method to enhance token semantics through Consistent Semantic Learning. Utilizing inference results from segmentation model , our approach constructs a temporospatially consistent semantic codebook, addressing issues of codebook collapse and imbalanced token semantics. Our proposed Pyramid Feature Learning pipeline integrates multi-level features to capture both image details and semantics simultaneously. As a result, SGC-VQGAN achieves SOTA performance in both reconstruction quality and various downstream tasks. Its simplicity, requiring no additional parameter learning, enables its direct application in downstream tasks, presenting significant potential.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.06105 [cs.CV]
	(or arXiv:2409.06105v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.06105

Computer Science > Computer Vision and Pattern Recognition

Title:SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators