Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task

Liu, Jinming; Wei, Yuntao; Lin, Junyan; Zhao, Shengyang; Sun, Heming; Chen, Zhibo; Zeng, Wenjun; Jin, Xin

arXiv:2412.18158 (cs)

[Submitted on 24 Dec 2024]

Abstract: While learned image compression methods have achieved impressive results in either human visual perception or machine vision tasks, they are often specialized for only one domain. This limits their versatility and generalizability across scenarios and requires retraining to adapt to new applications, a process that adds significant complexity and cost in real-world deployments. In this study, we introduce an innovative semantics DISentanglement and COmposition VERsatile codec (DISCOVER) that simultaneously serves human-eye perception and machine vision tasks. The approach derives a set of labels per task through multimodal large models; grounding models then use these labels for precise localization, enabling a comprehensive understanding and disentanglement of image components at the encoder side. At the decoding stage, the image is comprehensively reconstructed from these encoded components together with priors from generative models, thereby optimizing performance for both human visual perception and machine-based analytical tasks. Extensive experimental evaluations substantiate the robustness and effectiveness of DISCOVER, demonstrating superior performance in meeting the dual requirements of human and machine vision.
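The abstract describes splitting an image into label-grounded components at the encoder and recomposing them at the decoder. The sketch below illustrates only that split-and-merge bookkeeping, and it rests on several assumptions that are not taken from the paper. It assumes the grounding step returns axis-aligned boxes in a half-open `(x0, y0, x1, y1)` format, and it assumes a "first label wins" rule for overlapping boxes. The names `disentangle` and `compose` and the labels `"dog"` and `"ball"` are hypothetical. It does not model DISCOVER's actual encoding, bit allocation, or generative-prior reconstruction.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1), half-open pixel ranges.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


def disentangle(image: Image, boxes: Dict[str, List[Box]]) -> Tuple[Dict[str, Image], Image]:
    """Split an image into per-label components plus a background residual.

    Assumption: each pixel is owned by exactly one part; when boxes overlap,
    the label listed first keeps the pixel. Unowned pixels form the background.
    """
    h, w = len(image), len(image[0])
    owner: List[List[Optional[str]]] = [[None] * w for _ in range(h)]
    for label, label_boxes in boxes.items():
        for x0, y0, x1, y1 in label_boxes:
            for y in range(max(0, y0), min(h, y1)):
                for x in range(max(0, x0), min(w, x1)):
                    if owner[y][x] is None:
                        owner[y][x] = label
    # Each component keeps only its own pixels and zero-fills the rest.
    components = {
        label: [[image[y][x] if owner[y][x] == label else 0 for x in range(w)] for y in range(h)]
        for label in boxes
    }
    background = [[image[y][x] if owner[y][x] is None else 0 for x in range(w)] for y in range(h)]
    return components, background


def compose(components: Dict[str, Image], background: Image) -> Image:
    """Recombine the parts; because ownership is disjoint, summing restores the image."""
    out = [row[:] for row in background]
    for comp in components.values():
        for y, row in enumerate(comp):
            for x, value in enumerate(row):
                out[y][x] += value
    return out


# Usage with a toy 3x4 "image" and two hypothetical grounded labels.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
boxes = {"dog": [(0, 0, 2, 2)], "ball": [(1, 1, 4, 3)]}
components, background = disentangle(image, boxes)
restored = compose(components, background)
```

The disjoint-ownership rule is what makes composition exact in this toy setting. In a real codec, each component would instead be coded at its own task-dependent fidelity, so recomposition would be approximate.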

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2412.18158 [cs.CV]
	(or arXiv:2412.18158v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.18158

Submission history

From: Jinming Liu
[v1] Tue, 24 Dec 2024 04:32:36 UTC (2,435 KB)