Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression

Chen, Ruijie; Mao, Qi; Cheng, Zhengxue

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2412.12982 (eess)

[Submitted on 17 Dec 2024]

Abstract: Recent advances in Artificial Intelligence Generated Content (AIGC) have attracted significant interest, accompanied by a growing need to transmit and compress the vast number of AI-generated images (AIGIs). However, research on compression methods tailored to AIGIs remains scarce. To address this gap, we introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities and is designed to efficiently capture and convey the essential visual information of AIGIs. In particular, our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information through text prompts; a structural layer that captures spatial details using edge or skeleton maps; and a texture layer that preserves local textures via a colormap. With Stable Diffusion as the backend, the framework leverages these multimodal priors for image generation, so that Stable Diffusion effectively functions as a decoder once the priors are encoded. Qualitative and quantitative results show that our method restores both semantic and visual details well, performing competitively against baseline approaches at extremely low bitrates (<0.02 bpp). Additionally, our framework supports downstream editing applications without requiring full decoding, opening a new direction for future research in AIGI compression.
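The abstract does not specify a bitstream syntax. As an illustration only, the Python sketch below assumes a simple length-prefixed container that stores the three layers in semantic, structural, texture order, so a decoder can stop after any layer. The names `LayeredBitstream`, `parse_layers` and `bits_per_pixel` are hypothetical, zlib is merely a stand-in for whatever prompt coder the paper uses, and the edge-map and colormap payloads are zero-filled placeholders rather than real coded maps. The sketch also makes the bitrate target concrete: for an assumed 512 × 512 image, 0.02 bpp corresponds to 0.02 × 512 × 512 = 5,242.88 bits, or roughly 655 bytes for all layers together.

```python
import zlib
from dataclasses import dataclass


@dataclass
class LayeredBitstream:
    """Hypothetical three-layer container mirroring the layers named in the abstract.

    The payload encodings and the length-prefixed layout are illustrative
    assumptions, not the format used in the paper.
    """
    semantic: bytes    # compressed text prompt (high-level semantics)
    structural: bytes  # compressed edge or skeleton map (spatial layout)
    texture: bytes     # compressed colormap (local texture/colour)

    def serialize(self, num_layers: int = 3) -> bytes:
        """Write the first `num_layers` layers, each behind a 4-byte big-endian length."""
        out = bytearray()
        for payload in (self.semantic, self.structural, self.texture)[:num_layers]:
            out += len(payload).to_bytes(4, "big") + payload
        return bytes(out)


def parse_layers(stream: bytes) -> list[bytes]:
    """Split a serialized stream back into whatever layers it contains."""
    layers, pos = [], 0
    while pos < len(stream):
        size = int.from_bytes(stream[pos:pos + 4], "big")
        layers.append(stream[pos + 4:pos + 4 + size])
        pos += 4 + size
    return layers


def bits_per_pixel(num_bytes: float, width: int, height: int) -> float:
    """Bitrate in bits per pixel (bpp) for a payload describing a width x height image."""
    return num_bytes * 8 / (width * height)


if __name__ == "__main__":
    W = H = 512  # assumed image size, not taken from the paper
    budget = 0.02 * W * H / 8  # byte budget at 0.02 bpp (about 655 bytes)
    prompt = "a red fox sitting in a snowy forest, golden hour"
    stream = LayeredBitstream(
        semantic=zlib.compress(prompt.encode("utf-8"), 9),  # stand-in text coder
        structural=bytes(300),  # placeholder for a coded edge/skeleton map
        texture=bytes(200),     # placeholder for a coded colormap
    )
    for n in (1, 2, 3):
        data = stream.serialize(n)
        print(f"{n} layer(s): {len(data)} bytes, "
              f"{bits_per_pixel(len(data), W, H):.4f} bpp, within budget: {len(data) <= budget}")
    # Read only the semantic layer (e.g. to edit the prompt) without the other layers.
    print(zlib.decompress(parse_layers(stream.serialize(1))[0]).decode("utf-8"))
```

Truncating after the first layer shows how, under these assumptions, a layered stream can expose the prompt for editing without reading the structural or texture layers; the abstract does not describe how the authors' editing applications actually operate.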

Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.12982 [eess.IV]
	(or arXiv:2412.12982v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2412.12982

Submission history

From: Ruijie Chen
[v1] Tue, 17 Dec 2024 15:01:35 UTC (4,614 KB)
