ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling

Mao, Chaojie; Zhang, Jingfeng; Pan, Yulin; Jiang, Zeyinzi; Han, Zhen; Liu, Yu; Zhou, Jingren

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.02487 (cs)

[Submitted on 5 Jan 2025 (v1), last revised 7 Jan 2025 (this version, v2)]

Abstract: We report ACE++, an instruction-based diffusion framework that tackles various image generation and editing tasks. Inspired by the input format for the inpainting task proposed by FLUX.1-Fill-dev, we improve the Long-context Condition Unit (LCU) introduced in ACE and extend this input paradigm to arbitrary editing and generation tasks. To take full advantage of image generative priors, we develop a two-stage training scheme that minimizes the effort of finetuning powerful text-to-image diffusion models such as FLUX.1-dev. In the first stage, we pre-train the model on task data for the 0-ref tasks, starting from the text-to-image model. Many community models obtained by post-training text-to-image foundation models already fit this first-stage training paradigm. For example, FLUX.1-Fill-dev deals primarily with inpainting tasks and can be used as an initialization to accelerate training. In the second stage, we finetune the above model to support general instructions using all tasks defined in ACE. To promote the widespread application of ACE++ in different scenarios, we provide a comprehensive set of models that covers both full finetuning and lightweight finetuning, addressing both general applicability and applicability in vertical scenarios. Qualitative analysis showcases the superiority of ACE++ in terms of generated image quality and prompt-following ability. Code and models will be available on the project page: https://ali-vilab. this http URL.
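The two-stage scheme in the abstract can be read as a rule for which tasks feed each training stage. The sketch below is illustrative only and is not the authors' code: it assumes that "0-ref" means a task that consumes zero reference images, and the task names, the `Task` class, and the `tasks_for_stage` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """A training task; `num_refs` is the number of reference images it uses."""
    name: str
    num_refs: int


def tasks_for_stage(tasks: List[Task], stage: int) -> List[Task]:
    """Select the tasks used in a given training stage.

    Stage 1: only 0-ref tasks (assumed here to mean num_refs == 0).
    Stage 2: every task, so the model learns general instructions.
    """
    if stage == 1:
        return [t for t in tasks if t.num_refs == 0]
    if stage == 2:
        return list(tasks)
    raise ValueError(f"unknown stage: {stage}")


# Hypothetical task list; the real task set is the one defined in ACE.
TASKS = [
    Task("inpainting", num_refs=0),
    Task("controllable_generation", num_refs=0),
    Task("subject_driven_generation", num_refs=1),
    Task("reference_based_editing", num_refs=1),
]

if __name__ == "__main__":
    for stage in (1, 2):
        names = [t.name for t in tasks_for_stage(TASKS, stage)]
        print(f"stage {stage}: {names}")
```

Under this reading, a model that has already been post-trained on 0-ref tasks (the abstract names FLUX.1-Fill-dev as an example) could stand in for the output of stage 1, so only the stage 2 selection would need to be trained.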

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.02487 [cs.CV]
	(or arXiv:2501.02487v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.02487

Submission history

From: Chaojie Mao
[v1] Sun, 5 Jan 2025 09:40:58 UTC (5,804 KB)
[v2] Tue, 7 Jan 2025 08:47:34 UTC (6,653 KB)
