SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches

Lin, Haichuan; Ye, Yilin; Xia, Jiazhi; Zeng, Wei

arXiv:2502.07556 (cs)

[Submitted on 11 Feb 2025]

Abstract: Text-to-image models can generate visually appealing images from text descriptions. Efforts have been devoted to improving model control through prompt tuning and spatial conditioning. However, our formative study highlights the challenges non-expert users face in crafting appropriate prompts and specifying fine-grained spatial conditions (e.g., depth or Canny references) to generate semantically cohesive images, especially when multiple objects are involved. In response, we introduce SketchFlex, an interactive system designed to improve the flexibility of spatially conditioned image generation using rough region sketches. The system automatically infers user prompts with rational descriptions within a semantic space enriched by crowd-sourced object attributes and relationships. Additionally, SketchFlex refines users' rough sketches into Canny-based shape anchors, ensuring generation quality and alignment with user intentions. Experimental results demonstrate that SketchFlex achieves more cohesive image generation than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared to a region-based generation baseline.

Comments:	conference: CHI2025
Subjects:	Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.07556 [cs.HC]
	(or arXiv:2502.07556v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2502.07556

Submission history

From: Haichuan Lin
[v1] Tue, 11 Feb 2025 13:48:11 UTC (28,507 KB)
