ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation

Federico, Giulio; Amato, Giuseppe; Carrara, Fabio; Gennaro, Claudio; Di Benedetto, Marco

arXiv:2503.22374 (cs)

[Submitted on 28 Mar 2025]

Abstract: Understanding the nature of human sketches is challenging because of the wide variation in how they are created. Recognizing complex structural patterns improves both the accuracy of sketch recognition and the fidelity of generated sketches. In this work, we introduce ViSketch-GPT, a novel algorithm designed to address these challenges through a multi-scale context extraction approach. The model captures intricate details at multiple scales and combines them using an ensemble-like mechanism, in which the extracted features work collaboratively to capture the key details that matter for both classification and generation.
The effectiveness of ViSketch-GPT is validated through extensive experiments on the QuickDraw dataset. Our model establishes a new benchmark, significantly outperforming existing methods in both classification and generation tasks, with substantial improvements in accuracy and the fidelity of generated sketches.
The proposed algorithm offers a robust framework for understanding complex structures such as sketches: the features it extracts collaborate to recognize intricate details, which makes it a versatile tool for various applications in computer vision and machine learning.
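
The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea it names: view the same sketch at several scales, then combine per-scale predictions in an ensemble-like way. It is not the ViSketch-GPT method. The function names (`average_pool`, `multi_scale_views`, `ensemble_average`), the use of average pooling to produce scales, and the plain averaging of class probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]  # a square raster of a sketch, values in [0, 1]


def average_pool(grid: Grid, factor: int) -> Grid:
    """Downsample a square raster by averaging non-overlapping factor x factor blocks."""
    size = len(grid)
    if factor < 1 or size % factor != 0:
        raise ValueError("grid size must be divisible by a positive pooling factor")
    out_size = size // factor
    pooled: Grid = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block_sum = sum(
                grid[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            row.append(block_sum / (factor * factor))
        pooled.append(row)
    return pooled


def multi_scale_views(grid: Grid, factors: Sequence[int]) -> List[Grid]:
    """Return one view of the raster per pooling factor (hypothetical scale scheme)."""
    return [average_pool(grid, f) for f in factors]


def ensemble_average(per_scale_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities predicted independently at each scale."""
    n_scales = len(per_scale_probs)
    n_classes = len(per_scale_probs[0])
    return [
        sum(probs[k] for probs in per_scale_probs) / n_scales
        for k in range(n_classes)
    ]


# A 4x4 raster containing a single diagonal stroke.
diagonal = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
views = multi_scale_views(diagonal, factors=[1, 2, 4])

# Placeholder per-scale class probabilities; a real system would obtain
# these from a model applied to each view.
combined = ensemble_average([[0.5, 0.5], [1.0, 0.0]])
```

In this toy setup the coarsest view keeps only the overall ink density, while the finest view keeps the exact stroke position. Averaging the per-scale predictions is the simplest possible combination rule; the paper may well use a learned one.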

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.22374 [cs.CV]
	(or arXiv:2503.22374v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.22374

Submission history

From: Giulio Federico
[v1] Fri, 28 Mar 2025 12:28:30 UTC (10,113 KB)
