OCR-VQGAN: Taming Text-within-Image Generation

Rodriguez, Juan A.; Vazquez, David; Laradji, Issam; Pedersoli, Marco; Rodriguez, Pau

arXiv:2210.11248 (cs)

[Submitted on 19 Oct 2022 (v1), last revised 21 Oct 2022 (this version, v2)]

Abstract: Synthetic image generation has recently improved significantly in domains such as natural image and art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is rendering readable text within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure. To explore our approach, we introduce the Paper2Fig100k dataset, with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available at arXiv.org from fields like artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them. We demonstrate the effectiveness of OCR-VQGAN by conducting several experiments on the task of figure reconstruction. Additionally, we explore the qualitative and quantitative impact of weighting different perceptual metrics in the overall loss function. We release code, models, and dataset at this https URL.

Comments:	Paper accepted at WACV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2210.11248 [cs.CV]
	(or arXiv:2210.11248v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.11248

Submission history

From: Juan A. Rodriguez
[v1] Wed, 19 Oct 2022 16:37:48 UTC (45,327 KB)
[v2] Fri, 21 Oct 2022 18:32:27 UTC (45,327 KB)
