TextDoctor: Unified Document Image Inpainting via Patch Pyramid Diffusion Models

Lu, Wanglong; Su, Lingming; Zheng, Jingjing; de Melo, Vinícius Veloso; Shoeleh, Farzaneh; Hawkin, John; Tricco, Terrence; Zhao, Hanli; Jiang, Xianta

arXiv:2503.04021 (cs)

[Submitted on 6 Mar 2025]

Abstract: Digital versions of real-world text documents often suffer from issues such as environmental corrosion of the original document, low-quality scanning, or human interference. Existing document restoration and inpainting methods typically struggle to generalize to unseen document styles and to handle high-resolution images. To address these challenges, we introduce TextDoctor, a novel unified document image inpainting method. Inspired by human reading behavior, TextDoctor restores fundamental text elements from patches and then applies diffusion models to entire document images instead of training models on specific document types. To handle varying text sizes and avoid the out-of-memory issues common in high-resolution documents, we propose structure pyramid prediction and patch pyramid diffusion models. These techniques leverage multiscale inputs and pyramid patches to enhance the quality of inpainting both globally and locally. Extensive qualitative and quantitative experiments on seven public datasets show that TextDoctor outperforms state-of-the-art methods in restoring various types of high-resolution document images.
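
As a rough illustration of the pyramid-patch idea mentioned in the abstract, the sketch below tiles a large image into fixed-size, overlapping patches at several scales, so that no single step has to hold a full-resolution image in model memory. It is not the authors' implementation: the function names (downsample, patch_origins, patch_pyramid), the patch size, the stride, and the scale factors are all assumptions chosen for illustration, and the diffusion model itself is left out.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a 2D array by an integer factor (crops any remainder)."""
    h, w = image.shape
    h2, w2 = h // factor, w // factor
    cropped = image[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def patch_origins(length, patch, stride):
    """Start offsets whose windows of size `patch` cover [0, length)."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)  # last window sits flush with the border
    return origins

def patch_pyramid(image, patch=64, stride=48, factors=(1, 2, 4)):
    """Yield (factor, y, x, patch_array) for every patch at every scale.

    Hypothetical helper: the scales and patch geometry are illustrative
    assumptions, not values taken from the paper.
    """
    for f in factors:
        level = image if f == 1 else downsample(image, f)
        for y in patch_origins(level.shape[0], patch, stride):
            for x in patch_origins(level.shape[1], patch, stride):
                yield f, y, x, level[y:y + patch, x:x + patch]

# Example: a synthetic 256x192 "document" split into pyramid patches.
image = np.arange(256 * 192, dtype=float).reshape(256, 192)
patches = list(patch_pyramid(image))
```

In a setup like this, fine scales expose small text to the model while coarse scales give it page-level layout; each patch would be processed independently and the overlapping results blended back together.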

Comments:	28 pages, 25 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
MSC classes:	68U10
ACM classes:	I.4.3; I.4.4; I.4.5; I.4.9
Cite as:	arXiv:2503.04021 [cs.CV]
	(or arXiv:2503.04021v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.04021

Submission history

From: Wanglong Lu
[v1] Thu, 6 Mar 2025 02:16:35 UTC (40,349 KB)
