MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

Lyu, Pengyuan; Zhang, Chengquan; Liu, Shanshan; Qiao, Meina; Xu, Yangliu; Wu, Liang; Yao, Kun; Han, Junyu; Ding, Errui; Wang, Jingdong

arXiv:2206.00311 (cs)

[Submitted on 1 Jun 2022 (v1), last revised 10 Oct 2023 (this version, v3)]

Abstract: Text images contain both visual and linguistic information. However, existing pre-training techniques for text recognition mainly focus on either visual representation learning or linguistic knowledge learning. In this paper, we propose a novel approach, MaskOCR, that unifies vision and language pre-training in the classical encoder-decoder recognition framework. We adopt masked image modeling to pre-train the feature encoder on a large set of unlabeled real text images, which allows it to learn strong visual representations. Rather than introducing linguistic knowledge through an additional language model, we directly pre-train the sequence decoder. Specifically, we transform text data into synthesized text images to unify the data modalities of vision and language, and we enhance the language modeling capability of the sequence decoder with a proposed masked image-language modeling scheme. Notably, the encoder is frozen during the pre-training phase of the sequence decoder. Experimental results demonstrate that the proposed method achieves superior performance on benchmark datasets, including Chinese and English text images.
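
As a rough illustration of the two ideas in the abstract (masking patches of a text image, and keeping the encoder frozen while the sequence decoder is pre-trained), a minimal sketch follows. It is not the authors' implementation: the function names `mask_patches` and `is_frozen`, the `STAGES` table, the patch count of 32, and the mask ratio of 0.5 are all assumptions chosen for illustration, and the abstract does not state which modules besides the encoder take part in the first stage.

```python
import random

def mask_patches(num_patches, mask_ratio, seed=None):
    """Randomly split patch indices into (visible, masked) sorted lists.

    Hypothetical helper: the abstract does not specify the masking strategy
    or ratio, so uniform random masking is assumed here.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked

# Assumed two-stage schedule mirroring the abstract: the encoder is
# pre-trained first, then the decoder is pre-trained with the encoder frozen.
STAGES = [
    {"name": "encoder_pretraining",
     "data": "unlabeled real text images",
     "trainable": {"encoder"}},
    {"name": "decoder_pretraining",
     "data": "synthesized text images",
     "trainable": {"decoder"}},
]

def is_frozen(module, stage):
    """A module is frozen in a stage if it is not listed as trainable."""
    return module not in stage["trainable"]

# Example: a text-line image cut into 32 patches, half of them masked.
visible, masked = mask_patches(num_patches=32, mask_ratio=0.5, seed=0)
print(len(visible), len(masked))
print(is_frozen("encoder", STAGES[1]))
```

In a real training loop, the frozen flag would translate into disabling gradient updates for the encoder's parameters during the second stage, so that only the decoder adapts to the masked image-language modeling objective.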

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2206.00311 [cs.CV] (or arXiv:2206.00311v3 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2206.00311

Submission history

From: Pengyuan Lyu
[v1] Wed, 1 Jun 2022 08:27:19 UTC (665 KB)
[v2] Sun, 8 Oct 2023 07:08:00 UTC (380 KB)
[v3] Tue, 10 Oct 2023 03:06:45 UTC (380 KB)
