Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Kittenplon, Yair; Lavi, Inbal; Fogel, Sharon; Bar, Yarin; Manmatha, R.; Perona, Pietro


arXiv:2202.05508v1 (cs)

[Submitted on 11 Feb 2022 (this version), latest version 14 Feb 2022 (v2)]


Authors: Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona


Abstract: End-to-end text spotting methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks. Our code will be publicly available upon publication.
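
To illustrate the idea of matching predictions to annotations when only transcriptions are available, here is a minimal sketch. It is not the loss used by TextTranSpotter, which this page does not describe. It assumes a cost that is simply the edit distance between predicted and annotated words, and it uses brute-force search over assignments as a stand-in for the Hungarian algorithm, which finds the same minimum-cost matching efficiently. The names `edit_distance` and `match_by_text` are hypothetical.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_by_text(predictions: list[str], targets: list[str]) -> list[tuple[int, int]]:
    """Return (prediction_index, target_index) pairs with minimal total cost.

    Assumption: the cost is text-only (edit distance), so no boxes are needed.
    Assumes len(predictions) >= len(targets); extra predictions stay unmatched.
    Brute force over assignments, so suitable for a handful of words only.
    """
    best_cost, best_pairs = None, []
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(edit_distance(predictions[p], targets[t])
                   for t, p in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_pairs


# Three predicted words, two annotated transcriptions, no localization data.
pairs = match_by_text(["exit", "cafe", "opem"], ["open", "cafe"])
print(pairs)
```

For larger sets of words, a dedicated solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force loop while producing a matching of the same total cost.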

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2202.05508 [cs.CV]
	(or arXiv:2202.05508v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2202.05508

Submission history

From: Yair Kittenplon
[v1] Fri, 11 Feb 2022 08:50:09 UTC (12,107 KB)
[v2] Mon, 14 Feb 2022 05:55:25 UTC (12,107 KB)
