Block-level Text Spotting with LLMs

Bannur, Ganesh; Amrutur, Bharadwaj

arXiv:2406.13208 (cs)

[Submitted on 19 Jun 2024]

Abstract: Text spotting has seen tremendous progress in recent years, yielding performant techniques that can extract text at the character, word, or line level. However, extracting blocks of text from images (block-level text spotting) is relatively unexplored. Blocks contain more context than individual lines, words, or characters, so block-level text spotting would enhance downstream applications, such as translation, that benefit from added context. We propose a novel method, BTS-LLM (Block-level Text Spotting with LLMs), to identify text at the block level. BTS-LLM has three parts: 1) detecting and recognizing text at the line level, 2) grouping lines into blocks, and 3) finding the best order of lines within a block using a large language model (LLM). We aim to exploit the strong semantic knowledge in LLMs for accurate block-level text spotting. Consequently, if the spotted text is semantically meaningful but has been corrupted during text recognition, the LLM can also rectify mistakes in the text and produce a reconstruction of it.
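
The last two stages named in the abstract (grouping lines into blocks, then choosing a line order within each block) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how lines are grouped or how the LLM is queried, so the `Line` type, the proximity rule in `group_lines`, the `max_gap` threshold, and the exhaustive search in `best_order` are all assumptions made for illustration. The `toy_score` function is a hypothetical stand-in for an LLM-based plausibility score, and line-level detection and recognition (stage 1) is assumed to have already produced the text and bounding boxes.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Line:
    """A recognized text line with an axis-aligned box (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _near(a: Line, b: Line, max_gap: float) -> bool:
    """True if the lines overlap horizontally and are vertically close."""
    h_overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    v_gap = max(a.y0, b.y0) - min(a.y1, b.y1)  # negative when boxes overlap
    return h_overlap > 0 and v_gap <= max_gap


def group_lines(lines: Sequence[Line], max_gap: float = 10.0) -> List[List[Line]]:
    """Assumed grouping rule: union-find over the 'near' relation."""
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if _near(lines[i], lines[j], max_gap):
                parent[find(i)] = find(j)

    blocks: Dict[int, List[Line]] = {}
    for i, line in enumerate(lines):
        blocks.setdefault(find(i), []).append(line)
    return list(blocks.values())


def best_order(block: Sequence[Line], score: Callable[[str], float]) -> List[Line]:
    """Try every ordering of the block's lines and keep the highest-scoring one."""
    return list(max(permutations(block),
                    key=lambda p: score(" ".join(l.text for l in p))))


# Hypothetical stand-in for an LLM: counts word pairs from a tiny known set.
KNOWN_BIGRAMS = {("no", "parking"), ("parking", "at"), ("at", "any"), ("any", "time")}


def toy_score(text: str) -> float:
    words = text.lower().split()
    return float(sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:])))


lines = [
    Line("at any time", 0, 30, 100, 50),
    Line("no parking", 0, 0, 100, 20),
    Line("open 24 hours", 500, 0, 600, 20),
]
blocks = group_lines(lines, max_gap=15.0)
ordered = [" ".join(l.text for l in best_order(b, toy_score)) for b in blocks]
print(ordered)
```

Exhaustive search over permutations grows factorially with the number of lines, so this sketch is only practical for small blocks; any real system would need a cheaper search or a different way of querying the language model.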

Comments:	19 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.13208 [cs.CV]
	(or arXiv:2406.13208v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.13208

Submission history

From: Ganesh Bannur
[v1] Wed, 19 Jun 2024 04:37:38 UTC (1,731 KB)
