Classifying the Unknown: In-Context Learning for Open-Vocabulary Text and Symbol Recognition

Simon, Tom; Mocaer, William; Tranouez, Pierrick; Chatelain, Clement; Paquet, Thierry

arXiv:2504.06841 (cs)

[Submitted on 9 Apr 2025]

Abstract: We introduce Rosetta, a multimodal model that uses Multimodal In-Context Learning (MICL) to classify sequences of novel script patterns in documents from only a few examples, eliminating the need for explicit retraining. To enhance contextual learning, we designed a dataset generation process that ensures varying degrees of contextual informativeness, improving the model's ability to exploit context across different scenarios. A key strength of our method is the Context-Aware Tokenizer (CAT), which enables open-vocabulary classification: the model can classify text and symbol patterns across an unlimited range of classes, beyond the alphabet of patterns seen during training. This unlocks applications such as the recognition of new alphabets and languages. Experiments on synthetic datasets demonstrate the potential of Rosetta to classify Out-Of-Distribution visual patterns and diverse alphabets and scripts, including Chinese, Greek, Russian, French, Spanish, and Japanese.
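
The open-vocabulary idea in the abstract can be illustrated with a toy sketch. It is not the Rosetta architecture or the authors' Context-Aware Tokenizer: the class `ContextLocalTokenizer`, the function `classify_in_context`, the 2-D feature vectors, and the nearest-neighbor matching are all hypothetical stand-ins chosen for illustration. The assumption being shown is only that class token ids can be assigned from the labels present in the context, so a previously unseen alphabet needs no change to any fixed vocabulary.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]


class ContextLocalTokenizer:
    """Hypothetical: token ids are assigned per context, not from a fixed alphabet."""

    def __init__(self, context_labels: Sequence[Hashable]) -> None:
        self.label_to_id: Dict[Hashable, int] = {}
        self.id_to_label: List[Hashable] = []
        for label in context_labels:
            # Each new label gets the next free id, in order of first appearance.
            if label not in self.label_to_id:
                self.label_to_id[label] = len(self.id_to_label)
                self.id_to_label.append(label)

    def encode(self, labels: Sequence[Hashable]) -> List[int]:
        return [self.label_to_id[label] for label in labels]

    def decode(self, ids: Sequence[int]) -> List[Hashable]:
        return [self.id_to_label[i] for i in ids]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_in_context(
    context: Sequence[Tuple[Vector, Hashable]],
    queries: Sequence[Vector],
) -> List[Hashable]:
    """Nearest-neighbor stand-in for a learned model (an assumption, not the paper's method)."""
    tokenizer = ContextLocalTokenizer([label for _, label in context])
    predicted_ids = []
    for query in queries:
        # Pick the context example whose feature vector is closest to the query.
        _, best_label = min(context, key=lambda pair: squared_distance(pair[0], query))
        # The prediction is a context-local id; it only has meaning within this context.
        predicted_ids.append(tokenizer.label_to_id[best_label])
    return tokenizer.decode(predicted_ids)


# The same code handles different alphabets, because ids come from the context.
greek_context = [((0.0, 0.0), "α"), ((1.0, 1.0), "β")]
chinese_context = [((0.0, 0.0), "你"), ((1.0, 1.0), "好")]
queries = [(0.1, 0.0), (0.9, 1.2)]
greek_predictions = classify_in_context(greek_context, queries)
chinese_predictions = classify_in_context(chinese_context, queries)
```

In this sketch, id 0 decodes to "α" in one context and to "你" in the other, which is the sense in which the label space is unbounded: nothing outside the context defines what an id means.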

Comments:	Submitted to ICDAR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.06841 [cs.CV]
	(or arXiv:2504.06841v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.06841

Submission history

From: Tom Simon
[v1] Wed, 9 Apr 2025 12:58:25 UTC (9,363 KB)
