Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

Khan, Shaharukh; Tarun, Ayush; Faraz, Ali; Kamble, Palash; Dahiya, Vivek; Pokala, Praveen; Kulkarni, Ashish; Khatri, Chandra; Ravi, Abhinav; Agarwal, Shubham

arXiv:2502.20420 (cs)

[Submitted on 27 Feb 2025]

Abstract: In this work, we provide the system description of our submission to the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that effectively integrates a multilingual LLM and a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which an adapter layer projects into the LLM embedding space; the LLM then generates the translation autoregressively. We participated in all three tracks (Image Captioning, Text-only Translation, and Multimodal Translation) for Indic languages (i.e., English to Hindi, Bengali, and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
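
The pipeline described in the abstract (visual features, adapter projection, autoregressive generation) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names `project`, `build_prefix`, and `greedy_decode` are hypothetical, the adapter is shown as a single linear layer (the abstract only says "adapter layer"), and the ViT encoder and the LLM are replaced by caller-supplied stand-ins. It is not the authors' implementation.

```python
# Illustrative sketch only: names and shapes are hypothetical, not from the paper.

def project(visual_feats, weight, bias):
    """Assumed adapter: a linear map from the vision dimension to the LLM dimension.

    visual_feats: list of feature vectors, one per image patch (vision_dim each).
    weight: llm_dim rows, each of length vision_dim.
    bias: list of length llm_dim.
    """
    return [
        [sum(f * w for f, w in zip(feat, row)) + b for row, b in zip(weight, bias)]
        for feat in visual_feats
    ]


def build_prefix(visual_tokens, text_embeddings):
    """Place projected visual tokens in front of the source-text embeddings."""
    return list(visual_tokens) + list(text_embeddings)


def greedy_decode(prefix, next_token_fn, embed_fn, eos_id, max_new_tokens=16):
    """Autoregressive loop: pick one token at a time and feed it back in.

    next_token_fn stands in for the LLM (sequence of embeddings -> token id).
    embed_fn stands in for the LLM's token embedding lookup.
    """
    sequence = list(prefix)
    generated = []
    for _ in range(max_new_tokens):
        token = next_token_fn(sequence)
        if token == eos_id:
            break
        generated.append(token)
        sequence.append(embed_fn(token))
    return generated
```

In a real system the stand-ins would be a pretrained ViT and a multilingual LLM, and decoding might use sampling or beam search rather than the greedy loop shown here.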

Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.20420 [cs.CL]
	(or arXiv:2502.20420v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.20420
Journal reference:	https://aclanthology.org/2024.wmt-1.80/

Submission history

From: Abhinav Ravi
[v1] Thu, 27 Feb 2025 07:14:31 UTC (18,065 KB)