Image to Language Understanding: Captioning approach

Seshadri, Madhavan; Srikanth, Malavika; Belov, Mikhail

Computer Science > Computer Vision and Pattern Recognition

arXiv:2002.09536 (cs)

[Submitted on 21 Feb 2020]

Abstract: Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representing that context in natural language has a wide variety of applications, such as assisting the visually impaired. Doing so combines Computer Vision and Natural Language Processing techniques, which makes it a hard problem to solve. This project compares different approaches to the image captioning problem. Specifically, the focus was on comparing two types of models: an encoder-decoder approach and a multi-modal approach. Within the encoder-decoder approach, inject and merge architectures were compared against a multi-modal image captioning approach based primarily on object detection. These approaches were compared on the basis of state-of-the-art sentence comparison metrics such as BLEU, GLEU, METEOR, and ROUGE on a subset of the Google Conceptual Captions dataset containing 100k images. On the basis of this comparison, we observed that the best model was the Inception injected encoder model. This best approach has been deployed as a web-based system: on uploading an image, the system outputs the best caption associated with the image.
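
To illustrate the kind of sentence comparison metric the abstract names, the following is a minimal sketch of a BLEU-style score for one generated caption against one reference caption. It is not the authors' evaluation code. The function names, the whitespace tokenization, the single reference, the choice of `max_n=2`, and the absence of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions for n = 1..max_n,
    scaled by a brevity penalty for short candidates.
    Hypothetical helper: single reference, no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "a dog runs on the beach".split()
print(bleu("a dog runs on the beach".split(), reference))
print(bleu("a dog runs".split(), reference))
```

In this sketch, a caption identical to the reference receives the maximum score, while a correct but truncated caption such as "a dog runs" is lowered only by the brevity penalty. Library implementations typically add smoothing and support multiple references per image.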

Comments:	8 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2002.09536 [cs.CV]
	(or arXiv:2002.09536v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2002.09536

Submission history

From: Madhavan Seshadri
[v1] Fri, 21 Feb 2020 20:15:33 UTC (2,866 KB)
