Computer Science > Computer Vision and Pattern Recognition
[Submitted on 3 Jun 2015 (v1), revised 6 Jun 2015 (this version, v2), latest version 28 Apr 2016 (v6)]
Title: Image Captioning with an Intermediate Attributes Layer
Abstract: Many recent studies in image captioning rely on an architecture which learns the mapping from images to sentences in an end-to-end fashion. However, generating an accurate and complete description requires identifying all entities, their mutual interactions and the context of the image. In this work, we show that an intermediate image-to-attributes layer can dramatically improve captioning results over the current approach, which directly connects an RNN to a CNN. We propose a two-stage procedure for training such an attribute-based approach: in the first stage, we mine a number of keywords from the training sentences, which we use as semantic attributes for images, and learn the mapping from images to those attributes with a CNN; in the second stage, we learn the mapping from detected attribute occurrence likelihoods to sentence descriptions using an LSTM. We then demonstrate the effectiveness of our two-stage model with captioning experiments on three benchmark datasets: Flickr8k, Flickr30K and MS COCO.
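The first stage described in the abstract (mining keywords from training sentences and using them as per-image attribute labels) can be illustrated with a minimal sketch. The abstract does not specify how keywords are selected or filtered, so the frequency-based ranking, the `STOP_WORDS` list, and the function names `mine_attributes` and `attribute_targets` below are all assumptions made for illustration, not the authors' method. The resulting multi-hot vectors are the kind of target a CNN could be trained to predict.

```python
from collections import Counter

# Hypothetical stop-word list; the abstract does not say how keywords are filtered.
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "with", "at", "to"}


def mine_attributes(captions, vocab_size):
    """Return the vocab_size most frequent non-stop-words across all captions."""
    counts = Counter(
        word
        for caption in captions
        for word in caption.lower().split()
        if word.isalpha() and word not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(vocab_size)]


def attribute_targets(image_captions, attributes):
    """Build one multi-hot vector per image.

    An entry is 1 if the attribute word occurs in any caption of that image.
    """
    index = {word: i for i, word in enumerate(attributes)}
    targets = {}
    for image_id, captions in image_captions.items():
        vec = [0] * len(attributes)
        for caption in captions:
            for word in caption.lower().split():
                if word in index:
                    vec[index[word]] = 1
        targets[image_id] = vec
    return targets


# Toy data standing in for a captioning dataset (not taken from the paper).
image_captions = {
    "img1": ["A dog runs on the grass", "a brown dog in a field"],
    "img2": ["A man rides a horse", "the man is on a horse"],
}
all_captions = [c for caps in image_captions.values() for c in caps]
attributes = mine_attributes(all_captions, vocab_size=3)
targets = attribute_targets(image_captions, attributes)
```

In the second stage, the CNN's predicted likelihoods for these attributes (rather than the binary targets) would be fed to the LSTM that generates the sentence.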
Submission history
From: Chunhua Shen
[v1] Wed, 3 Jun 2015 07:06:11 UTC (3,079 KB)
[v2] Sat, 6 Jun 2015 03:16:41 UTC (3,079 KB)
[v3] Sun, 4 Oct 2015 03:41:04 UTC (6,567 KB)
[v4] Fri, 9 Oct 2015 03:21:37 UTC (6,567 KB)
[v5] Thu, 14 Apr 2016 08:05:03 UTC (331 KB)
[v6] Thu, 28 Apr 2016 04:59:36 UTC (331 KB)