Analysis of Convolutional Decoder for Image Caption Generation

Katiyar, Sulabh; Borgohain, Samir Kumar

Abstract: Convolutional Neural Networks have recently been proposed for Sequence Modelling tasks such as Image Caption Generation. However, unlike Recurrent Neural Networks, their performance as Decoders for Image Caption Generation has not been extensively studied. In this work, we analyse how various aspects of Convolutional Neural Network based Decoders, such as network complexity and depth, use of Data Augmentation, the Attention mechanism, and the length of sentences used during training, affect the performance of the model. We perform experiments on the Flickr8k and Flickr30k image captioning datasets and observe that, unlike a Recurrent Neural Network based Decoder, a Convolutional Decoder for Image Captioning does not generally benefit from an increase in network depth, in the form of stacked Convolutional Layers, or from the use of Data Augmentation techniques. The Attention mechanism also provides only limited performance gains with a Convolutional Decoder. Furthermore, we observe that Convolutional Decoders show performance comparable with Recurrent Decoders only when trained on shorter sentences of up to 15 words. They have limitations when trained on longer sentences, which suggests that Convolutional Decoders may not be able to model long-term dependencies efficiently. In addition, the Convolutional Decoder usually performs poorly on the CIDEr evaluation metric compared to the Recurrent Decoder.

Comments:	18 pages, to be published in Book Series: Advances in Intelligent Systems and Computing - ISSN 2194-5357
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.04914 [cs.CV]
	(or arXiv:2103.04914v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.04914

