Language-Driven Region Pointer Advancement for Controllable Image Captioning

Lindh, Annika; Ross, Robert J.; Kelleher, John D.

arXiv:2011.14901 (cs)

[Submitted on 30 Nov 2020]

Abstract: Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.
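Illustrative sketch (not the authors' implementation): the abstract describes advancing a region pointer whenever the language model emits a NEXT-token, and scoring the predicted timing against ground truth with precision and recall. The Python below shows one way that idea could look on an already generated token sequence. The token spelling `<NEXT>`, the function names, the clamping at the last region, and the choice to compare timings as sets of word positions are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical spelling of the NEXT-token; the abstract does not give one.
NEXT = "<NEXT>"


def align_words_to_regions(
    tokens: Sequence[str], num_regions: int
) -> List[Tuple[str, int]]:
    """Pair each real word with the region the pointer is on when it is emitted.

    The pointer starts at region 0 and moves forward by one each time a
    NEXT-token appears. NEXT-tokens themselves are dropped from the output.
    Clamping at the last region is an assumption of this sketch.
    """
    pointer = 0
    aligned: List[Tuple[str, int]] = []
    for tok in tokens:
        if tok == NEXT:
            pointer = min(pointer + 1, num_regions - 1)
        else:
            aligned.append((tok, pointer))
    return aligned


def advancement_positions(tokens: Sequence[str]) -> Set[int]:
    """Return the word indices (NEXT-tokens not counted) at which the pointer advances."""
    positions: Set[int] = set()
    word_idx = 0
    for tok in tokens:
        if tok == NEXT:
            positions.add(word_idx)
        else:
            word_idx += 1
    return positions


def precision_recall(predicted: Set[int], reference: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted advancement positions against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up caption, used only to exercise the functions above.
    caption = ["a", "dog", NEXT, "chases", "a", "ball", NEXT, "on", "grass"]
    print(align_words_to_regions(caption, num_regions=3))
    print(advancement_positions(caption))
```

Treating NEXT as an ordinary vocabulary item means the timing decision needs no separate gating module in this sketch: the pointer update is a side effect of decoding, and removing the NEXT-tokens afterwards yields the readable caption.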

Comments:	Accepted to COLING 2020
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
MSC classes:	68T07, 68T45, 68T50
ACM classes:	I.2.7; I.2.10; I.5.1
Cite as:	arXiv:2011.14901 [cs.CL]
	(or arXiv:2011.14901v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2011.14901

Submission history

From: Annika Lindh
[v1] Mon, 30 Nov 2020 15:34:59 UTC (9,150 KB)
