Object Counts! Bringing Explicit Detections Back into Image Captioning

Wang, Josiah; Madhyastha, Pranava; Specia, Lucia

arXiv:1805.00314 (cs)

[Submitted on 23 Apr 2018]

Abstract: The use of explicit object detectors as an intermediate step to image captioning - which used to constitute an essential stage in early work - is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding. We argue that explicit detections provide rich semantic information, and can thus be used as an interpretable representation to better understand why end-to-end image captioning systems work well. We provide an in-depth analysis of end-to-end image captioning by exploring a variety of cues that can be derived from such object detections. Our study reveals that end-to-end image captioning systems rely on matching image representations to generate captions, and that encoding the frequency, size and position of objects are complementary and all play a role in forming a good image representation. It also reveals that different object categories contribute in different ways towards image captioning.

Comments:	Please cite: In Proceedings of 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2018)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:1805.00314 [cs.CV]
	(or arXiv:1805.00314v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1805.00314

Submission history

From: Josiah Wang
[v1] Mon, 23 Apr 2018 14:51:46 UTC (4,676 KB)
