Image Captioning using Multiple Transformers for Self-Attention Mechanism

Olimov, Farrukh; Dubey, Shikha; Shrestha, Labina; Tin, Tran Trung; Jeon, Moongu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2103.05103 (cs)

[Submitted on 14 Feb 2021]

Abstract: Achieving real-time image captioning with adequate precision is the main challenge in this research field. The present work, Multiple Transformers for Self-Attention Mechanism (MTSM), uses multiple transformers to address this challenge. MTSM first obtains region proposals from a transformer-based detector (DETR). It then applies self-attention by passing these region proposals, together with their visual and geometrical features, through a second transformer, which learns the local and global interconnections between objects. Qualitative and quantitative results for MTSM are reported on the MSCOCO dataset.
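The abstract describes a two-stage pipeline: DETR supplies region proposals, and a second transformer applies self-attention over each region's visual and geometrical features. The following is a minimal, dependency-free sketch of that data flow under stated assumptions, not the authors' implementation. DETR's outputs are stood in for by toy class probabilities (last column treated as DETR's "no object" class) and normalized (cx, cy, w, h) boxes. The confidence threshold, the 5-d geometric encoding, and the identity query/key/value projections are hypothetical simplifications. The caption decoder is omitted.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # DETR-style (cx, cy, w, h), normalized to [0, 1]


def select_regions(class_probs: List[List[float]], threshold: float = 0.7) -> List[int]:
    """Return indices of queries whose best real-object probability exceeds threshold.

    Assumes the last entry of each probability vector is DETR's "no object" class.
    The threshold is a hypothetical value, not one reported for MTSM.
    """
    return [i for i, p in enumerate(class_probs) if max(p[:-1]) > threshold]


def geometric_features(box: Box) -> List[float]:
    """Hypothetical 5-d geometric encoding: centre, size and relative area."""
    cx, cy, w, h = box
    return [cx, cy, w, h, w * h]


def build_tokens(visual: List[List[float]], boxes: List[Box]) -> List[List[float]]:
    """Concatenate each region's visual feature vector with its geometric features."""
    if len(visual) != len(boxes):
        raise ValueError("need exactly one box per visual feature vector")
    return [list(v) + geometric_features(b) for v, b in zip(visual, boxes)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Single-head scaled dot-product self-attention over region tokens.

    Simplification: identity query/key/value projections instead of learned weights.
    Returns (updated tokens, attention matrix); every region attends to every region.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is an attention-weighted mix of all region tokens.
        outputs.append([sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)])
    return outputs, weights


# Toy example: 4 queries, 3 real classes + "no object" (last column).
class_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.02, 0.85, 0.03, 0.10],
    [0.05, 0.05, 0.05, 0.85],  # mostly "no object": dropped by select_regions
    [0.10, 0.05, 0.80, 0.05],
]
all_boxes: List[Box] = [
    (0.25, 0.50, 0.20, 0.40),
    (0.75, 0.50, 0.20, 0.40),
    (0.50, 0.10, 0.05, 0.05),
    (0.50, 0.50, 0.90, 0.90),
]
all_visual = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.3], [1.0, 1.0]]  # toy appearance features

keep = select_regions(class_probs)
tokens = build_tokens([all_visual[i] for i in keep], [all_boxes[i] for i in keep])
out, attn = self_attention(tokens)
print("kept queries:", keep)
for row in attn:
    print([round(x, 3) for x in row])
```

Because every region token attends to every other, each updated token mixes information from the whole image (global context) while still carrying its own appearance and position (local context). A trained model would replace the identity projections with learned, typically multi-head, projections and feed the attended region features to a caption decoder; those details are not specified in the abstract.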

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2103.05103 [cs.CV]
	(or arXiv:2103.05103v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.05103

Submission history

From: Shikha Dubey
[v1] Sun, 14 Feb 2021 05:35:54 UTC (553 KB)
