Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

Huang, Po-Yao; Chang, Xiaojun; Hauptmann, Alexander


arXiv:1910.00058 (cs)

[Submitted on 30 Sep 2019]


Abstract: With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations. Specifically, our model attends to different types of textual semantics in two languages and to visual objects for fine-grained alignments between sentences and images. We introduce a new objective function which explicitly encourages attention diversity to learn an improved visual-semantic embedding space. We evaluate our model in the German-Image and English-Image matching tasks on the Multi30K dataset, and in the Semantic Textual Similarity task with the English descriptions of visual content. Results show that our model yields a significant performance gain over other methods in all three tasks.
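
The abstract does not spell out the form of the diversity objective, so the following is only an illustrative sketch and not the authors' formulation. It assumes each attention head pools a set of feature vectors (for example word or object features) with a softmax over dot-product scores, and that diversity is encouraged by penalizing the mean pairwise cosine similarity between head outputs. The function names `attend` and `diversity_penalty`, the query vectors, and the toy features are all hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, query):
    # One attention head: weight each feature vector by its dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def diversity_penalty(head_outputs):
    # Assumed penalty: mean pairwise cosine similarity between head outputs.
    # Lower values mean the heads summarize the input more differently.
    n = len(head_outputs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(head_outputs[i], head_outputs[j]) for i, j in pairs) / len(pairs)

# Toy input: three 2-d feature vectors and two heads (all values are made up).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
same_queries = [[5.0, 0.0], [5.0, 0.0]]        # both heads look at the same thing
different_queries = [[5.0, 0.0], [0.0, 5.0]]   # heads look at different things

penalty_same = diversity_penalty([attend(features, q) for q in same_queries])
penalty_diff = diversity_penalty([attend(features, q) for q in different_queries])
print(penalty_same, penalty_diff)
```

In this toy setting, heads with identical queries produce identical outputs and receive the highest penalty, while heads with different queries receive a lower one. In a training loop, a term like this would presumably be added to a matching loss with some weighting, but that combination is an assumption here.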

Comments:	Accepted at EMNLP 2019
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1910.00058 [cs.CL]
	(or arXiv:1910.00058v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1910.00058

Submission history

From: Po-Yao Huang
[v1] Mon, 30 Sep 2019 18:58:03 UTC (3,998 KB)

