Full-Network Embedding in a Multimodal Embedding Pipeline

Vilalta, Armand; Garcia-Gasulla, Dario; Parés, Ferran; Ayguadé, Eduard; Labarta, Jesus; Cortés, Ulises; Suzumura, Toyotaro

arXiv:1707.09872 (cs)

[Submitted on 24 Jul 2017 (v1), last revised 9 Aug 2017 (this version, v2)]

Abstract: The current state of the art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure the influence of the Full-Network embedding, we evaluate its performance on three different datasets, and compare the results with the original multimodal embedding generation scheme when using a one-layer image embedding, and with the rest of the state of the art. Results for image annotation and image retrieval tasks indicate that the Full-Network embedding is consistently superior to the one-layer embedding. These results motivate the integration of the Full-Network embedding into any multimodal embedding generation scheme, something feasible thanks to the flexibility of the approach.

Comments:	In 2nd Workshop on Semantic Deep Learning (SemDeep-2) at the 12th International Conference on Computational Semantics (IWCS) 2017
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1707.09872 [cs.CV]
	(or arXiv:1707.09872v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1707.09872

Submission history

From: Armand Vilalta
[v1] Mon, 24 Jul 2017 10:27:33 UTC (103 KB)
[v2] Wed, 9 Aug 2017 13:11:42 UTC (104 KB)
