Efficient Multi-Modal Embeddings from Structured Data

Verő, Anita L.; Copestake, Ann

arXiv:2110.02577 (cs)

[Submitted on 6 Oct 2021]

Abstract: Multi-modal word semantics aims to enhance embeddings with perceptual input, on the assumption that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input; however, visual grounding can contribute to linguistic applications as well. Another motivation for this paper is the growing need for more interpretable models and for evaluating model efficiency with respect to size and performance. This work explores the impact of visual information on semantics when the evaluation involves no direct visual input, specifically semantic similarity and relatedness. We investigate a new embedding type that lies between the linguistic and visual modalities, based on the structured annotations of Visual Genome. We compare uni- and multi-modal models, including structured, linguistic and image-based representations. We measure the efficiency of each model with regard to data and model size, modality / data distribution and information gain. The analysis includes an interpretation of embedding structures. We found that this new embedding conveys information complementary to text-based embeddings. It achieves comparable performance economically, using orders of magnitude fewer resources than visual models.
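As a rough illustration of the kind of evaluation the abstract describes, the sketch below fuses two embedding spaces and scores the result on a word-pair similarity task. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not specify the fusion method or the scoring metric, so concatenation of L2-normalised vectors and Spearman correlation against human ratings are assumed here because they are common practice. The names `fuse`, `evaluate`, and all toy vectors and ratings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def l2_normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else list(v)


def fuse(ling, struct):
    """ASSUMPTION: multi-modal fusion by concatenating normalised vectors.

    Only words present in both embedding spaces are kept.
    """
    shared = ling.keys() & struct.keys()
    return {w: l2_normalize(ling[w]) + l2_normalize(struct[w]) for w in shared}


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(emb, pairs):
    """Score embeddings on (word1, word2, human_rating) triples.

    Pairs with an out-of-vocabulary word are skipped. Returns the Spearman
    correlation and the number of pairs actually covered.
    """
    model, gold = [], []
    for w1, w2, rating in pairs:
        if w1 in emb and w2 in emb:
            model.append(cosine(emb[w1], emb[w2]))
            gold.append(rating)
    return spearman(model, gold), len(gold)


# Hypothetical toy data: a "linguistic" space and a "structured" space.
ling = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
struct = {"cat": [0.8, 0.2, 0.0], "dog": [0.7, 0.3, 0.0], "car": [0.0, 0.1, 0.9]}
pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0), ("dog", "car", 2.5)]

rho, covered = evaluate(fuse(ling, struct), pairs)
print(f"Spearman rho = {rho:.3f} over {covered} pairs")
```

Reporting the number of covered pairs alongside the correlation matters when comparing models trained on very different amounts of data, since a smaller vocabulary can otherwise make scores look better or worse than they are.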

Comments:	5 pages, 5 pages of appendix, 7 figures
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2110.02577 [cs.CL]
	(or arXiv:2110.02577v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.02577

Submission history

From: Anita Lilla Verő
[v1] Wed, 6 Oct 2021 08:42:09 UTC (10,257 KB)
