Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

Liang, Paul Pu; Wu, Peter; Ziyin, Liu; Morency, Louis-Philippe; Salakhutdinov, Ruslan

arXiv:2012.02813 (cs)

[Submitted on 4 Dec 2020]

Abstract: The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities is present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality. We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities? Our solution is based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data while ensuring quick generalization to new tasks across different modalities. We study this problem on 3 classification tasks: text to image, image to audio, and text to speech. Our results demonstrate strong performance even when the new target modality has only a few (1-10) labeled samples and in the presence of noisy labels, a scenario particularly prevalent in low-resource modalities.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2012.02813 [cs.LG]
	(or arXiv:2012.02813v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2012.02813

Submission history

From: Paul Pu Liang
[v1] Fri, 4 Dec 2020 19:27:26 UTC (22,236 KB)
