Learning Social Image Embedding with Deep Multimodal Attention Networks

Huang, Feiran; Zhang, Xiaoming; Li, Zhoujun; Mei, Tao; He, Yueying; Zhao, Zhonghua

doi:10.1145/3126686.3126720


arXiv:1710.06582 (cs)

[Submitted on 18 Oct 2017]



Abstract: Learning embeddings of social media data with deep models has attracted extensive research interest and enabled many applications, such as link prediction, classification, and cross-modal search. However, social images contain both link information and multimodal content (e.g., text descriptions and visual content), so an embedding learnt from the network structure alone or from the data content alone yields a sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal content and link information. Specifically, to capture the correlations between multimodal contents effectively, we propose a multimodal attention network that encodes the fine-grained relations between image regions and textual words. To leverage the network structure for embedding learning, we propose a novel Siamese-Triplet neural network that models the links among images. With the joint deep model, the learnt embedding captures both the multimodal content and the nonlinear network information. Extensive experiments on multi-label classification and cross-modal search show that DMAN achieves significant improvement over state-of-the-art image embeddings.
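
The abstract names two ingredients: attention between image regions and words, and a triplet-style objective over linked images. The sketch below illustrates the generic versions of both ideas in plain Python. It is not DMAN's actual formulation, which the abstract does not spell out. The function names (`attend`, `triplet_loss`), the dot-product scoring, the squared Euclidean distance, the margin of 1.0, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, regions):
    """Soft attention (assumed dot-product scoring, not taken from the paper).

    Scores each region vector against the query (e.g. a word vector),
    normalises the scores with softmax, and returns the weights together
    with the weighted sum of the region vectors.
    """
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    pooled = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, pooled


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss (hypothetical stand-in for the link objective).

    Zero when the linked (positive) image is closer to the anchor than the
    unlinked (negative) image by at least `margin`.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


# Toy data: three 2-d "region" vectors and one "word" vector (made up).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word = [2.0, 0.0]
weights, pooled = attend(word, regions)

# Linked image close to the anchor, unlinked image far away: no penalty.
loss_good = triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0])
# Roles swapped: the loss is positive.
loss_bad = triplet_loss([0.0, 0.0], [2.0, 0.0], [0.1, 0.0])
```

In this toy setup the regions aligned with the word vector receive the larger attention weights, and the triplet loss is zero only when the linked image is already closer to the anchor than the unlinked one by the margin.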

Subjects:	Multimedia (cs.MM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1710.06582 [cs.MM]
	(or arXiv:1710.06582v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.1710.06582
Journal reference:	Proceedings of Thematic Workshops of the 25th ACM Multimedia 2017
Related DOI:	https://doi.org/10.1145/3126686.3126720

Submission history

From: Xiaoming Zhang
[v1] Wed, 18 Oct 2017 04:28:20 UTC (1,352 KB)
