Deep Multimodal Fusion by Channel Exchanging

Wang, Yikai; Huang, Wenbing; Sun, Fuchun; Xu, Tingyang; Rong, Yu; Huang, Junzhou

arXiv:2011.05005v1 (cs)

[Submitted on 10 Nov 2020 (this version), latest version 5 Dec 2020 (v2)]

Abstract: Deep multimodal fusion, which uses multiple sources of data for classification or regression, has shown a clear advantage over its unimodal counterpart in various applications. Yet current methods, including aggregation-based and alignment-based fusion, still struggle to balance the trade-off between inter-modal fusion and intra-modal processing, which creates a bottleneck for performance improvement. To this end, this paper proposes the Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. The validity of this exchanging process is also guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows the multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation with RGB-D data and image translation with multi-domain input verify the effectiveness of CEN compared with current state-of-the-art methods. Detailed ablation studies have also been carried out, which affirm the advantage of each proposed component. Our code is available at this https URL.
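
The exchange rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `exchange_channels`, the threshold value `0.02`, the flattened-list representation of feature maps, and the choice to replace a low-importance channel with the mean of the same channel from the other modalities are all assumptions made for illustration. The abstract states only that exchange is guided by the magnitude of the BN scaling factor.

```python
from typing import List

Channel = List[float]            # one flattened feature map (illustrative)
Features = List[List[Channel]]   # features[m][c] -> channel c of modality m


def exchange_channels(features: Features,
                      bn_gammas: List[List[float]],
                      threshold: float = 0.02) -> Features:
    """Replace low-importance channels with the mean of the other modalities.

    A channel of modality m counts as unimportant when the magnitude of its
    BN scaling factor is below `threshold` (hypothetical value).
    """
    num_modalities = len(features)
    fused: Features = []
    for m in range(num_modalities):
        others = [k for k in range(num_modalities) if k != m]
        out: List[Channel] = []
        for c, channel in enumerate(features[m]):
            if others and abs(bn_gammas[m][c]) < threshold:
                # Borrow channel c from the other modalities (assumed rule:
                # element-wise mean over all donors).
                donors = [features[k][c] for k in others]
                out.append([sum(vals) / len(donors) for vals in zip(*donors)])
            else:
                # Channel is important enough: keep it unchanged.
                out.append(list(channel))
        fused.append(out)
    return fused


# Toy example: two modalities, three channels, two values per channel.
rgb = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
depth = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
gammas = [[0.5, 0.001, 0.3],   # RGB: channel 1 has a near-zero scaling factor
          [0.0, 0.4, 0.2]]     # depth: channel 0 has a near-zero scaling factor
fused_rgb, fused_depth = exchange_channels([rgb, depth], gammas)
```

In this toy run, RGB channel 1 is taken from depth and depth channel 0 is taken from RGB, while every other channel is left as it was. The exchange itself adds no learnable weights, which is consistent with the "parameter-free" description in the abstract.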

Comments:	NeurIPS 2020. Code and models: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2011.05005 [cs.CV]
	(or arXiv:2011.05005v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2011.05005

Submission history

From: Yikai Wang
[v1] Tue, 10 Nov 2020 09:53:20 UTC (9,882 KB)
[v2] Sat, 5 Dec 2020 05:42:46 UTC (9,885 KB)
