PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

Li, Yunshui; Hui, Binyuan; Yin, ZhiChao; Yang, Min; Huang, Fei; Li, Yongbin

arXiv:2305.14839v2 (cs)

[Submitted on 24 May 2023 (v1), last revised 13 Jun 2023 (this version, v2)]

Abstract: Perceiving multi-modal information and conversing with humans is a long-standing goal of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi-modal dialogue. However, because multi-modal dialogue data is scarce, research on multi-modal dialogue pre-training remains limited. A further challenge arises from the broad scope of multi-modal dialogue, which involves various modalities and tasks. Moreover, new forms of tasks may arise at unpredictable points in the future. Hence, multi-modal dialogue models must be flexible enough to adapt to such scenarios. This paper proposes PaCE, a unified, structured, compositional multi-modal dialogue pre-training framework. It combines several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue data and extensive non-dialogue multi-modal data. Furthermore, we propose a progressive training method in which previously trained experts assist new experts, facilitating the expansion of their capabilities. Experimental results demonstrate that PaCE achieves state-of-the-art results on eight multi-modal dialogue benchmarks.

Comments:	ACL 2023
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.14839 [cs.CL]
	(or arXiv:2305.14839v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.14839

Submission history

From: Yunshui Li
[v1] Wed, 24 May 2023 07:43:29 UTC (11,086 KB)
[v2] Tue, 13 Jun 2023 06:31:46 UTC (11,082 KB)
