PandaGPT: One Model To Instruction-Follow Them All

Su, Yixuan; Lan, Tian; Li, Huayang; Xu, Jialu; Wang, Yan; Cai, Deng

Computer Science > Computation and Language

arXiv:2305.16355 (cs)

[Submitted on 25 May 2023]

Abstract: We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as generating detailed image descriptions, writing stories inspired by videos, and answering questions about audio clips. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image or video with how they sound in an audio clip. To do so, PandaGPT combines the multimodal encoders from ImageBind with the large language models from Vicuna. Notably, only aligned image-text pairs are required to train PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e., zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at this https URL.
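
The idea of feeding a shared-space multimodal embedding to a language model can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the encoder output is connected to the language model, so the linear projection, the element-wise addition used to combine modalities, and every name below (`linear_project`, `compose`, `build_llm_input`, the dimensions) are assumptions made purely for illustration, with random numbers standing in for real ImageBind and Vicuna embeddings.

```python
import random

def linear_project(vec, weight, bias):
    """Map an encoder embedding (length d_in) to the LLM hidden size (length d_out).

    Assumption: a single linear layer bridges the two spaces.
    """
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

def compose(embeddings):
    """Combine same-length embeddings from several modalities.

    Assumption: because all modalities share one space, element-wise
    addition is used here as a simple way to mix their semantics.
    """
    return [sum(vals) for vals in zip(*embeddings)]

def build_llm_input(modality_embeddings, text_token_embeddings, weight, bias):
    """Prepend one projected multimodal vector to the text token embeddings."""
    fused = compose(modality_embeddings)
    prefix = linear_project(fused, weight, bias)
    return [prefix] + text_token_embeddings

# Toy dimensions and random stand-ins (hypothetical values, not real model sizes).
rng = random.Random(0)
D_ENC, D_LLM = 4, 6
weight = [[rng.uniform(-1, 1) for _ in range(D_ENC)] for _ in range(D_LLM)]
bias = [0.0] * D_LLM

image_emb = [rng.uniform(-1, 1) for _ in range(D_ENC)]  # stand-in for an image embedding
audio_emb = [rng.uniform(-1, 1) for _ in range(D_ENC)]  # stand-in for an audio embedding
text_tokens = [[0.0] * D_LLM for _ in range(3)]         # stand-in for 3 text token embeddings

sequence = build_llm_input([image_emb, audio_emb], text_tokens, weight, bias)
print(len(sequence), len(sequence[0]))
```

Under these assumptions, a projection trained only on image-text pairs could be reused for audio or other inputs, since their embeddings are expected to lie in the same space as the image embeddings.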

Comments:	Technical report, work in progress. Our project page is at this https URL
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.16355 [cs.CL]
	(or arXiv:2305.16355v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.16355

Submission history

From: Yixuan Su
[v1] Thu, 25 May 2023 04:16:07 UTC (14,136 KB)
