A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future

Sun, Shilin; An, Wenbin; Tian, Feng; Nan, Fang; Liu, Qidong; Liu, Jun; Shah, Nazaraf; Chen, Ping

arXiv:2412.14056 (cs)

[Submitted on 18 Dec 2024]

Abstract: Artificial intelligence (AI) has developed rapidly through advances in computational power and the growth of massive datasets. However, this progress has also heightened the challenge of interpreting the "black-box" nature of AI models. To address these concerns, eXplainable AI (XAI) has emerged with a focus on transparency and interpretability to enhance human understanding of, and trust in, AI decision-making processes. In the context of multimodal data fusion and complex reasoning scenarios, Multimodal eXplainable AI (MXAI) has been proposed to integrate multiple modalities for prediction and explanation tasks. Meanwhile, the advent of Large Language Models (LLMs) has led to remarkable breakthroughs in natural language processing, yet their complexity has further exacerbated the challenges facing MXAI. To gain key insights into the development of MXAI methods and provide crucial guidance for building more transparent, fair, and trustworthy AI systems, we review MXAI methods from a historical perspective and categorize them across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review the evaluation metrics and datasets used in MXAI research, and conclude with a discussion of future challenges and directions. A project related to this review has been created at this https URL.

Comments:	This work has been submitted to the IEEE for possible publication
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:2412.14056 [cs.CV]
	(or arXiv:2412.14056v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.14056

Submission history

From: Shilin Sun
[v1] Wed, 18 Dec 2024 17:06:21 UTC (203 KB)
