MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Xue, Haochen; Tang, Feilong; Hu, Ming; Liu, Yexin; Huang, Qidong; Li, Yulong; Liu, Chengzhi; Xu, Zhongxing; Zhang, Chong; Feng, Chun-Mei; Xie, Yutong; Razzak, Imran; Ge, Zongyuan; Su, Jionglong; He, Junjun; Qiao, Yu

Computer Science > Computation and Language

arXiv:2502.11903 (cs)

[Submitted on 17 Feb 2025]

Abstract: Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. Built from data collected in real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations of 20 MLLMs on MMRC show an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, error propagation from accumulated assumptions, and reluctance to say no. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which records key information from the conversation and reminds the model of it when responding, enhancing its conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
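The abstract describes NOTE-TAKING only at a high level: record key information from the conversation, then remind the model of it when it responds. The Python sketch below illustrates that idea under loudly stated assumptions and is not the authors' implementation. `NoteBook`, `extract_facts`, and `build_prompt` are hypothetical names. The `remember <key> = <value>` trigger is a toy stand-in for whatever extraction the paper actually uses (plausibly the MLLM itself). Keying notes by topic means a later correction overwrites the earlier value, which is one simple way to target the "inadequacies in updating factual knowledge" failure pattern.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NoteBook:
    """Hypothetical note store: one entry per key; later turns overwrite earlier ones."""
    notes: dict[str, str] = field(default_factory=dict)

    def record(self, key: str, value: str) -> None:
        # Overwriting by key means a corrected fact replaces the stale one
        # instead of sitting beside it (cf. the "information update" ability).
        self.notes[key] = value

    def render(self) -> str:
        # Serialize the notes, in insertion order, as a reminder block.
        if not self.notes:
            return ""
        lines = [f"- {k}: {v}" for k, v in self.notes.items()]
        return "Notes from earlier in this conversation:\n" + "\n".join(lines)


def extract_facts(user_turn: str) -> list[tuple[str, str]]:
    """Toy stand-in (an assumption) for model-driven note extraction:
    any line of the form 'remember <key> = <value>' is treated as a key fact."""
    facts = []
    for line in user_turn.splitlines():
        line = line.strip()
        if line.lower().startswith("remember ") and "=" in line:
            key, value = line[len("remember "):].split("=", 1)
            facts.append((key.strip(), value.strip()))
    return facts


def build_prompt(book: NoteBook, question: str) -> str:
    """Prepend the current notes to the question so the model is reminded of them."""
    reminder = book.render()
    return f"{reminder}\n\n{question}" if reminder else question


# Usage: the user states a fact, then corrects it in a later turn.
book = NoteBook()
turns = [
    "Remember dog_name = Biscuit\nWhat breed is the dog in this photo?",
    "Remember dog_name = Pepper\nSorry, I gave the wrong name earlier.",
]
for turn in turns:
    for key, value in extract_facts(turn):
        book.record(key, value)

print(build_prompt(book, "What is my dog's name?"))
```

Running the usage lines prints a reminder block listing only `dog_name: Pepper`, followed by the question: the earlier, superseded value is gone rather than competing with the correction. A real system would also need to decide what counts as "key information" and how much of the note store to include per turn, which this sketch deliberately leaves out.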

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.11903 [cs.CL]
	(or arXiv:2502.11903v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.11903

Submission history

From: Haochen Xue
[v1] Mon, 17 Feb 2025 15:24:49 UTC (6,975 KB)