CMMLU: Measuring massive multitask language understanding in Chinese

Li, Haonan; Zhang, Yixuan; Koto, Fajri; Yang, Yifei; Zhao, Hai; Gong, Yeyun; Duan, Nan; Baldwin, Timothy

Computer Science > Computation and Language

arXiv:2306.09212 (cs)

[Submitted on 15 Jun 2023 (v1), last revised 17 Jan 2024 (this version, v2)]

Title: CMMLU: Measuring massive multitask language understanding in Chinese

Authors: Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin

Abstract: As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging. This paper addresses that challenge for Chinese by introducing CMMLU, a comprehensive Chinese benchmark covering a wide range of subjects, including natural science, social sciences, engineering, and humanities. We conduct a thorough evaluation of 18 advanced multilingual- and Chinese-oriented LLMs, assessing their performance across different subjects and settings. The results reveal that most existing LLMs struggle to reach an average accuracy of 50%, even when provided with in-context examples and chain-of-thought prompts, whereas the random baseline is 25%. This indicates significant room for improvement in LLMs. Additionally, we conduct extensive experiments to identify the factors that affect the models' performance and propose directions for enhancing LLMs. CMMLU fills a gap in evaluating the knowledge and reasoning capabilities of large language models in the Chinese context.
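The abstract compares model accuracy against a 25% random baseline, which is what uniform guessing yields when each question has four answer options. The sketch below illustrates that arithmetic and one way to compute an "average accuracy" across subjects. It is a minimal sketch, not the paper's evaluation code: the four-option format is inferred from the 25% baseline, the names `Item`, `random_baseline`, `subject_accuracies`, and `average_accuracy` are hypothetical, and averaging per-subject scores (a macro-average) is an assumption about how scores are aggregated.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: four answer options per question, consistent with a 25% random baseline.
CHOICES = ("A", "B", "C", "D")


@dataclass
class Item:
    """Hypothetical record for one scored multiple-choice question."""
    subject: str
    answer: str      # gold option letter
    prediction: str  # option letter chosen by the model


def random_baseline(num_choices: int = len(CHOICES)) -> float:
    """Expected accuracy of guessing uniformly at random among the options."""
    return 1.0 / num_choices


def subject_accuracies(items: list[Item]) -> dict[str, float]:
    """Fraction of correctly answered questions within each subject."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for it in items:
        totals[it.subject] = totals.get(it.subject, 0) + 1
        correct[it.subject] = correct.get(it.subject, 0) + int(it.prediction == it.answer)
    return {s: correct[s] / totals[s] for s in totals}


def average_accuracy(items: list[Item]) -> float:
    """Macro-average (assumption): mean of the per-subject accuracies."""
    return mean(subject_accuracies(items).values())


if __name__ == "__main__":
    # Toy, made-up predictions purely for illustration.
    toy = [
        Item("physics", "A", "A"),
        Item("physics", "B", "C"),
        Item("history", "D", "D"),
        Item("history", "C", "C"),
    ]
    print(subject_accuracies(toy))   # per-subject scores
    print(average_accuracy(toy))     # macro-averaged score
    print(random_baseline())         # chance level for four options
```

With this toy data, physics scores 0.5 and history 1.0, so the macro-average is 0.75, well above the 0.25 chance level. A micro-average (pooling all questions) would weight large subjects more heavily; the two coincide here only because both subjects have the same number of questions.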

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.09212 [cs.CL]
	(or arXiv:2306.09212v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.09212

Submission history

From: Haonan Li
[v1] Thu, 15 Jun 2023 15:49:51 UTC (2,135 KB)
[v2] Wed, 17 Jan 2024 19:09:57 UTC (3,151 KB)