Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation

Bayram, M. Ali; Fincan, Ali Arda; Gümüş, Ahmet Semih; Diri, Banu; Yıldırım, Savaş; Aytaş, Öner

arXiv:2501.00593 (cs)

[Submitted on 31 Dec 2024]

Abstract: Language models have made remarkable advancements in understanding and generating human language, achieving notable success across a wide array of applications. However, evaluating these models remains a significant challenge, particularly for resource-limited languages such as Turkish. To address this gap, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is constructed from a carefully curated dataset comprising 6,200 multiple-choice questions across 62 sections, selected from a pool of 280,000 questions spanning 67 disciplines and over 800 topics within the Turkish education system. This benchmark provides a transparent, reproducible, and culturally relevant tool for evaluating model performance. It serves as a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text and fostering the development of more robust and accurate language models. In this study, we evaluate state-of-the-art LLMs on TR-MMLU, providing insights into their strengths and limitations for Turkish-specific tasks. Our findings reveal critical challenges, such as the impact of tokenization and fine-tuning strategies, and highlight areas for improvement in model design. By setting a new standard for evaluating Turkish language models, TR-MMLU aims to inspire future innovations and support the advancement of Turkish NLP research.
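
The abstract describes scoring LLMs on multiple-choice questions grouped into 62 sections. One common way to report results on a benchmark shaped like this is overall (micro) accuracy plus per-section accuracy with a macro average across sections. The sketch below illustrates that computation under stated assumptions; it is not the authors' evaluation code. The `Item` record, its field names, the exact-match rule on option letters, and the section names in the demo are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One scored multiple-choice question (hypothetical schema, not TR-MMLU's real format)."""
    section: str      # section label; the demo names below are illustrative only
    answer: str       # gold option letter, e.g. "A"
    prediction: str   # option letter produced by the model


def score(items):
    """Return (overall accuracy, per-section accuracy dict, macro average over sections)."""
    if not items:
        raise ValueError("no items to score")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.section] += 1
        # Assumed rule: exact match on the option letter after trimming and upper-casing.
        if item.prediction.strip().upper() == item.answer.strip().upper():
            correct[item.section] += 1
    per_section = {s: correct[s] / total[s] for s in total}
    # Micro average: every question counts equally.
    overall = sum(correct.values()) / sum(total.values())
    # Macro average: every section counts equally, regardless of its size.
    macro = sum(per_section.values()) / len(per_section)
    return overall, per_section, macro


if __name__ == "__main__":
    demo = [
        Item("Matematik", "A", "A"),
        Item("Matematik", "C", "B"),
        Item("Matematik", "E", "D"),
        Item("Tarih", "D", "d"),
    ]
    overall, per_section, macro = score(demo)
    print(overall, per_section, macro)
```

The micro and macro averages coincide only when every section contains the same number of questions; otherwise, the macro average keeps small sections from being outweighed by large ones.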

Comments:	6 pages, 2 tables, submitted to arXiv for review. Includes a comprehensive evaluation framework for Turkish NLP tasks and state-of-the-art LLM evaluations
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T50
ACM classes:	I.2.7
Cite as:	arXiv:2501.00593 [cs.CL]
	(or arXiv:2501.00593v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.00593

Submission history

From: M. Ali Bayram
[v1] Tue, 31 Dec 2024 18:43:49 UTC (128 KB)