Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

Mercorio, Fabio; Mezzanzanica, Mario; Potertì, Daniele; Serino, Antonio; Seveso, Andrea

arXiv:2406.17535 (cs)

[Submitted on 25 Jun 2024]

Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to generate and manipulate human language, highlighting their potential across various applications. Evaluating LLMs in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts, thus broadening their usability and effectiveness. We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy. Our study makes three primary contributions. First, we adapt the INVALSI benchmark for automated LLM evaluation, reworking the test format to suit automated processing while retaining the essence of the original tests. Second, we provide a detailed assessment of current LLMs, offering a reference point for the academic community. Third, we visually compare the performance of these models against human results. Additionally, researchers are invited to submit their models for ongoing evaluation, ensuring the benchmark remains a current and valuable resource.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.17535 [cs.CL]
	(or arXiv:2406.17535v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.17535

Submission history

From: Andrea Seveso
[v1] Tue, 25 Jun 2024 13:20:08 UTC (1,834 KB)
