ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer

Goldman, Omer; Shaham, Uri; Malkin, Dan; Eiger, Sivan; Hassidim, Avinatan; Matias, Yossi; Maynez, Joshua; Gilady, Adi Mayrav; Riesa, Jason; Rijhwani, Shruti; Rimell, Laura; Szpektor, Idan; Tsarfaty, Reut; Eyal, Matan

arXiv:2502.21228 (cs)

[Submitted on 28 Feb 2025]

Abstract: To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs' capability of cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-book QA (CBQA) dataset that Evaluates Cross-Lingual Knowledge Transfer in a simple, black-box manner. We detected information with uneven coverage across languages by controlling for the presence and absence of Wikipedia articles in 12 languages. We generated knowledge-seeking questions in a source language, for which the answer appears in a relevant Wikipedia article, and translated them into the other 11 languages, for which the respective Wikipedias lack equivalent articles. Assuming that Wikipedia reflects the prominent knowledge in the LLM's training data, solving ECLeKTic's CBQA task requires the model to transfer knowledge between languages. Experimenting with 8 LLMs, we show that SOTA models struggle to effectively share knowledge across languages, even if they can predict the answer well for queries in the same language in which the knowledge was acquired.
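The selection step described in the abstract (keeping only topics whose Wikipedia article exists in one language and is absent from the others) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `coverage` mapping, the topic identifiers, the function names, and the three-language toy set are all assumptions made for illustration, and the sketch omits question generation and translation entirely.

```python
from typing import Dict, List, Set, Tuple


def single_language_articles(
    coverage: Dict[str, Set[str]],
    languages: Set[str],
) -> List[Tuple[str, str]]:
    """Return (topic_id, source_language) pairs for topics that have an
    article in exactly one of the considered languages."""
    selected = []
    for topic_id, langs in sorted(coverage.items()):
        present = langs & languages  # ignore languages outside the study
        if len(present) == 1:
            selected.append((topic_id, next(iter(present))))
    return selected


def target_languages(source: str, languages: Set[str]) -> List[str]:
    """Languages into which a question written in `source` would be translated."""
    return sorted(languages - {source})


# Hypothetical toy data: three languages stand in for the paper's twelve.
LANGS = {"en", "he", "ja"}
coverage = {
    "topic_a": {"he"},        # article only in Hebrew -> kept
    "topic_b": {"en", "ja"},  # article in two languages -> dropped
    "topic_c": {"ja"},        # article only in Japanese -> kept
}

pairs = single_language_articles(coverage, LANGS)
for topic_id, source in pairs:
    print(topic_id, source, target_languages(source, LANGS))
```

In this toy setup, a question written from `topic_a` would be posed in Hebrew as the source-language query and in English and Japanese as the transfer queries.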

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.21228 [cs.CL]
	(or arXiv:2502.21228v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.21228

Submission history

From: Omer Goldman
[v1] Fri, 28 Feb 2025 16:59:30 UTC (1,339 KB)
