HRET: A Self-Evolving LLM Evaluation Toolkit for Korean

Lee, Hanwool; Kim, Soo Yong; Choi, Dasol; Baek, SangWon; Hong, Seunghyeok; Jeong, Ilgyun; Hwang, Inseon; Lee, Naeun; Son, Guijin

arXiv:2503.22968 (cs)

[Submitted on 29 Mar 2025 (v1), last revised 1 Apr 2025 (this version, v2)]

Abstract: Recent advancements in Korean large language models (LLMs) have spurred numerous benchmarks and evaluation methodologies, yet the lack of a standardized evaluation framework has led to inconsistent results and limited comparability. To address this, we introduce HRET (Haerae Evaluation Toolkit), an open-source, self-evolving evaluation framework tailored specifically for Korean LLMs. HRET unifies diverse evaluation methods, including logit-based scoring, exact-match, language-inconsistency penalization, and LLM-as-a-Judge assessments. Its modular, registry-based architecture integrates major benchmarks (HAE-RAE Bench, KMMLU, KUDGE, HRM8K) and multiple inference backends (vLLM, HuggingFace, OpenAI-compatible endpoints). With automated pipelines for continuous evolution, HRET provides a robust foundation for reproducible, fair, and transparent Korean NLP research.
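To make the phrase "registry-based architecture" concrete, the following is a minimal sketch of how a scorer registry with an exact-match metric and a simple language-inconsistency penalty could look. It is not HRET's actual API: every name here (`SCORERS`, `register_scorer`, `exact_match`, `exact_match_korean_only`, `evaluate`) is hypothetical, and the penalty rule (a response with no Hangul syllables is scored as wrong) is an assumption chosen only for illustration.

```python
from typing import Callable, Dict, List

Scorer = Callable[[List[str], List[str]], float]

# Hypothetical registry mapping a scorer name to its function.
# This illustrates the general pattern, not HRET's real interface.
SCORERS: Dict[str, Scorer] = {}


def register_scorer(name: str) -> Callable[[Scorer], Scorer]:
    """Decorator that records a scoring function under `name`."""
    def decorator(fn: Scorer) -> Scorer:
        if name in SCORERS:
            raise ValueError(f"scorer already registered: {name}")
        SCORERS[name] = fn
        return fn
    return decorator


def _check_lengths(predictions: List[str], references: List[str]) -> None:
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")


def contains_hangul(text: str) -> bool:
    """True if the text has at least one precomposed Hangul syllable."""
    return any("\uac00" <= ch <= "\ud7a3" for ch in text)


@register_scorer("exact_match")
def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to the reference after stripping."""
    _check_lengths(predictions, references)
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


@register_scorer("exact_match_korean_only")
def exact_match_korean_only(predictions: List[str], references: List[str]) -> float:
    """Exact match where a prediction without Hangul counts as wrong.

    Assumed penalty rule for illustration only.
    """
    _check_lengths(predictions, references)
    if not references:
        return 0.0
    hits = sum(
        contains_hangul(p) and p.strip() == r.strip()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def evaluate(scorer_name: str, predictions: List[str], references: List[str]) -> float:
    """Look up a scorer by name and apply it."""
    return SCORERS[scorer_name](predictions, references)
```

In this sketch, adding a new metric only requires decorating one more function, so callers keep selecting scorers by name; benchmarks and inference backends could be registered the same way under separate dictionaries.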

Subjects:	Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2503.22968 [cs.CE]
	(or arXiv:2503.22968v2 [cs.CE] for this version)
	https://doi.org/10.48550/arXiv.2503.22968

Submission history

From: Hanwool Lee
[v1] Sat, 29 Mar 2025 04:17:58 UTC (53 KB)
[v2] Tue, 1 Apr 2025 12:37:16 UTC (53 KB)
