KFinEval-Pilot: A Comprehensive Benchmark Suite for Korean Financial Language Understanding

Hwang, Bokwang; Lim, Seonkyu; Kim, Taewoong; Geun, Yongjae; Bang, Sunghyun; Park, Sohyun; Park, Jihyun; Lee, Myeonggyu; Lee, Jinwoo; Kim, Yerin; Yoo, Jinsun; Hong, Jingyeong; Park, Jina; Kim, Yongchan; Kim, Suhyun; Hahm, Younggyun; Lee, Yiseul; Kang, Yejee; Yoon, Chanhyuk; Lee, Chansu; Jeong, Heeyewon; Lee, Jiyeon; Gu, Seonhye; Kang, Hyebin; Cho, Yousang; Yoo, Hangyeol; Lim, KyungTae

arXiv:2504.13216 (cs)

[Submitted on 17 Apr 2025]

Abstract: We introduce KFinEval-Pilot, a benchmark suite specifically designed to evaluate large language models (LLMs) in the Korean financial domain. Addressing the limitations of existing English-centric benchmarks, KFinEval-Pilot comprises over 1,000 curated questions across three critical areas: financial knowledge, legal reasoning, and financial toxicity. The benchmark is constructed through a semi-automated pipeline that combines GPT-4-generated prompts with expert validation to ensure domain relevance and factual accuracy. We evaluate a range of representative LLMs and observe notable performance differences across models, with trade-offs between task accuracy and output safety across different model families. These results highlight persistent challenges in applying LLMs to high-stakes financial applications, particularly in reasoning and safety. Grounded in real-world financial use cases and aligned with the Korean regulatory and linguistic context, KFinEval-Pilot serves as an early diagnostic tool for developing safer and more reliable financial AI systems.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2504.13216 [cs.CL]
	(or arXiv:2504.13216v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.13216

Submission history

From: Seonkyu Lim
[v1] Thu, 17 Apr 2025 00:12:58 UTC (179 KB)
