bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

Hardalov, Momchil; Atanasova, Pepa; Mihaylov, Todor; Angelova, Galia; Simov, Kiril; Osenova, Petya; Stoyanov, Ves; Koychev, Ivan; Nakov, Preslav; Radev, Dragomir

Computer Science > Computation and Language

arXiv:2306.02349 (cs)

[Submitted on 4 Jun 2023 (v1), last revised 7 Jun 2023 (this version, v2)]

Abstract: We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, and question answering) and machine learning tasks (sequence labeling, document-level classification, and regression). We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark. The evaluation results show strong performance on sequence labeling tasks but leave considerable room for improvement on tasks that require more complex reasoning. We make bgGLUE publicly available together with the fine-tuning and evaluation code, as well as a public leaderboard at this https URL, and we hope that it will enable further advancements in developing NLU models for Bulgarian.

Comments:	Accepted to ACL 2023 (Main Conference)
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
MSC classes:	68T50
ACM classes:	F.2.2; I.2.7
Cite as:	arXiv:2306.02349 [cs.CL]
	(or arXiv:2306.02349v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.02349
Journal reference:	ACL 2023

Submission history

From: Momchil Hardalov [view email]
[v1] Sun, 4 Jun 2023 12:54:00 UTC (404 KB)
[v2] Wed, 7 Jun 2023 03:57:51 UTC (402 KB)
