MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models

Chen, Zhongpu; Liu, Yinfeng; Shi, Long; Wang, Zhi-Jie; Chen, Xingyan; Zhao, Yu; Ren, Fuji

arXiv:2501.15000 (cs)

[Submitted on 25 Jan 2025]

Abstract: Large language models (LLMs) are expected to offer structured Markdown responses for the sake of readability in web chatbots (e.g., ChatGPT). Although myriad metrics exist for evaluating LLMs, they fail to assess readability from the perspective of output content structure. To this end, we focus on an overlooked yet important metric -- Markdown Awareness, which directly impacts the readability and structure of the content generated by these language models. In this paper, we introduce MDEval, a comprehensive benchmark to assess Markdown Awareness for LLMs, built on a dataset of 20K instances covering 10 subjects in English and Chinese. Unlike traditional model-based evaluations, MDEval provides excellent interpretability by combining model-based generation tasks and statistical methods. Our results demonstrate that MDEval achieves a Spearman correlation of 0.791 and an accuracy of 84.1% against human judgments, outperforming existing methods by a large margin. Extensive experimental results also show that, through fine-tuning on our proposed dataset, less performant open-source models can achieve performance comparable to GPT-4o in terms of Markdown Awareness. To ensure reproducibility and transparency, MDEval is open-sourced at this https URL.

Comments:	WWW 2025
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2501.15000 [cs.CL]
	(or arXiv:2501.15000v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.15000

Submission history

From: Zhongpu Chen
[v1] Sat, 25 Jan 2025 00:26:01 UTC (434 KB)