Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance

Devatine, Nicolas; Abraham, Louis

arXiv:2412.17321 (cs)

[Submitted on 23 Dec 2024]

Abstract: Assessing the extent of human edits on texts generated by Large Language Models (LLMs) is crucial to understanding human-AI interactions and improving the quality of automated text generation systems. Existing edit distance metrics, such as Levenshtein, BLEU, ROUGE, and TER, often fail to accurately measure the effort required for post-editing, especially when edits involve substantial modifications, such as block operations. In this paper, we introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm, designed to quantify the amount of post-editing applied to LLM-generated texts. Our method leverages the properties of text compression to measure the informational difference between the original and edited texts. Through experiments on real-world datasets of human edits, we demonstrate that our proposed metric is highly correlated with actual edit time and effort. We also show that LLMs exhibit an implicit understanding of editing speed that aligns well with our metric. Furthermore, we compare our metric with existing ones, highlighting its advantages in capturing complex edits with linear computational efficiency. Our code and data are available at: this https URL
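
The compression idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is at the link given in the abstract); it is a simplified, assumed variant that greedily splits the edited text into LZ77-style factors, where each factor is either a substring copied from the original text (or from the part of the edited text already processed) or a single new character. The function name `lz77_factor_count` is hypothetical, and this naive search is quadratic or worse, unlike the linear-time method the abstract describes.

```python
def lz77_factor_count(source: str, target: str) -> int:
    """Count greedy LZ77-style factors needed to describe `target` given `source`.

    Illustrative sketch only: a factor is either the longest substring that can
    be copied from the text seen so far, or one literal character.
    """
    window = source  # text that copies may refer to
    factors = 0
    i = 0
    while i < len(target):
        # Extend the match while the candidate substring occurs in the window.
        length = 0
        while i + length < len(target) and target[i:i + length + 1] in window:
            length += 1
        # A copy consumes `length` characters; an unseen character is a literal.
        step = max(length, 1)
        window += target[i:i + step]
        i += step
        factors += 1
    return factors


original = "alpha beta gamma"
moved = "gamma alpha beta"  # a block move of the last word to the front
print(lz77_factor_count(original, original))
print(lz77_factor_count(original, moved))
```

Under this sketch an unchanged text costs one factor (a single copy of the whole original), and a block move costs only a few factors, whereas a character-level metric such as Levenshtein distance would charge for every shifted character.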

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.17321 [cs.CL]
	(or arXiv:2412.17321v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.17321

Submission history

From: Louis Abraham
[v1] Mon, 23 Dec 2024 06:29:25 UTC (9,208 KB)
