SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits

Thorat, Onkar; Laban, Philippe; Wu, Chien-Sheng

arXiv:2412.13378 (cs)

[Submitted on 17 Dec 2024]

Abstract: Detecting factual inconsistencies in summarization is critical, yet existing benchmarks lack the necessary challenge and interpretability for robust evaluation. In this paper, we introduce SummExecEdit, a novel benchmark leveraging executable edits to assess models on their ability to both detect factual errors and provide accurate explanations. The top-performing model, Claude3-Opus, achieves a joint detection and explanation score of only 0.49 in our benchmark, with individual scores of 0.67 for detection and 0.73 for explanation. Furthermore, we identify four primary types of explanation errors, with 45.4% of errors focusing on completely unrelated parts of the summary.
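
The abstract does not define how the joint score is computed. As a rough sanity check only, the sketch below assumes the joint score is the product of the detection score and the explanation score (i.e., explanation quality measured on correctly detected cases). That reading is an assumption, not the paper's stated definition, and the function name `joint_score` is hypothetical. Under this assumption the three reported figures are mutually consistent.

```python
def joint_score(detection: float, explanation: float) -> float:
    """Hypothetical joint score: product of the two individual scores.

    Assumption (not confirmed by the abstract): the explanation score is
    conditional on a correct detection, so the joint score is their product.
    """
    return detection * explanation


# Figures reported in the abstract for the top-performing model.
reported_detection = 0.67
reported_explanation = 0.73
reported_joint = 0.49

estimate = joint_score(reported_detection, reported_explanation)

# Compare the assumed product with the reported joint score at two decimals.
print(f"product of individual scores: {estimate:.4f}")
print(f"matches reported joint score: {round(estimate, 2) == reported_joint}")
```

If the paper defines the joint score differently, the agreement here would be coincidental; the full text should be consulted for the actual metric.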

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.13378 [cs.CL]
	(or arXiv:2412.13378v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.13378

Submission history

From: Onkar Thorat
[v1] Tue, 17 Dec 2024 23:26:44 UTC (990 KB)
