Unstructured Evidence Attribution for Long Context Query Focused Summarization

Wright, Dustin; Mujahid, Zain Muhammad; Wang, Lu; Augenstein, Isabelle; Jurgens, David

arXiv:2502.14409 (cs)

[Submitted on 20 Feb 2025]

Abstract: Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query. Extracting and properly citing evidence spans could help improve the transparency and reliability of these summaries. At the same time, LLMs suffer from positional biases in terms of which information they understand and attend to, which could affect evidence citation. Whereas previous work has focused on evidence citation at predefined levels of granularity (e.g., sentence, paragraph, or document), we propose the task of long-context query focused summarization with unstructured evidence citation. We show that existing systems struggle to generate and properly cite unstructured evidence from their context, and that evidence tends to be "lost-in-the-middle". To help mitigate this, we create the Summaries with Unstructured Evidence Text dataset (SUnsET), a synthetic dataset generated using a novel domain-agnostic pipeline, which can be used as supervision to adapt LLMs to this task. We demonstrate across 5 LLMs of different sizes and 4 datasets with varying document types and lengths that LLMs adapted with SUnsET data generate more relevant and factually consistent evidence than their base models, extract evidence from more diverse locations in their context, and can generate more relevant and consistent summaries.

Comments:	24 pages; 21 figures; 5 tables
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2502.14409 [cs.CL]
	(or arXiv:2502.14409v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.14409

Submission history

From: Dustin Wright
[v1] Thu, 20 Feb 2025 09:57:42 UTC (227 KB)
