LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion

Ling, Zhan; Liu, Kang; Yan, Kai; Yang, Yifan; Lin, Weijian; Fan, Ting-Han; Shen, Lingfeng; Du, Zhengyin; Chen, Jiecao

arXiv:2501.15089 (cs)

[Submitted on 25 Jan 2025 (v1), last revised 28 Feb 2025 (this version, v2)]

Abstract: Large language models (LLMs) have made remarkable progress in understanding long-context inputs. However, benchmarks for evaluating their long-context reasoning abilities have not kept pace. Existing benchmarks often cover a narrow range of tasks, or tasks that do not demand complex reasoning. To address this gap and enable a more comprehensive evaluation of the long-context reasoning capabilities of current LLMs, we propose a new synthetic benchmark, LongReason, which is constructed by synthesizing long-context reasoning questions from a varied set of short-context reasoning questions through context expansion. LongReason consists of 794 multiple-choice reasoning questions with diverse reasoning patterns across three task categories: reading comprehension, logical inference, and mathematical word problems. We evaluate 21 LLMs on LongReason and find that most models suffer significant performance drops as context length increases. Further analysis shows that even state-of-the-art LLMs have significant room for improvement in reasoning robustly across different tasks. We have open-sourced LongReason under this https URL to support the comprehensive evaluation of LLMs' long-context reasoning capabilities.
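The abstract does not specify how context expansion is performed. The sketch below illustrates one simple way a short-context question could be turned into a long-context one, assuming (hypothetically) that the question is split into a few key facts plus a final inquiry, and that the facts are scattered in order among irrelevant filler sentences until a target length is reached. The functions `expand_context` and `build_question`, the word-count length budget, and the filler text are all illustrative assumptions, not the authors' pipeline.

```python
import random


def expand_context(facts, filler_sentences, target_words, seed=0):
    """Hypothetical sketch: scatter the key facts of a short-context question
    among irrelevant filler sentences until the text reaches at least
    `target_words` words. Facts keep their original relative order."""
    rng = random.Random(seed)
    filler = []
    n_words = sum(len(f.split()) for f in facts)
    i = 0
    # Cycle through the filler pool until the (assumed) word budget is met.
    while filler_sentences and n_words < target_words:
        sentence = filler_sentences[i % len(filler_sentences)]
        filler.append(sentence)
        n_words += len(sentence.split())
        i += 1
    # Pick a random slot (0..len(filler)) for each fact; sorting keeps fact order.
    slots = sorted(rng.randint(0, len(filler)) for _ in facts)
    pieces, k = [], 0
    for pos in range(len(filler) + 1):
        while k < len(facts) and slots[k] == pos:
            pieces.append(facts[k])
            k += 1
        if pos < len(filler):
            pieces.append(filler[pos])
    return " ".join(pieces)


def build_question(facts, inquiry, options, filler_sentences, target_words, seed=0):
    """Assemble a multiple-choice prompt: expanded context, then the final inquiry."""
    context = expand_context(facts, filler_sentences, target_words, seed)
    letters = "ABCDEFGH"  # assumes at most eight options
    choices = "\n".join(f"{letters[j]}. {opt}" for j, opt in enumerate(options))
    return f"{context}\n\nQuestion: {inquiry}\n{choices}"


facts = ["Alice has 3 apples.", "Bob gives Alice 2 more apples."]
filler = ["The river flows past the old mill.", "Clouds gathered over the hills."]
prompt = build_question(facts, "How many apples does Alice have now?",
                        ["4", "5", "6"], filler, target_words=200)
print(prompt[-60:])
```

Because the inquiry is kept separate from the expanded context, the same short question can be rendered at several target lengths, which is what allows accuracy to be compared as context length grows. A real benchmark would likely use more natural distractor text than repeated filler sentences.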

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2501.15089 [cs.CL]
	(or arXiv:2501.15089v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.15089

Submission history

From: Zhan Ling [view email]
[v1] Sat, 25 Jan 2025 05:32:14 UTC (1,855 KB)
[v2] Fri, 28 Feb 2025 07:53:20 UTC (2,086 KB)
