Benchmarking and Understanding Compositional Relational Reasoning of LLMs

Ni, Ruikang; Xiao, Da; Meng, Qingye; Li, Xiangyu; Zheng, Shihui; Liang, Hongliang

arXiv:2412.12841 (cs)

[Submitted on 17 Dec 2024]

Abstract: Compositional relational reasoning (CRR) is a hallmark of human intelligence, but we lack a clear understanding of whether and how existing transformer large language models (LLMs) can solve CRR tasks. To enable systematic exploration of the CRR capability of LLMs, we first propose a new synthetic benchmark called Generalized Associative Recall (GAR), which integrates and generalizes the essence of several tasks from mechanistic interpretability (MI) studies in a unified framework. Evaluation shows that GAR is challenging enough for existing LLMs to reveal their fundamental deficiency in CRR, yet simple enough for systematic MI study. Then, to understand how LLMs solve GAR tasks, we use attribution patching to discover the core circuits reused by Vicuna-33B across different tasks, along with a set of vital attention heads. Intervention experiments show that the correct functioning of these heads significantly impacts task performance. In particular, we identify two classes of heads whose activations represent the abstract notions of true and false, respectively, in GAR tasks. They play a fundamental role in CRR across various models and tasks. The dataset and code are available at this https URL.

Comments:	Accepted to the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2412.12841 [cs.CL]
	(or arXiv:2412.12841v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.12841

Submission history

From: Da Xiao
[v1] Tue, 17 Dec 2024 12:10:38 UTC (3,832 KB)