Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA

Huang, Wenyu; Zhou, Guancheng; Wang, Hongru; Vougiouklis, Pavlos; Lapata, Mirella; Pan, Jeff Z.

arXiv:2410.06121 (cs)

[Submitted on 8 Oct 2024]

Abstract: Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent work suggests that Knowledge Graphs (KGs) contain valuable external knowledge for LLMs. Retrieving information from KGs differs from extracting it from document sets. Most existing approaches seek to directly retrieve relevant subgraphs, thereby eliminating the need for extensive SPARQL annotations, traditionally required by semantic parsing methods. In this paper, we model the subgraph retrieval task as a conditional generation task handled by small language models. Specifically, we define a subgraph identifier as a sequence of relations, each represented as a special token stored in the language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models relying on 7B parameters, demonstrating that small language models are capable of performing the subgraph retrieval task. Furthermore, our largest 3B model, when paired with an LLM reader, sets new state-of-the-art (SOTA) end-to-end performance on both the WebQSP and CWQ benchmarks. Our model and data will be made available online: this https URL.
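
To illustrate the idea of a subgraph identifier as a sequence of relations, the following is a minimal sketch under stated assumptions, not the authors' implementation. The toy triples, the `<rel:...>` token format, and the helper names `relation_token`, `build_index`, and `expand_identifier` are all hypothetical. In the paper's setting a language model would generate the identifier from the question; here it is hard-coded so that only the expansion from identifier to subgraph is shown.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# Every entity and relation below is invented for illustration.
TRIPLES = [
    ("Alice", "born_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Alice", "works_for", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Berlin", "capital_of", "Germany"),
]


def relation_token(relation):
    """Map a relation name to a hypothetical special-token string."""
    return f"<rel:{relation}>"


def build_index(triples):
    """Index triples by (head entity, relation token) for fast lookup."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation_token(relation))].append((head, relation, tail))
    return index


def expand_identifier(index, topic_entity, identifier):
    """Follow a sequence of relation tokens outward from the topic entity.

    Returns the triples visited (the retrieved subgraph) and the set of
    entities reached after the final hop.
    """
    frontier = {topic_entity}
    subgraph = []
    for token in identifier:
        next_frontier = set()
        for entity in sorted(frontier):
            for triple in index.get((entity, token), []):
                subgraph.append(triple)
                next_frontier.add(triple[2])
        frontier = next_frontier
    return subgraph, frontier


# A two-hop identifier, standing in for what a retriever model might emit
# for a question such as "In which country was Alice born?".
identifier = [relation_token("born_in"), relation_token("capital_of")]
subgraph, reached = expand_identifier(build_index(TRIPLES), "Alice", identifier)
print(subgraph)
print(reached)
```

Because the identifier names only relations, the same two-token sequence can be reused for any topic entity, which is one reason a compact relation vocabulary can be a workable generation target.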

Comments:	Accepted by EMNLP 2024 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2410.06121 [cs.CL]
	(or arXiv:2410.06121v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.06121

Submission history

From: Wenyu Huang
[v1] Tue, 8 Oct 2024 15:22:36 UTC (285 KB)
