SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

Nguyen, Dai Quoc; Hoang, Cong Duy Vu; Vu, Duy; Tangari, Gioacchino; Vu, Thanh Tien; Dharmasiri, Don; Li, Yuan-Fang; Duong, Long

Computer Science > Computation and Language

arXiv:2502.16747 (cs)

[Submitted on 23 Feb 2025]


Authors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Tien Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong


Abstract: Open-weight large language models (LLMs) have significantly advanced performance on the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes on large database schemas, where the context length grows. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios for the NL2SQL task. SQLong generates augmented datasets by extending existing database schemas with additional synthetic CREATE TABLE commands and corresponding data rows, sampled from diverse schemas in the training data. This approach effectively simulates long-context scenarios during finetuning and evaluation. Through experiments on the Spider and BIRD datasets, we demonstrate that LLMs finetuned with SQLong-augmented data significantly outperform those trained on standard datasets. These results indicate that SQLong is practical to implement and can improve NL2SQL capabilities in real-world settings with complex database schemas.
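The schema-extension step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Table` record, the `augment_schema` and `approx_tokens` helpers, the whitespace-based token count, the skipping of tables whose names clash with the target schema, the `-- row:` rendering of sample rows, and the shuffling of table order are all hypothetical choices made for this example.

```python
# Hypothetical sketch of SQLong-style schema padding (not the paper's code).
import random
from dataclasses import dataclass, field


@dataclass
class Table:
    """One table of a database schema: its DDL plus a few example rows."""
    name: str
    create_sql: str                        # the CREATE TABLE statement
    rows: list[tuple] = field(default_factory=list)

    def render(self, max_rows: int = 3) -> str:
        """Render the DDL followed by up to max_rows sample rows as text."""
        lines = [self.create_sql]
        for row in self.rows[:max_rows]:
            lines.append(f"-- {self.name} row: {row!r}")
        return "\n".join(lines)


def approx_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real LLM tokens.
    return len(text.split())


def augment_schema(target: list[Table], pool: dict[str, list[Table]],
                   target_db: str, budget: int, seed: int = 0) -> str:
    """Pad the target schema with tables sampled from *other* databases
    until the rendered context is as close to `budget` tokens as fits."""
    rng = random.Random(seed)
    taken = {t.name.lower() for t in target}
    # Candidate distractor tables: every table from a different database.
    candidates = [t for db, tables in pool.items() if db != target_db
                  for t in tables]
    rng.shuffle(candidates)

    extra: list[Table] = []
    used = sum(approx_tokens(t.render()) for t in target)
    for table in candidates:
        if table.name.lower() in taken:
            continue  # assumption: skip name clashes so the gold SQL stays valid
        cost = approx_tokens(table.render())
        if used + cost > budget:
            continue  # too large; try the next (possibly smaller) table
        extra.append(table)
        taken.add(table.name.lower())
        used += cost

    # Assumption: shuffle so the relevant tables are not always listed first.
    combined = target + extra
    rng.shuffle(combined)
    return "\n\n".join(t.render() for t in combined)
```

Because the padding tables come from other databases in the same training pool, the gold SQL query for the original question remains valid against the enlarged schema; only the prompt grows longer. Raising `budget` simulates progressively longer contexts.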

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2502.16747 [cs.CL]
	(or arXiv:2502.16747v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.16747

Submission history

From: Dai Quoc Nguyen
[v1] Sun, 23 Feb 2025 23:23:51 UTC (705 KB)