Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Zhang, Ao; Wu, Kun; Wang, Lijie; Li, Zhenghua; Xiao, Xinyan; Wu, Hua; Zhang, Min; Wang, Haifeng

arXiv:2103.02227v1 (cs)

[Submitted on 3 Mar 2021 (this version), latest version 15 Nov 2022 (v4)]

Abstract: Data augmentation has attracted considerable research attention in the deep learning era for its ability to alleviate data sparseness. The lack of data for unseen evaluation databases is precisely the major challenge for cross-domain text-to-SQL parsing. Previous works either require human intervention to guarantee the quality of generated data \cite{yu2018syntaxsqlnet}, or fail to handle complex SQL queries \cite{guo2018question}. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar \cite{yin2018tranx}. We require that the generated queries cover at least 80% of the SQL patterns in the training data for better distribution matching. Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions, which is the major contribution of this work. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English, and DuSQL in Chinese, show that our proposed data augmentation framework consistently improves performance over strong baselines, and that the hierarchical generation model in particular is the key to the improvement.
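The 80% pattern-coverage requirement in the abstract can be illustrated with a minimal sketch. The abstract does not define "SQL pattern", so the code below assumes a pattern is a query with table names, column names, and literal values masked; the names `sql_pattern` and `pattern_coverage`, the keyword list, and the example queries are hypothetical and are not taken from the paper.

```python
import re

# Assumption: a "SQL pattern" is the query skeleton left after masking
# identifiers (tables/columns) and literal values. The paper may define it differently.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "as", "and", "or", "not", "in", "like", "between",
    "distinct", "asc", "desc", "count", "sum", "avg", "min", "max",
    "union", "intersect", "except",
}

# Tokens: quoted strings, numbers, identifiers, comparison operators, punctuation.
TOKEN_RE = re.compile(
    r"'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z_0-9.]*|[<>!=]+|[(),*]"
)


def sql_pattern(query: str) -> str:
    """Return a masked skeleton of the query (hypothetical pattern definition)."""
    out = []
    for tok in TOKEN_RE.findall(query):
        if tok.lower() in SQL_KEYWORDS:
            out.append(tok.upper())          # keep keywords and aggregate names
        elif tok[0] in "'\"" or tok[0].isdigit():
            out.append("VALUE")              # mask string and numeric literals
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")                 # mask table and column names
        else:
            out.append(tok)                  # keep operators and punctuation
    return " ".join(out)


def pattern_coverage(train_queries, generated_queries) -> float:
    """Fraction of distinct training patterns that also occur among generated queries."""
    train = {sql_pattern(q) for q in train_queries}
    generated = {sql_pattern(q) for q in generated_queries}
    if not train:
        return 0.0
    return len(train & generated) / len(train)


# Illustrative queries only; they are not drawn from WikiSQL, Spider, or DuSQL.
train_queries = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT count(*) FROM concert",
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
]
generated_queries = [
    "SELECT title FROM book WHERE price > 10",
    "SELECT count(*) FROM author",
]

coverage = pattern_coverage(train_queries, generated_queries)
meets_threshold = coverage >= 0.8
```

In this toy case two of the three training patterns are matched, so the generated set would fall short of an 80% threshold and more queries would need to be produced. Coverage here is counted over distinct patterns; weighting patterns by their frequency in the training data is an equally plausible reading of the abstract.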

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2103.02227 [cs.CL]
	(or arXiv:2103.02227v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2103.02227

Submission history

From: Kun Wu
[v1] Wed, 3 Mar 2021 07:37:38 UTC (1,741 KB)
[v2] Mon, 8 Mar 2021 07:33:28 UTC (1,560 KB)
[v3] Tue, 26 Oct 2021 12:04:00 UTC (1,584 KB)
[v4] Tue, 15 Nov 2022 02:12:31 UTC (1,585 KB)
