NL2SQL Handbook
2
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
3
Outline
• NL2SQL Problem and Background
• Problem Formulation
• NL2SQL Human Workflow
• NL2SQL Task Challenges
• Solving Challenges with Large Language Models
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
What is Natural Language to SQL (NL2SQL)?
Figure: A user poses an NL query (e.g., "Who are the three youngest winners across all matches? And their ranks?"); the NL2SQL model translates it into SQL, executes it against the database (DB), and returns the requested information to the user.
5
NL2SQL Model Evolution Stream Graph
6
Outline
• NL2SQL Problem and Background
• Problem Formulation
• NL2SQL Human Workflow
• NL2SQL Task Challenges
• Solving Challenges with Large Language Models
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
NL2SQL Human Workflow
Step 1: NL Understanding
NL Query: "Find the number of dog pets that are raised by female students."
Figure: The query involves three tables, Pets (PetID, PetType, PetAge, ...), Student (StuID, Sex, Age, ...), and Has_Pet (PetID, StuID, ...), and requires linking "dog" to the cell value 'Dog' in PetType and "female" to 'F' in Sex.
10
NL2SQL Task Challenges
C1. Ambiguous NL Query
C2. Requiring Domain Knowledge (e.g., understanding a schema item such as LiteraryGenre)
Ambiguity in NL Queries (inherent challenges) and User Mistakes
• Lexical Ambiguity: a single word with multiple meanings. Example: the word "bat" can refer to 1) an animal, 2) a baseball bat, or 3) the action of swinging.
• Syntactic Ambiguity: a sentence can be parsed in multiple ways. Example: Mary [saw [the man [with the telescope]]] vs. Mary [saw [the man]] [with the telescope].
14
Complex Database and Dirty Content
Type: Complex Relationships Among Tables
Explanation: Databases often contain hundreds of tables with complex interrelationships.
Example: The FIBEN Benchmark [1] has 152 tables per database.
16
NL2SQL Translation
Challenge 3: Database Schema Dependency
The NL2SQL translation process is highly dependent on the database schema it interacts with.
• Example: The same user intent can result in different SQL queries for different database schemas
17
Technical Challenges in Developing Solutions
Type: SQL Efficiency
Explanation: The SQL generated by NL2SQL models must be both correct and optimized for execution efficiency.
19
Technical Challenges in Developing Solutions
Type: Data Privacy
Explanation: Data privacy is crucial for NL2SQL systems, especially when the databases contain sensitive data.
Type: Trustworthiness and Reliability
Explanation: For NL2SQL models to be widely used, they must be trustworthy and reliable, consistently delivering accurate results across various datasets and use cases. Trustworthiness also requires that the model's decisions are transparent.
20
Outline
• NL2SQL Problem and Background
• Problem Formulation
• NL2SQL Human Workflow
• NL2SQL Task Challenges
• Solving Challenges with Language Models
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Solving Challenges with Language Models
Figure: The Evolution of NL2SQL Solutions from the Perspective of Language Models.
Recap: LLM vs. PLM
• LLMs are distinguished from earlier PLMs by their emergent capabilities.
• Examples: GPT-4, LLaMA 2 (and other recent models such as BLOOMZ, WeLM, Galactica, OPT-IML, and ChatGPT)
Figure: A timeline of large language models released through 2023.
Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa: A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artif. Intell. Rev. 57(7): 175 (2024)
23
How to Adapt LLM/PLM to NL2SQL?
• LLMs are a type of PLM characterized by superior language understanding and emergent capabilities.
• Diverse datasets (Dp) include a broad range of linguistic patterns and domain-general knowledge, enabling the model to develop robust understanding capabilities.
• Specialized datasets (Df) are closely aligned with the NL2SQL task and are used for fine-tuning the model on NL2SQL.
You can find more details in the section "Practical Guidance for Developing NL2SQL Solutions".
🔘 https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/tutorial-llm-meets-db.pdf
25
NL2SQL Solution in the Era of LLMs
26
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
27
An Overview of Language Model-Powered NL2SQL
• Pre-Processing
• Schema Linking
• Database Content Retrieval
• Additional Information Acquisition
• NL2SQL Translation Methods
• Encoding Strategy
• Decoding Strategy
• Task-specific Prompt Strategies
• Intermediate Representation
• Post-Processing
• Correction
• Consistency
• Execution-Guided
• N-best Rerankers
Figure: An Overview of NL2SQL Methods in the LM Era
TABLE I: Comparisons of Existing NL2SQL Solutions. A check mark (✓) means the technique is adopted; a dash (-) means it is not.
Columns: Methods | Years | Finetuning | Pre-Processing: Schema Linking, DB Content Retrieval, Additional Information Acquisition | NL2SQL Translation Methods: Backbone Model, Encoding Strategy, Intermediate Representation, Task-specific Prompt Strategy, Decoding Strategy | Post-Processing: Correction, Consistency, Execution-Guided, N-best Rerankers
CHESS [51] 2024 - ✓ ✓ ✓ Decoder-Only Sequential Encoding - CoT Greedy Search ✓ ✓ ✓ -
CodeS [46] 2024 - ✓ ✓ - Decoder-Only Sequential Encoding - - Greedy Search - - ✓ -
SFT CodeS [46] 2024 ✓ ✓ ✓ ✓ Decoder-Only Sequential Encoding - - Greedy Search - - ✓ -
FinSQL [47] 2024 ✓ ✓ - ✓ Decoder-Only Sequential Encoding - - Greedy Search ✓ ✓ - -
DTS-SQL [52] 2024 ✓ ✓ - - Decoder-Only Sequential Encoding - - Greedy Search - - - -
TA-SQL [53] 2024 - ✓ - - Decoder-Only Sequential Encoding Sketch Structure CoT Greedy Search - - - -
SuperSQL [45] 2024 - ✓ ✓ ✓ Decoder-Only Sequential Encoding - - Greedy Search - ✓ - -
ZeroNL2SQL [44] 2024 ✓ - - - Encoder-Decoder Sequential Encoding Sketch Structure Decomposition Beam Search ✓ - ✓ -
PET-SQL [54] 2024 ✓ ✓ - ✓ Decoder-Only Sequential Encoding - - Greedy Search - ✓ - -
CoE-SQL [55] 2024 - - - ✓ Decoder-Only Sequential Encoding - CoT Greedy Search ✓ - - -
PURPLE [56] 2024 - ✓ - ✓ Decoder-Only Sequential Encoding - - Greedy Search ✓ ✓ ✓ -
MetaSQL [57] 2024 - ✓ - ✓ Decoder-Only Sequential Encoding - Decomposition Greedy Search - - - ✓
DEA-SQL [58] 2024 - ✓ - ✓ Decoder-Only Sequential Encoding - Decomposition Greedy Search ✓ - - -
DIN-SQL [5] 2023 - ✓ - ✓ Decoder-Only Sequential Encoding Syntax Language Decomposition Greedy Search ✓ - - -
DAIL-SQL [6] 2023 - - - ✓ Decoder-Only Sequential Encoding - - Greedy Search - ✓ - -
C3-SQL [59] 2023 - ✓ - - Decoder-Only Sequential Encoding - CoT Greedy Search - ✓ - -
RESDSQL [7] 2023 ✓ ✓ ✓ - Encoder-Decoder Sequential Encoding Syntax Language Decomposition Beam Search - - - -
T5-3B+NatSQL+Token Preprocessing [60] 2023 ✓ ✓ ✓ - Encoder-Decoder Sequential Encoding Syntax Language - Greedy Search - - - -
ACT-SQL [61] 2023 - ✓ - ✓ Decoder-Only Sequential Encoding - CoT Greedy Search - - - -
ODIS [62] 2023 - - - ✓ Decoder-Only Sequential Encoding - - Greedy Search - - - -
MAC-SQL [63] 2023 - ✓ - - Decoder-Only Sequential Encoding - Decomposition Greedy Search ✓ - ✓ -
SC-Prompt [64] 2023 ✓ - - - Encoder-Decoder Separate Encoding Sketch Structure - Beam Search ✓ - - -
CatSQL [65] 2023 ✓ - - - Encoder-Only Sequential Encoding Sketch Structure - Beam Search ✓ - - -
SQLFormer [66] 2023 ✓ ✓ - - Encoder-Decoder Graph-based Encoding - - Beam Search - - - -
G³R [67] 2023 ✓ ✓ ✓ - Encoder-Only Graph-based Encoding - CoT Beam Search - - - ✓
Graphix-T5 [43] 2022 ✓ ✓ ✓ - Encoder-Decoder Graph-based Encoding - - Constraint-aware Incremental - - - -
SHiP [68] 2022 ✓ - ✓ - Encoder-Decoder Graph-based Encoding - - Constraint-aware Incremental - - - -
N-best List Rerankers [69] 2022 ✓ ✓ ✓ - Encoder-Decoder Sequential Encoding - - Constraint-aware Incremental - - - ✓
RASAT [70] 2022 ✓ - ✓ - Encoder-Decoder Graph-based Encoding - - Constraint-aware Incremental - - - -
PICARD [71] 2022 ✓ - ✓ - Encoder-Decoder Sequential Encoding - - Constraint-aware Incremental - - - -
TKK [72] 2022 ✓ - ✓ - Encoder-Decoder Separate Encoding Sketch Structure Decomposition Constraint-aware Incremental - - - -
S²SQL [73] 2022 ✓ ✓ ✓ - Encoder-Only Graph-based Encoding - - Greedy Search - - - -
RAT-SQL [74] 2021 ✓ ✓ ✓ - Encoder-Only Graph-based Encoding Syntax Language - Beam Search - - - -
SmBoP [75] 2021 ✓ - ✓ - Encoder-Only Graph-based Encoding - - Beam Search - - - -
RaSaP [76] 2021 ✓ ✓ ✓ - Encoder-Only Graph-based Encoding - - Beam Search - - - -
BRIDGE [77] 2020 ✓ - ✓ - Encoder-Only Sequential Encoding - - Others - - - -
Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, and Nan Tang: A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? arXiv:2408.05109 [cs.DB]. https://arxiv.org/abs/2408.05109
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• Schema Linking
• Database Content Retrieval
• Additional Information Acquisition
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Pre-Processing
Motivation:
• The pre-processing ensures the accurate mapping and processing of key
information within the limited input, by identifying the tables and columns related
to the given NL query.
• The pre-processing serves as an enhancement to the model’s inputs in the
NL2SQL translation.
Goal:
• Identify the tables and columns related to the given NL.
32
Workflow of Schema Linking
NL Query: "Display the names of all students who have received an 'A' grade and the names of the courses they are enrolled in."
Figure: Schema linking keeps the tables and columns relevant to the query (e.g., Students.id and Students.Name, the Grades table, and the Course table) and filters out irrelevant schema items (e.g., the Module table and the Class and Age columns).
33
Method Classification of Schema Linking
Method classification:
• String Matching-based Schema Linking
• Neural Network-based Schema Linking
• In-Context Learning for Schema Linking
34
String Matching-based Schema Linking
IRNet designs string-based schema linking and an intermediate representation called SemQL.
• It uses n-grams (1-6 length) extracted from user queries as query candidates
• It evaluates the string-level similarity between the query candidates and the database schema,
then classifies them as Exact or Partial matches.
Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang: Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. ACL (1) 2019: 4524-4535
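IRNet's matching step can be sketched as follows; the candidate enumeration follows the 1-to-6-gram scheme above, while the Exact/Partial classification rule here is a simplified stand-in for IRNet's actual similarity checks:

```python
# Sketch of string matching-based schema linking (simplified from IRNet).

def ngrams(tokens, max_n=6):
    """Enumerate 1- to 6-gram candidates, longest first."""
    for n in range(min(max_n, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def link_schema(question, schema_items):
    """Classify each n-gram as an Exact or Partial match against schema names."""
    tokens = question.lower().split()
    names = [s.lower() for s in schema_items]
    links = {}
    for cand in ngrams(tokens):
        if cand in links:
            continue
        if cand in names:
            links[cand] = "Exact"          # candidate equals a schema name
        elif any(cand in name or name in cand for name in names):
            links[cand] = "Partial"        # candidate overlaps a schema name
    return links

links = link_schema(
    "Show the name of each student",
    ["student", "student name", "age"],
)
```

Here "student" is classified as an Exact match, while "name" only partially matches the column "student name"; the false-positive and synonym problems listed below follow directly from this substring logic.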
String Matching-based Schema Linking
Limitations:
• False positives when candidates share common words.
• Lack the ability to handle synonyms.
36
Neural Network-based Schema Linking
RESDSQL proposes a ranking-enhanced encoding and skeleton-aware decoding framework.
• A cross-encoder is trained to classify tables and columns based on the input query.
Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen: RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL. AAAI 2023
Neural Network-based Schema Linking
Limitations:
• Highly rely on the quantity and quality of training data
• Struggle to generalize effectively across databases with new schemas
or domains
39
In-Context Learning for Schema Linking
CHESS breaks down the NL2SQL task into a three-stage pipeline: entity and context retrieval, schema selection, and query generation.
• It utilizes GPT-4 to extract keywords from both the NL query and the hint.
• It implements an efficient three-stage schema selection with different prompts.
Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi: CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRR abs/2405.16755 (2024)
In-Context Learning for Schema Linking
MAC-SQL is a multi-agent collaborative framework for NL2SQL.
• The Selector agent performs the schema linking task.
• The Selector is activated only when the length of the database schema prompt exceeds a
specified threshold.
Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Qian-Wen Zhang, Zhao Yan, Zhoujun Li: MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. CoRR abs/2312.11242 (2023)
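The gating idea can be sketched as below; the character budget and the keyword-overlap stand-in for the LLM-based Selector are illustrative assumptions, not MAC-SQL's actual threshold or prompts:

```python
# Sketch of MAC-SQL's gating idea: schema pruning runs only when the
# serialized schema exceeds a length budget. THRESHOLD and
# select_relevant_tables are illustrative assumptions.

THRESHOLD = 25_000  # illustrative budget in characters

def serialize_schema(schema):
    return "\n".join(
        f"Table {t}({', '.join(cols)})" for t, cols in schema.items()
    )

def select_relevant_tables(schema, question):
    # Crude stand-in for the LLM-based Selector agent: keep tables whose
    # name or columns are mentioned in the question.
    q = question.lower()
    return {
        t: cols for t, cols in schema.items()
        if t.lower() in q or any(c.lower() in q for c in cols)
    }

def build_schema_prompt(schema, question):
    prompt = serialize_schema(schema)
    if len(prompt) > THRESHOLD:          # gate: prune only large schemas
        schema = select_relevant_tables(schema, question)
        prompt = serialize_schema(schema)
    return prompt
```

Small schemas pass through unchanged, so the extra selection cost is paid only when the full schema would not fit the prompt comfortably.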
In-Context Learning for Schema Linking
Limitations:
• Large schema may exceed context length limitation
• Highly rely on the quality of the prompt
• High API cost
42
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• Schema Linking
• Database Content Retrieval
• Additional Information Acquisition
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Database Content Retrieval
Motivation:
• Due to the large databases and input limitations of the model,
retrieving cell values is resource-intensive.
• Retrieved cell values are needed to correctly generate SQL clauses such as WHERE and JOIN.
Goal:
• Efficiently retrieve cell values related to the given NL.
44
Workflow of Database Content Retrieval
NL Query: "Display the names of all students who have received an 'A' grade and the names of the courses they are enrolled in."
Figure: Given the schema (Students, Grades, Course, Teacher, Module) and the stored data, database content retrieval locates the cell values referenced by the query, here the value 'A' in Grades.Grade, so they can be used when generating the SQL.
45
Method Classification of DB Content Retrieval
Method classification:
• String Matching-based Database Content Retrieval
• Neural Network-based Database Content Retrieval
• Index Strategy for Database Content Retrieval
46
String Matching-based Database Content Retrieval
BRIDGE designs anchor text matching to automatically extract cell values mentioned in the NL query.
• It uses a heuristic method to calculate the maximum sequence match between the
problem and the cell values to determine the matching boundary.
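A minimal sketch of this heuristic, using Python's difflib to compute the longest common match between the question and each cell value; the 0.85 coverage threshold is an illustrative assumption, not BRIDGE's exact boundary rule:

```python
# Sketch of anchor-text matching: keep cell values whose longest common
# match with the question covers most of the value. Threshold is illustrative.
import difflib

def anchor_texts(question, cell_values, threshold=0.85):
    q = question.lower()
    anchors = []
    for value in cell_values:
        v = str(value).lower()
        match = difflib.SequenceMatcher(None, q, v).find_longest_match(
            0, len(q), 0, len(v)
        )
        if match.size / max(len(v), 1) >= threshold:
            anchors.append(value)        # value is (nearly) fully mentioned
    return anchors

cells = ["United States", "United Kingdom", "France"]
anchors = anchor_texts("How many singers are from France?", cells)
```

Only "France" survives the coverage check here; scanning every cell value this way is exactly why the approach becomes expensive on large databases, as noted below.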
Limitations:
• Struggles with synonyms
• High computational cost for large databases
48
Neural Network-based Database Content Retrieval
TaBERT uses a method called database content snapshots.
• It uses an attention mechanism to manage information between cell values across
different rows.
Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel: TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 2020
Neural Network-based Database Content Retrieval
Limitations:
• May misinterpret ambiguous or context-sensitive queries.
• Require significant computing resources.
51
Index Strategy for Database Content Retrieval
CodeS introduces a coarse-to-fine cell value matching approach.
• It builds the index for all values using BM25.
• The index identifies candidate values that are relevant to NL.
Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen: CodeS: Towards Building Open-source Language Models for Text-to-SQL. CoRR abs/2402.16347 (2024)
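The coarse step can be sketched with a tiny self-contained BM25 index; k1 and b are the usual defaults, and CodeS's fine-grained re-matching step is omitted:

```python
# Self-contained BM25 sketch of the coarse step: index every cell value,
# then retrieve the values most relevant to the question.
import math
from collections import Counter

def bm25_index(docs, k1=1.5, b=0.75):
    toks = [d.lower().split() for d in docs]
    n, avgdl = len(docs), sum(map(len, toks)) / len(docs)
    df = Counter(t for d in toks for t in set(d))
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(query, i):
        tf = Counter(toks[i])
        return sum(
            idf.get(t, 0.0) * tf[t] * (k1 + 1)
            / (tf[t] + k1 * (1 - b + b * len(toks[i]) / avgdl))
            for t in query.lower().split()
        )

    def search(query, top_k=3):
        ranked = sorted(range(n), key=lambda i: score(query, i), reverse=True)
        return [docs[i] for i in ranked[:top_k]]

    return search

search = bm25_index(["grade A", "grade B", "computer science", "fine arts"])
top = search("students with an A grade", top_k=2)
```

The index is built once over all cell values; at query time only candidate values need to be scored against the NL question.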
Index Strategy for Database Content Retrieval
CHESS uses a Locality-sensitive Hashing algorithm for approximate nearest
neighbor searches.
• It indexes unique cell values to quickly identify the top similar values related to
the NL query.
Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi: CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRR abs/2405.16755 (2024)
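A self-contained sketch of the idea: MinHash signatures over character 3-grams, banded so that similar values land in the same bucket; the signature and band sizes are illustrative choices, not CHESS's configuration:

```python
# Sketch of locality-sensitive hashing over cell values: near-duplicate
# strings collide in at least one band with high probability.
import hashlib

def shingles(text, n=3):
    t = text.lower()
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}

def minhash(text, num_hashes=32):
    # One min value per seeded hash function, over the value's 3-grams.
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles(text))
        for seed in range(num_hashes)
    ]

def lsh_buckets(values, bands=8, rows=4):
    buckets = {}
    for v in values:
        sig = minhash(v, bands * rows)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets.setdefault(key, set()).add(v)
    return buckets

def candidates(query, values, bands=8, rows=4):
    """Return indexed values sharing at least one LSH bucket with the query."""
    buckets = lsh_buckets(values, bands, rows)
    sig = minhash(query, bands * rows)
    out = set()
    for band in range(bands):
        key = (band, tuple(sig[band * rows:(band + 1) * rows]))
        out |= buckets.get(key, set())
    return out

cands = candidates("New York", ["New York", "Los Angeles", "Chicago"])
```

In practice the buckets are built once over all unique cell values, so a query only touches the handful of values that collide with it.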
Index Strategy for Database Content Retrieval
Limitations:
• Time-consuming to build.
• Requires frequent updates for dynamic databases, which increases
inference overhead.
54
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• Schema Linking
• Database Content Retrieval
• Additional Information Acquisition
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Additional Information Acquisition
Motivation:
• Incorporate additional information (e.g., domain knowledge) to improve the comprehension capabilities of NL2SQL models for understanding the NL query, performing schema linking, and benefiting the NL2SQL translation.
Goal:
• Retrieve useful information, such as demonstration examples, specific domain knowledge, formulaic evidence, and format information, for the NL2SQL backbone model or specific modules.
56
Method Classification of Additional Information Acquisition
Method classification:
• Sample-based Methods
• Retrieval-based Methods
57
Sample-based Methods
DIN-SQL inserts additional information through few-shot learning across multiple stages of
the workflow.
• These stages comprise schema linking, query classification, task decomposition, and self-correction.
• It allows DIN-SQL to effectively tackle various challenges, including the complexity of
schema links, identification of multiple table joins, and handling of nested queries.
58
Mohammadreza Pourreza, Davood Rafiei: DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. NeurIPS 2023
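The few-shot mechanism itself is simple to sketch; the demonstration pairs and prompt layout below are illustrative, not DIN-SQL's actual templates:

```python
# Sketch of few-shot prompting: demonstrations are prepended to the new
# question. The demo pairs and layout are invented for illustration.

DEMOS = [
    ("List all student names.", "SELECT name FROM students"),
    ("How many courses are there?", "SELECT COUNT(*) FROM courses"),
]

def few_shot_prompt(question, demos=DEMOS):
    lines = []
    for q, sql in demos:
        lines.append(f"Q: {q}\nSQL: {sql}")
    lines.append(f"Q: {question}\nSQL:")   # model completes after "SQL:"
    return "\n\n".join(lines)

prompt = few_shot_prompt("Show names of students with an A grade.")
```

DIN-SQL builds a prompt of this shape separately for each stage (schema linking, classification, decomposition, self-correction), with stage-specific demonstrations.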
Sample-based Methods
CodeS utilizes metadata from cross-domain databases as the main additional information, including data types and annotation text, which help the model resolve potential ambiguity issues and understand entity relationships.
• This extracted information is transformed into coherent text and concatenated with the question to form the final input context.
Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen: CodeS: Towards Building Open-source Language Models for Text-to-SQL. Proc. ACM Manag. Data 2(3): 127 (2024)
Sample-based Methods
Limitations:
60
Retrieval-based Methods
ReBoost engages the LLM using the Explain-Squeeze Schema Linking mechanism.
• This mechanism is a two-phase strategy. Initially, it presents a generalized schema to the
LLMs to establish a foundational understanding.
• Subsequently, it employs targeted prompting to elicit detailed associations between query
phrases and specific database entities, thereby enhancing accuracy in mapping queries to
database structures without incurring excessive token cost.
Guanghu Sui, Zhishuai Li, Ziyue Li, Sun Yang, Jingqing Ruan, Hangyu Mao, Rui Zhao: Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function - with Real Applications in Traffic Domain. CoRR abs/2310.18752 (2023)
Retrieval-based Methods
REGROUP constructs a formulaic knowledge base encompassing various domains, such
as finance, real estate, and transportation.
• It leverages a Dense Passage Retriever (DPR) to compute similarity scores for the
retrieval results from the formulaic knowledge base.
• Subsequently, an Erasing-Then-Awakening (ETA) model is used to integrate the entities
in these formulaic knowledge items with the entities in NL and schema.
Longxu Dou, Yan Gao, Xuqi Liu, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Min-Yen Kan, Jian-Guang Lou: Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge. CoRR abs/2301.01067 (2023)
Retrieval-based Methods
Limitations:
63
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Encoding Strategy
• Decoding Strategy
• Task-specific Prompt Strategies
• Intermediate Representation
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Encoding Strategy
Motivation:
• Encoding converts unstructured and semi-structured data into a form that can be processed for generating SQL queries.
• The encoding process captures the semantic meaning of the NL input and the
structural information of the database schema, enabling the model to
understand and map the user’s intent to the corresponding SQL query.
Goal:
• Transform NL and database schema into a structured format that can
be effectively utilized by a language model.
65
Method classification of Encoding Strategy
Method classification:
• Sequential Encoding Strategy
• Graph-based Encoding Strategy
• Separate Encoding Strategy
66
Sequential Encoding Strategy
BRIDGE enhances the alignment between text and the database schema.
• It represents the NL question and database schema as a tagged sequence.
• It inserts matched database cell values (anchor texts) after the corresponding fields.
• This method ensures that the most relevant schema items are prioritized, improving the
overall efficiency of the encoding process.
Limitations:
• Its linear approach, which treats all inputs as sequences of tokens, may fail
to fully capture the complex relationships between the database schema
and NL query.
• This can affect the model’s ability to understand and generate complex
queries.
69
Graph-based Encoding Strategy
RAT-SQL introduces a relation-aware self-attention mechanism.
• It allows the model to explicitly consider and utilize predefined relational information when jointly
encoding the question and the database schema.
• These relationships are represented as a graph structure, and it can more effectively capture the
structural information in the schema and its alignment with the NL query.
Figures: An illustration of an example schema as a graph G; one RAT layer in the schema encoder.
Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson: RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. ACL 2020: 7567-7578
Graph-based Encoding Strategy
S²SQL uses ELECTRA as its pre-trained language model.
• It enhances encoding by injecting syntactic structure information at the encoding stage, improving
the semantic understanding and generation of models.
An overview of the S²SQL framework.
Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Yanyang Li, Bowen Li, Jian Sun, Yongbin Li: S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. ACL (Findings) 2022: 1254-1262
Graph-based Encoding Strategy
G³R introduces the LGESQL encoder.
• It captures and integrates multisource heterogeneous information by constructing and utilizing a
heterogeneous graph and a Graph Attention Network (GAT) , thereby enhancing the representation
capability for NL and database schema.
An overview of the architecture of the G³R framework.
Yanzheng Xiang, Qian-Wen Zhang, Xu Zhang, Zejie Liu, Yunbo Cao, Deyu Zhou: G³R: A Graph-Guided Generate-and-Rerank Framework for Complex and Cross-domain Text-to-SQL Generation. ACL (Findings) 2023: 338-352
Graph-based Encoding Strategy
Limitation:
• Require more sophisticated algorithms for constructing and processing
graph structures.
• Need a large amount of training data to fully leverage its structural
advantages.
73
Separate Encoding Strategy
TKK employs task decomposition and multi-task learning strategies in encoding.
• It breaks down the complex NL2SQL task into multiple subtasks.
• In this way, it can progressively acquire and combine knowledge.
Chang Gao, Bowen Li, Wenxuan Zhang, Wai Lam, Binhua Li, Fei Huang, Luo Si, Yongbin Li: Towards Generalizable and Robust Text-to-SQL Parsing. EMNLP (Findings) 2022: 2113-2125
Separate Encoding Strategy
SC-Prompt divides text encoding into two stages: the structure stage and the content
stage, with each stage being encoded separately.
Zihui Gu, Ju Fan, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, Xiaoyong Du: Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. Proc. ACM Manag. Data 1(2): 147:1-147:28 (2023)
Separate Encoding Strategy
Limitations:
• Require multiple processing of input data
• May extend the training and inference time of the model
76
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Encoding Strategy
• Decoding Strategy
• Task-specific Prompt Strategies
• Intermediate Representation
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Decoding Strategy
Motivation:
• The choice of decoding strategy directly affects the quality and
performance of the generated SQL queries.
• An excellent decoding strategy not only produces syntactically correct SQL
queries but also ensures that the semantics of the SQL queries align with
the NL and can even optimize the execution efficiency of the queries.
Goal:
• Convert the representations generated by the encoder into the target
SQL queries.
78
Method classification of Decoding Strategy
Method classification:
• Greedy search-based decoding strategy
• Beam search-based decoding strategy
• Constraint-aware incremental decoding strategy
79
Greedy Search-based Decoding Strategy
The greedy search-based decoding strategy is a
simple and fast approach for decoding.
• At each decoding step, greedy search selects the token with
the highest current probability as the output.
• This strategy builds the final output sequence by continuously
choosing the locally optimal solution.
• Since the default decoding strategy of GPT series models
(e.g., GPT-4) is greedy search-based decoding, NL2SQL
solutions based on GPT fall into this category.
81
Beam Search-based Decoding Strategy
RAT-SQL combines relation-aware graph structure encoding and generation techniques.
• During the decoding process, RAT-SQL uses beam search to generate multiple
candidate SQL queries
• These queries are then reranked, and the optimal query is selected based on graph
structure information.
Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson:
RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. ACL 2020: 7567-7578
Beam Search-based Decoding Strategy
SmBoP uses a semi-autoregressive bottom-up decoding approach, improving decoding
efficiency by parallelizing the construction and scoring of sub-trees, achieving
logarithmic time complexity.
Limitations:
• More data needs to be processed at each decoding step.
• This significantly increases the demand for memory and computational
resources and results in slower decoding speeds than greedy search.
84
Constraint-aware Incremental Decoding Strategy
PICARD (Parsing Incrementally for Constrained Auto-Regressive Decoding) introduces the
constraint-aware incremental decoding strategy.
• The constraint-aware incremental decoding strategy is specifically designed for NL2SQL
tasks.
• This strategy aims to ensure the generation of syntactically correct SQL queries by
incorporating constraints during the decoding process.
• Compared to versions without PICARD, these models have shown improved accuracy.
Limitations:
87
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Encoding Strategy
• Decoding Strategy
• Task-specific Prompt Strategies
• Intermediate Representation
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Task-Specific Prompt Strategy
Motivation:
• Instruct the LLMs to optimize the SQL query generation process
according to task-specific rules.
Goal:
• Improve the accuracy of translating NL queries with complex semantics into the corresponding SQL queries.
89
Task-Specific Prompt Strategy
Method classification:
• Chain-of-Thought
• Task Decomposition
90
Chain-of-Thought
CHESS transforms NL into SQL statements using a streamlined pipeline that relies on
LLMs and CoT.
• This CoT process comprises entity and context retrieval, schema selection, SQL
generation, and revision.
Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi: CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRR abs/2405.16755 (2024)
Chain-of-Thought
Limitations:
• Hallucination caused by unknown domain knowledge
• Hard to control the output of key components
92
Decomposition
TKK divides the initial NL2SQL parsing tasks into various small individual subtasks, with
each corresponding to the mapping of the NL query to one or more clauses of the SQL query.
• This decomposition approach allows the model to concentrate on learning the generation of
each clause, thereby compelling the model to understand the problem, the database schema,
and the alignment between each SQL clause.
Chang Gao, Bowen Li, Wenxuan Zhang, Wai Lam, Binhua Li, Fei Huang, Luo Si, Yongbin Li: Towards Generalizable and Robust Text-to-SQL Parsing. EMNLP (Findings) 2022: 2113-2125
Decomposition
MAC-SQL incorporates a Decomposer agent designed to break down the user's original
problem into several subproblems.
• This decomposition process aims to lessen the complexity of the original question, enabling the generation of simpler SQL queries to solve each individual subproblem.
Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Qian-Wen Zhang, Zhao Yan, Zhoujun Li:
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. CoRR abs/2312.11242 (2023)
Decomposition
DIN-SQL employs a sophisticated categorization module for decomposition.
• It classifies queries into distinct complexity groups: Easy, Non-Nested Complex, and Nested Complex,
with the reference of NL and database schema.
• By strategically identifying and separating schema linking, join conditions, and nested structures, the module facilitates structured generation of SQL queries and improves the accuracy of translating complex NL queries into executable SQL.
Mohammadreza Pourreza, Davood Rafiei: DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. NeurIPS 2023
95
Decomposition
Limitations:
• Increased computational cost
• Lower efficiency, because the model needs to complete more sub-tasks
96
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Encoding Strategy
• Decoding Strategy
• Task-specific Prompt Strategies
• Intermediate Representation
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
Intermediate Representation
Motivation:
• Design an intermediate representation that is freer in grammar than SQL, serving as the bridge between the ''free-form'' NL query and the ''constrained and formal'' SQL query.
Goal:
• Capture the essential components and
relationships of an NL query without the
strict syntax rules of SQL.
98
Method Classification of Intermediate Representation
Method classification:
• SQL-Like Syntax Language
• SQL-Like Sketch Structure
99
SQL-Like Syntax Language
NatSQL is a widely recognized SQL-like
syntax language that eliminates SQL
statement operators, keywords, set
operators, and other elements seldom
found in user problem descriptions.
• It enhances schema linking by
minimizing the necessary number of
schema items.
Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John H. Drake, Qiaofu Zhang:
Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. EMNLP (Findings) 2021: 2030-2042 100
SQL-Like Syntax Language
QPL leverages the problem decomposition
strategy to improve the parsing of intricate
SQL queries.
• By breaking down a SQL query into
modularized sub-queries, the complexity of
the original query is reduced.
• This approach mitigates parsing difficulties
associated with complex problems and
cross-domain complex queries.
Ben Eyal, Moran Mahabi, Ophir Haroche, Amir Bachar, Michael Elhadad:
Semantic Decomposition of Question and SQL for Text-to-SQL Parsing. EMNLP (Findings) 2023: 13629-13645 101
SQL-Like Syntax Language
QDMR decomposes the original question into a number of atomic questions
• Each atomic question serves as an intermediate representation of the original question and
can be translated into a set of small-scale formal operations involving tasks such as selecting
entities, retrieving attributes, or aggregating information.
Limitations:
103
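The atomic-question idea can be sketched as a pipeline of small formal operations over rows; the step functions below are illustrative stand-ins for the operations a QDMR parser would emit:

```python
# Illustrative QDMR-style decomposition of "Find the number of dog
# pets": each atomic step is one small operation (select entities,
# filter, aggregate). The data and steps are assumptions.
rows = [
    {"name": "Rex", "type": "dog", "age": 3},
    {"name": "Tom", "type": "cat", "age": 5},
    {"name": "Fido", "type": "dog", "age": 7},
]

steps = [
    ("select", lambda rs: rs),                                     # 1. return pets
    ("filter", lambda rs: [r for r in rs if r["type"] == "dog"]),  # 2. #1 that are dogs
    ("aggregate", lambda rs: len(rs)),                             # 3. number of #2
]

result = rows
for _, op in steps:
    result = op(result)
print(result)  # 2
```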
SQL-Like Sketch Structure
CatSQL constructs a general template sketch with slots serving as initial placeholders.
• Its base model focuses on parsing user queries to fill these placeholders, consequently
decreasing the computational resource cost.
• It implements a novel semantic correction algorithm to assess the semantic accuracy of the
resulting SQL queries and rectify any semantic issues detected in the generated queries.
Han Fu, Chang Liu, Bin Wu, Feifei Li, Jian Tan, Jianling Sun:
CatSQL: Towards Real World Natural Language to SQL Applications. Proc. VLDB Endow. 16(6): 1534-1547 (2023) 104
SQL-Like Sketch Structure
SC-prompt utilizes a two-stage divide and conquer method for NL2SQL parsing
• In the first stage, it instructs the PLM to generate the SQL structure with placeholders.
• In the subsequent stage, it directs the PLM to generate the content, i.e., the actual
values that fill the previously provided placeholders.
Zihui Gu, Ju Fan, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, Xiaoyong Du:
Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. Proc. ACM Manag. Data 1(2): 147:1-147:28 (2023) 105
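A minimal sketch of the structure-then-content idea, assuming a placeholder template such as [COL]/[TAB]/[VAL] (the actual PLM-generated structures differ):

```python
import re

def stage1_structure(nl: str) -> str:
    # Stub: in SC-prompt a PLM predicts this structure from the NL query.
    return "SELECT [COL] FROM [TAB] WHERE [COL] = [VAL]"

def stage2_fill(structure: str, values: list) -> str:
    """Fill placeholders left-to-right with the given content values."""
    parts = re.split(r"(\[[A-Z]+\])", structure)
    it = iter(values)
    return "".join(next(it) if re.fullmatch(r"\[[A-Z]+\]", p) else p
                   for p in parts)

structure = stage1_structure("Which singers are from France?")
sql = stage2_fill(structure, ["name", "singer", "country", "'France'"])
print(sql)  # SELECT name FROM singer WHERE country = 'France'
```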
SQL-Like Sketch Structure
ZeroNL2SQL integrates the schema alignment capabilities of PLM with the complex reasoning
capabilities of LLMs.
• Initially, it utilizes PLM to produce SQL sketches for achieving schema alignment and
subsequently employs LLMs to execute complex content reasoning for populating missing
information.
• It also proposes a predicate calibration method that guides the design of SQL sketches
based on database instances and selects the optimal SQL query.
Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du:
Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation. CoRR abs/2306.08891 (2023) 106
SQL-Like Sketch Structure
Limitations:
• High computational cost
• Hard to construct an accurate sketch when the query is highly complex
107
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
108
Post-Processing
Motivation:
• Post-processing is a crucial step to refine the generated SQL queries,
ensuring they meet user expectations more accurately.
• This involves enhancing the initial SQL output using various
strategies.
109
SQL Correction Strategies
Motivation:
• The SQL generated by the model may sometimes contain syntax errors.
• SQL Correction Strategies aim to prompt LLMs to correct syntax errors
in SQL.
Goal:
• Convert buggy SQL queries into correct ones.
110
SQL Correction Strategies
DIN-SQL introduces a self-correction module for refining SQL queries generated by
NL2SQL models, addressing issues like redundant keywords or missing functions.
Two prompts are used: a general prompt for the CodeX model, which explicitly asks it to
identify and fix errors in the “BUGGY SQL,” and a milder prompt for GPT-4, which simply
asks the model to point out potential issues without assuming the presence of errors.
Pourreza, Mohammadreza, and Davood Rafiei. "Din-sql: Decomposed in-context learning of text-to-sql with self correction." Advances in
Neural Information Processing Systems 36 (2024). 111
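The two prompt styles can be sketched as plain string builders (wording paraphrased, not the paper's exact prompts):

```python
def correction_prompt(sql: str, gentle: bool) -> str:
    """Sketch of DIN-SQL's two self-correction prompt styles."""
    if gentle:
        # Milder prompt (GPT-4 style): don't presume the SQL is wrong.
        return ("Review the following SQL and point out any potential "
                "issues, if there are any:\n" + sql)
    # Explicit prompt (CodeX style): assume the SQL is buggy.
    return "#### Fix the errors in the BUGGY SQL below:\n" + sql

print(correction_prompt("SELECT * FORM t", gentle=False))
```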
SQL Correction Strategies
Fin-SQL introduces an output calibration
algorithm to avoid executing time-
consuming SQL queries on large
financial databases.
Fin-SQL corrects spelling errors,
performs fuzzy matching for invalid
columns, and ensures that all tables
containing referenced columns are
included in the FROM clause.
This approach helps improve SQL query
accuracy without actual query execution.
Zhang, C., Mao, Y., Fan, Y., Mi, Y., Gao, Y., Chen, L., Lou, D. and Lin, J. 2024. FinSQL: Model-Agnostic LLMs-based Text-to-SQL
Framework for Financial Analysis. arXiv. 112
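The fuzzy-matching step for invalid columns can be sketched with the standard library's difflib; the schema and cutoff below are illustrative assumptions, not Fin-SQL's actual settings:

```python
import difflib

# Illustrative schema (an assumption for this sketch).
SCHEMA = {"singer": ["singer_id", "name", "country", "age"]}

def calibrate_column(col: str) -> str:
    """Map an invalid column name to its closest schema column,
    without executing the query."""
    all_cols = [c for cols in SCHEMA.values() for c in cols]
    if col in all_cols:
        return col
    match = difflib.get_close_matches(col, all_cols, n=1, cutoff=0.6)
    return match[0] if match else col

print(calibrate_column("contry"))  # country
```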
SQL Correction Strategies
Limitations:
• Primarily focus on correcting specific syntax errors
• Lack generalized error correction for other types of SQL errors
113
Output Consistency
Motivation:
• In complex reasoning problems, multiple reasoning paths can lead to
the same correct answer. Self-consistency first samples multiple
different reasoning paths and then selects the most consistent answer,
significantly improving the quality of the output.
• The NL2SQL task, being similar to reasoning tasks, can also admit
multiple SQL queries that express the same meaning.
Goal:
• Vote to determine the final SQL to enhance the consistency of model
outputs.
114
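The sample-execute-vote loop can be sketched over an in-memory SQLite database; the candidate list below stands in for multiple sampled LLM outputs:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer(name TEXT, age INT);
    INSERT INTO singer VALUES ('A', 20), ('B', 30), ('C', 40);
""")

candidates = [
    "SELECT name FROM singer WHERE age > 25",
    "SELECT name FROM singer WHERE age >= 30",  # same result set
    "SELECT name FROM singer WHERE age > 35",   # minority result
    "SELECT nam FROM singer",                   # error: removed
]

votes = Counter()
for sql in candidates:
    try:
        # Sort rows so equivalent result sets compare equal.
        result = tuple(sorted(conn.execute(sql).fetchall()))
    except sqlite3.Error:
        continue  # remove failing candidates before voting
    votes[result] += 1

best_result, _ = votes.most_common(1)[0]
print(best_result)  # (('B',), ('C',))
```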
Output Consistency
C3-SQL incorporates the Consistency
Output (CO) component to enhance
the reliability of SQL queries
generated by large language models,
addressing their inherent
randomness.
CO samples multiple reasoning
paths to generate different SQL
queries, executes them, and uses a
voting mechanism to select the most
consistent result after removing
errors.
Dong, X., Zhang, C., Ge, Y., Mao, Y., Gao, Y., Chen, lu, Lin, J. and Lou, D. 2023. C3: Zero-shot Text-to-SQL with ChatGPT. arXiv. 115
Output Consistency
Fin-SQL addresses the randomness of
LLM outputs by generating multiple
candidate SQL queries in parallel. It
evaluates the equivalence of these
queries based on keyword consistency
and clusters them accordingly.
The final query is selected from the
largest cluster, improving the consistency
and reliability of SQL query generation.
Zhang, C., Mao, Y., Fan, Y., Mi, Y., Gao, Y., Chen, L., Lou, D. and Lin, J. 2024. FinSQL: Model-Agnostic LLMs-based Text-to-SQL
Framework for Financial Analysis. arXiv. 116
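A minimal sketch of keyword-based clustering, assuming equivalence is judged by the set of SQL keywords (Fin-SQL's actual equivalence test may differ):

```python
import re
from collections import defaultdict

KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP", "ORDER", "JOIN", "HAVING"}

def signature(sql: str):
    """Keyword signature used to group equivalent candidates."""
    toks = re.findall(r"[A-Z]+", sql.upper())
    return tuple(sorted(t for t in toks if t in KEYWORDS))

candidates = [
    "SELECT name FROM singer WHERE age > 25",
    "select name from singer where age >= 30",
    "SELECT name FROM singer ORDER BY age",
]

clusters = defaultdict(list)
for sql in candidates:
    clusters[signature(sql)].append(sql)

# Final query comes from the largest cluster.
largest = max(clusters.values(), key=len)
print(largest[0])
```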
Output Consistency
The self-consistency method for LLMs uses high temperatures to create diverse
outputs but may suffer from performance degradation due to increased
hallucinations.
PET-SQL addresses this by employing a cross-consistency strategy where multiple
LLMs generate SQL queries at lower temperatures and select results through
execution-based voting. This approach enhances query diversity while preserving
LLM performance at lower temperatures.
Li, Z., Wang, X., Zhao, J., Yang, S., Du, G., Hu, X., Zhang, B., Ye, Y., Li, Z., Zhao, R. and Mao, H. 2024. PET-SQL: A Prompt-Enhanced Two-
Round Refinement of Text-to-SQL with Cross-consistency. arXiv. 117
Output Consistency
Limitations:
• Increased inference cost and time
• Reliance on multiple queries may not address all types of errors
• Effectiveness may decrease as LLM capabilities improve
118
Execution-Guided Strategies
Motivation:
• The execution result of an SQL query is a crucial piece of information,
as it may indicate the quality of the query itself. For instance, if the
query execution results in errors or NULL values, it might indicate an
issue with the SQL query.
Goal:
• Incorporating the results of SQL queries to guide subsequent
processing of the SQL.
119
Execution-Guided Strategies
ZeroNL2SQL generates SQL queries iteratively
by checking the executability of multiple
candidate SQL sketches.
It provides error feedback to the LLMs, refining
the query until a feedback limit is reached, at
which point the current sketch is discarded.
This process continues until the LLMs generate
an optimal, executable SQL query as the final
output.
Gu, Z., Fan, J., Tang, N., Zhang, S., Zhang, Y., Chen, Z., Cao, L., Li, G., Madden, S. and Du, X. 2023. Interleaving Pre-Trained Language
Models and Large Language Models for Zero-Shot NL2SQL Generation. arXiv. 120
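The feedback loop can be sketched as follows; `repair` is a stub standing in for the LLM refinement call, and the feedback limit mirrors the discard behavior described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer(name TEXT, age INT)")

def repair(sql: str, error: str) -> str:
    # Stub: a real system would prompt the LLM with `error` here.
    return sql.replace("FORM", "FROM")

def refine(sql: str, limit: int = 3):
    """Iteratively check executability and feed errors back."""
    for _ in range(limit):
        try:
            conn.execute(sql)
            return sql  # executable: accept this candidate
        except sqlite3.Error as e:
            sql = repair(sql, str(e))
    return None  # feedback limit reached: discard this sketch

print(refine("SELECT name FORM singer"))  # SELECT name FROM singer
```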
Execution-Guided Strategies
MAC-SQL introduces a Refiner agent that detects and corrects SQL errors in a multi-
agent collaboration framework.
The agent evaluates SQL queries for syntactic correctness, execution feasibility, and
whether they retrieve non-empty results from the database. If errors are found, it uses
feedback or guidance to correct the SQL, enhancing the model’s self-diagnosis and
overall accuracy.
Wang, B., Ren, C., Yang, J., Liang, X., Bai, J., Chai, L., Yan, Z., Zhang, Q.-W., Yin, D., Sun, X. and Li, Z. 2024. MAC-SQL: A Multi-Agent
Collaborative Framework for Text-to-SQL. arXiv. 121
Execution-Guided Strategies
Limitations:
• Increased SQL generation time, especially with large databases
• Dependency on execution results may not address logical errors
• Limited scalability for complex or extensive databases
122
N-best Rankers Strategies
Motivation:
• In cross-domain scenarios, the generated SQL queries may exhibit
subtle variances in both structure and semantics.
• We can rerank the generated SQL using a larger model or additional
knowledge.
Goal:
• Reorder the top-n results from the original model using a larger
model or by incorporating additional knowledge sources.
123
N-best Rankers Strategies
G3R addresses the issue that reranking results are unstable and dependent on threshold
settings by introducing a feature-enhanced SQL reranker based on pre-trained language models
with hybrid prompt tuning, allowing it to bridge domain gaps without adding extra
parameters.
Xiang, Y., Zhang, Q.-W., Zhang, X., Liu, Z., Cao, Y. and Zhou, D. G3R: A Graph-Guided Generate-and-Rerank Framework for Complex and
Cross-domain Text-to-SQL Generation. 124
N-best Rankers Strategies
Limitations:
• Limited applicability to LLMs with stronger inference capabilities
• Reduced effectiveness as LLMs increase in size and sophistication
• Reliance on additional models or knowledge for effective filtering
125
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• Pre-Processing
• NL2SQL Translation Methods
• Post-Processing
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
126
Why Do We Need Benchmarks?
Motivation:
• To quantify the performance of NL2SQL models
• To provide sufficient data for training NL2SQL models
• To simulate the challenges in real-world application scenarios
127
An Overview of NL2SQL Benchmarks
128
Single-domain Dataset
Motivation:
• Early NL2SQL datasets focused on specific domains, often featuring
simple SQL query structures
• Recent single-domain datasets feature more complex databases and SQL
queries specific to particular scenarios
Li J, Hui B, Qu G, et al. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls[J]. Advances
in Neural Information Processing Systems, 2024, 36. 130
Multi-turn Dataset
Motivation:
• In a real-world setting, users tend to
ask a sequence of thematically related
questions to learn about a particular
topic or to achieve a complex goal.
Knowledge-augmented Dataset
Motivation:
• NL2SQL systems often require domain-specific
knowledge to effectively perform the NL2SQL
task in real-world applications within specific
domains.
An example in KnowSQL
Ambiguous Dataset
An example in AmbiQT
Bhaskar A, Tomar T, Sathe A, et al. Benchmarking and improving text-to-sql generation under ambiguity[J]. arXiv preprint arXiv:2310.13659, 2023. 135
Synthetic NL2SQL Datasets
Motivation:
• Despite the fact that many datasets have been manually annotated so far,
the rapid development of LLMs in the NL2SQL field has led to a
growing demand for NL2SQL data.
End-to-end architecture
for automatic training
data generation in
ScienceBenchmark
Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, and Nan Tang. 2024.
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? arXiv:2408.05109 [cs.DB] https://arxiv.org/abs/2408.05109 137
NL2SQL Benchmark Discussion & Insights
• From the Redundancy Measure perspective
• We observe a trend from early datasets to recent ones where datasets have grown in
size, including increases in the number of questions and unique queries
• From the Database Complexity perspective
• The number of databases (and tables) in datasets correlates with the tasks (e.g.,
Single-domain vs. Robustness) they serve
• From the Query Complexity perspective
• Recent datasets show a growing emphasis on Scalar Functions and Mathematical
Computations in SQL queries, which introduces challenges in SQL generation
structure not seen in earlier datasets
138
The Next For NL2SQL Benchmark
• Despite the growing number of datasets proposed by the NL2SQL
community, we find that current datasets still exhibit a gap in SQL
complexity compared to real-world scenarios
• Recent datasets generally feature lower counts of the SELECT keyword,
indicating fewer nested SQL queries or complex set operations
• Challenges related to Scalar Functions (e.g., ROUND) and
Mathematical Computations (e.g., Addition) also need further attention
139
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Evaluation Metrics
• Evaluation Toolkit
• Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
140
NL2SQL Evaluation Metrics
• Accuracy
• Execution Accuracy (EX)
• String-Match Accuracy (SM)
• Component-Match Accuracy (CM)
• Exact-Match Accuracy (EM)
• Efficiency
• Valid Efficiency Score (VES)
• Robustness
• Query Variance Testing (QVT)
141
NL2SQL Evaluation Metrics
• Accuracy
• Execution Accuracy (EX)
Yu T, Zhang R, Yang K, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql
task[J]. arXiv preprint arXiv:1809.08887, 2018. 142
NL2SQL Evaluation Metrics
• Accuracy
• String-Match Accuracy (SM)
Zhong V, Xiong C, Socher R. Seq2sql: Generating structured queries from natural language using reinforcement learning[J]. arXiv preprint
arXiv:1709.00103, 2017. 143
NL2SQL Evaluation Metrics
• Accuracy
• Component-Match Accuracy (CM)
SQL Components:
• SELECT
• WHERE
• GROUP
• ……

CM = (1/N) Σ_{i=1}^{N} 1(Y_i^C = Ŷ_i^C)

• N : the size of the dataset
• Y_i^C : the component C of the i-th ground-truth SQL query Y_i
• Ŷ_i^C : the component C of the i-th predicted SQL query Ŷ_i
• 1(•) : an indicator function that equals 1 if the condition inside is satisfied,
and 0 otherwise
Yu T, Zhang R, Yang K, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql
task[J]. arXiv preprint arXiv:1809.08887, 2018. 144
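A worked sketch of the per-component indicator average for a single component (here, the WHERE clause); the gold and predicted clauses are illustrative:

```python
def component_match(gold_components, pred_components):
    """Fraction of examples whose component matches exactly."""
    n = len(gold_components)
    hits = sum(g == p for g, p in zip(gold_components, pred_components))
    return hits / n

gold = ["age > 25", "country = 'US'", "name = 'A'"]
pred = ["age > 25", "country = 'UK'", "name = 'A'"]
print(component_match(gold, pred))  # 0.6666666666666666 (2/3)
```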
NL2SQL Evaluation Metrics
• Accuracy
• Exact-Match Accuracy (EM)
• Yu T, Zhang R, Yang K, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql
• 145
task[J]. arXiv preprint arXiv:1809.08887, 2018.
NL2SQL Evaluation Metrics
• Efficiency
• Valid Efficiency Score (VES)
Li J, Hui B, Qu G, et al. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls[J]. Advances
in Neural Information Processing Systems, 2024, 36. 146
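Roughly following the BIRD definition, VES weights each correctly executed query by a relative efficiency ratio (the square root of ground-truth execution time over predicted-query execution time); the timings below are illustrative:

```python
import math

def ves(examples):
    """Valid Efficiency Score sketch: (result, result, time, time) per
    example; a wrong result contributes zero."""
    total = 0.0
    for gold_result, pred_result, gold_time, pred_time in examples:
        if gold_result == pred_result:
            total += math.sqrt(gold_time / pred_time)
    return total / len(examples)

examples = [
    ({"A"}, {"A"}, 0.2, 0.2),  # correct, same speed -> 1.0
    ({"A"}, {"A"}, 0.4, 0.1),  # correct, 4x faster  -> 2.0
    ({"A"}, {"B"}, 0.2, 0.2),  # wrong result        -> 0.0
]
print(ves(examples))  # 1.0
```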
NL2SQL Evaluation Metrics
• Robustness
• Query Variance Testing (QVT)
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?[J]. arXiv preprint arXiv:2406.01265, 2024.
147
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Evaluation Metrics
• Evaluation Toolkit
• Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
148
MT-TEQL
• MT-TEQL
• NL Utterance Variations
• Database Schema Variations
Ma P, Wang S. Mt-teql: evaluating and augmenting neural nlidb on real-world linguistic and schema variations[J]. Proceedings of the VLDB
Endowment, 2021, 15(3): 569-582. 149
NL2SQL360
• NL2SQL360
• SQL Complexity
• SQL Characteristics (e.g., JOIN)
• Database Domain Adaption
• NL Query Variance Testing
• Economy
• Efficiency
• Features
• Multi-angle & Fine-grained Evaluation
• Scenario-based Evaluation
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024.
150
NL2SQL360: User Interface
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 151
NL2SQL360: User Interface
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 152
NL2SQL360 Case Study
• Dataset: Spider
• Model:
• LLM-based methods: DAILSQL, DAILSQL(SC), DINSQL, C3SQL, SFT CodeS-1B/3B/7B/15B
• PLM-based methods: RESDSQL-3B, RESDSQL-3B + NatSQL, Graphix + PICARD
• Metric:
• Execution Accuracy (EX)
• Exact-Match Accuracy (EM)
• Valid Efficiency Score (VES)
• Query Variance Testing (QVT)
• Multi-Angle:
• SQL Characteristics
• SQL Complexity
• NL Variance
• Database Domain Adaption
• Efficiency
• SQL Efficiency
• Economy (Cost)
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 153
NL2SQL360: Leaderboard
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 154
NL2SQL360 Case Study – JOIN Finding
We find that:
DINSQL performs best among prompting-based LLM methods.
RESDSQL + NatSQL performs best among PLM-based methods.
Commonality:
Both methods use NatSQL as the intermediate
representation (IR) of SQL.
Finding:
Using NatSQL reduces the complexity of predicting
JOIN operations and potentially enhances model
performance.
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 155
NL2SQL360 Case Study – Subquery Finding
(P): Prompt-based
(FT): Fine-tuning-based
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 156
NL2SQL360 Case Study – QVT
• To further evaluate the robustness and flexibility of NL2SQL solutions in handling variations
in natural language queries, we propose a new metric called Query Variance Testing (QVT).
This metric assesses how well the models can adapt to different forms of NL queries.
Query Variance Testing (QVT): Given a SQL query Qi, there typically exist multiple corresponding NL
queries, denoted as pairs {(N1, Qi), (N2, Qi), . . . , (Nm, Qi)}. The formula for computing QVT accuracy is
defined as follows:

QVT = (1/|Q|) Σ_{i=1}^{|Q|} [ (1/m_i) Σ_{j=1}^{m_i} 1(F(N_j) ≡ Q_i) ]

where F is the NL2SQL model and 1(•) equals 1 when F(N_j) is equivalent to Q_i.
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 157
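The QVT computation can be sketched as below; `model` is a stub standing in for a real NL2SQL system, and the NL/SQL pairs are illustrative:

```python
def model(nl: str) -> str:
    # Stub NL2SQL system used only for this sketch.
    return "SELECT name FROM singer" if "singer" in nl else "?"

pairs = [
    (["List all singers", "Show every singer's name"],
     "SELECT name FROM singer"),
    (["How many concerts are there?"],
     "SELECT count(*) FROM concert"),
]

def qvt(pairs):
    """For each gold SQL, the fraction of its NL variants the model
    maps to that SQL, averaged over all gold queries."""
    per_query = []
    for variants, gold in pairs:
        hits = sum(model(nl) == gold for nl in variants)
        per_query.append(hits / len(variants))
    return sum(per_query) / len(per_query)

print(qvt(pairs))  # 0.5
```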
NL2SQL360 Case Study – QVT
Li B, Luo Y, Chai C, et al. The Dawn of Natural Language to SQL: Are We Fully Ready?. VLDB 2024. 158
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Evaluation Metrics
• Evaluation Toolkit
• Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
159
Error Analysis
Motivation:
• Error Analysis provides valuable insights into the limitations and
challenges faced by current models.
Goal:
• By systematically examining errors, researchers can identify specific
areas for improvement, enhance model robustness, and develop more
effective training strategies.
160
Existing Taxonomies for NL2SQL Errors
• Schema Linking
• Misunderstanding Database
• Misunderstanding Knowledge
• Reasoning
• Syntax-Related Errors
• Database-related Errors
SQL-PaLM
Ruoxi Sun, Sercan Ö. Arik, Hootan Nakhost, Hanjun Dai, Rajarishi Sinha, Pengcheng Yin, Tomas Pfister:
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL. CoRR abs/2306.00739 (2023) 161
Existing Taxonomies for NL2SQL Errors
• Schema Linking
• JOIN
• GROUP BY
• Queries with Nesting and
Set Operations
• Invalid SQL
• Miscellaneous
DIN-SQL
Mohammadreza Pourreza, Davood Rafiei:
DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. NeurIPS 2023 162
Existing Taxonomies for NL2SQL Errors
Limitations:
• Designed for specific datasets
• Limited scalability
• Due to inconsistent taxonomies, the annotated data cannot be shared
or compared between different studies
163
Principles for NL2SQL Errors Taxonomy
• Comprehensiveness: The taxonomy should encompass all potential errors that
could occur during the NL2SQL conversion process.
• Mutual Exclusivity: Each error type should be clearly distinct with no overlap,
to avoid ambiguity in error classification.
• Extensibility: The taxonomy should be adaptable to incorporate new error
types as NL2SQL technologies and methodologies evolve.
• Practicality: The taxonomy should be practical and applicable in real-world
settings, aiding developers in diagnosing and correcting errors effectively.
164
Our Two-level Taxonomy
• Error Localization: Focus on
identifying the specific parts of
the SQL where errors occur
165
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• A Data-Driven Roadmap of Optimizing LLMs for NL2SQL
• Decision Flow of Selecting NL2SQL Modules
• Open Problems
166
A Roadmap of Optimizing LLMs for NL2SQL
167
A Roadmap of Optimizing LLMs for NL2SQL
A Case Study
CodeS builds open-source NL2SQL LLMs to mitigate the data privacy risks of using GPT-4
Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen:
CodeS: Towards Building Open-source Language Models for Text-to-SQL. Proc. ACM Manag. Data 2(3): 127 (2024) 168
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• A Data-Driven Roadmap of Optimizing LLMs for NL2SQL
• Decision Flow of Selecting NL2SQL Modules
• Open Problems
169
Decision Flow of Selecting Modules
170
Decision Flow of Selecting Modules
• Scenario 1:
• Complex Database Schema with
Numerous Tables and Columns.
• Recommendation:
• Schema Linking.
• Advantages:
• Reduced token costs.
• Decreased noise from the
redundant database schema.
• Disadvantages:
• Increased time costs.
171
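A minimal schema-linking sketch for this scenario, pruning tables by token overlap with the NL query (real systems use embeddings or LLM prompting; the schema below is an illustrative assumption):

```python
# Illustrative schema, loosely modeled on the Spider concert database.
SCHEMA = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "theme", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}

def link_schema(nl: str) -> dict:
    """Keep only tables whose name or column names overlap the query,
    reducing the tokens passed to the LLM."""
    tokens = set(nl.lower().replace("?", "").split())
    linked = {}
    for table, cols in SCHEMA.items():
        names = {table} | set(cols) | {c.split("_")[-1] for c in cols}
        if tokens & names:
            linked[table] = cols
    return linked

print(sorted(link_schema("What is the average age of all singers?")))
# ['singer']
```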
Decision Flow of Selecting Modules
• Scenario 2:
• Mismatch between Database Content and NL.
• Recommendation:
• Database Content Retrieval.
• Advantages:
• Increased accuracy of SQL value selection.
• Disadvantages:
• Increased time costs.
• Increased token costs.
172
Outline
• NL2SQL Problem and Background
• Language Model-Powered NL2SQL Solutions
• NL2SQL Benchmarks
• Evaluation and Error Analysis
• Practical Guidance for Developing NL2SQL Solutions
• Open Problems
173
Open Problem: Open NL2SQL Problem
• Problem & Goal:
• Open NL2SQL Problem: Querying Multiple Databases.
• Motivation:
• In many real-world scenarios, such as government open data
platforms, answering complex questions often requires querying
multiple databases and aggregating results.
• Key Idea:
• Database Retrieval
• Handling Heterogeneous Schemas
• Answer Aggregation
• Domain Adaptation
• Scalability and Efficiency
• Evaluating and Benchmarking 174
Open Problem: Cost-effective NL2SQL Methods
175
Open Problem: Trustworthy NL2SQL Solutions
176
Open Problem: NL2SQL with Ambiguous and Unspecified NL Queries
• Problem & Goal:
• NL2SQL with Ambiguous and Unspecified NL Queries: Addressing
Vague Inputs.
• Motivation:
• Ambiguous or unspecified queries are common in real-world
scenarios and pose challenges for NL2SQL systems.
• Key Idea:
• NL Query Rewriter.
• NL Query Auto-completion.
• Training with Ambiguous Queries.
177
Open Problem: Adaptive Training Data Synthesis
• Problem:
• Adaptive Training Data Synthesis: Improving Model Generalization.
• Motivation:
• Learning-based NL2SQL methods often struggle with new domains
and require high-quality, diverse training data for effective
performance.
• Key Idea:
• Adaptive Data Synthesis.
• Incorporating Feedback.
178
1. Make NL2SQL Methods Trustworthy
• NL2SQL Debugging
• NL Rewriter
2. Modularized NL2SQL Solutions