arXiv preprint arXiv:2307.07306

Xuemei Dong, Chao Zhang, Yuhang Ge, Yuren Mao, Yunjun Gao, Lu Chen, Jinshu Lin, Dongfang Lou · 2023 · arXiv 2307.07306

23 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

cs.CR · 2025-07-14 · unverdicted · novelty 8.0

ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

A Semantic-Layer-Mediated Agent for Natural Language to SQL over Heterogeneous Enterprise Databases

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

A semantic-layer-mediated NL2SQL agent using SMQ achieves 94.15% execution accuracy on the 547-task Spider2-snow benchmark with Gemini 3 Pro.

ACE-SQL: Adaptive Co-Optimization via Empirical Credit Assignment for Text-to-SQL

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

ACE-SQL jointly optimizes schema linking and SQL generation via RL with empirical credit assignment from execution-correct rollouts, achieving 65.3% greedy execution accuracy on BIRD Dev using 0.93k output tokens.

EntSQL: A Benchmark for Grounding Text-to-SQL in Long-Context Enterprise Knowledge

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

EntSQL is a new benchmark with 1,066 examples across five domains where top systems reach only 15.9% accuracy on English inputs when long-form enterprise documents are provided.

Residual Skill Optimization for Text-to-SQL Ensembles

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.

EXPO-SQL: Execution-based Clause-level Policy Optimization for Text-to-SQL

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

EXPO-SQL improves Text-to-SQL by using clause-level rewards derived from execution error messages and incremental clause execution instead of uniform query-level rewards.

ROSE: An Intent-Centered Evaluation Metric for NL2SQL

cs.DB · 2026-04-14 · unverdicted · novelty 7.0

ROSE is an intent-centered NL2SQL metric using an adversarial Prover-Refuter cascade that achieves higher human-expert agreement than prior metrics on a new validation set.

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

cs.DB · 2026-03-03 · unverdicted · novelty 7.0

The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.

Database Context Compression for Text-to-SQL on Real-World Large Databases

cs.DB · 2026-06-26 · unverdicted · novelty 6.0

DBCC applies SGCF-based offline compression and online purification to shrink database context by up to 100x while raising schema recall and execution accuracy 1.8-1.9% on Spider 2.0 and BIRD.

ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

ZAS-SQL distills rules from zero-shot Text-to-SQL failures to reach 87.2-88.6% execution accuracy on Spider, a new zero-shot SOTA that surpasses some GPT-4 few-shot and fine-tuned baselines.

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

EviLink combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition, reporting 90.15% field-level recall and 123.30K average tokens on Spider2-Snow while improving downstream SQL generation.

EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

cs.DB · 2026-05-01 · unverdicted · novelty 6.0

EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.

PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search

cs.DB · 2026-04-29 · unverdicted · novelty 6.0

PiLLar is the first LLM-guided Monte-Carlo Tree Search framework for joint schema-value matching on pivot tables, achieving 87.94% average accuracy on a new benchmark PTbench derived from real-world domains.

SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation

cs.DB · 2026-04-15 · unverdicted · novelty 6.0

A self-healing LLM pipeline for natural language to PostgreSQL translation achieves up to 9.3 percentage point accuracy gains on benchmarks through error diagnosis and anti-regression mechanisms.

AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

cs.DB · 2026-04-08 · unverdicted · novelty 6.0

AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.

Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

cs.CL · 2025-02-18 · unverdicted · novelty 6.0

KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.

RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

cs.CL · 2026-05-21 · unverdicted · novelty 5.0

RAS conditions each new Cypher query attempt on prior execution errors through ICL and reduces execution error rate by 41-50% at n=5 versus 32-38% for independent scaling across three Neo4j datasets and five models.

SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol

cs.CR · 2026-05-06 · unverdicted · novelty 5.0

SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.

MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

cs.CL · 2025-11-02 · unverdicted · novelty 5.0

MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

cs.CL · 2025-07-07 · unverdicted · novelty 5.0

XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.

CHESS: Contextual Harnessing for Efficient SQL Synthesis

cs.LG · 2024-05-27 · conditional · novelty 5.0

CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.

Are Diffusion Language Models Good Database Analysts?

cs.DB · 2026-05-27 · unverdicted · novelty 4.0

Introduces a standardized evaluation setup and SQL-D1 agent for diffusion language models on NL2SQL, claiming structural robustness advantages over autoregressive models.

citing papers explorer

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation cs.CR · 2025-07-14 · unverdicted · none · ref 9
ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.
A Semantic-Layer-Mediated Agent for Natural Language to SQL over Heterogeneous Enterprise Databases cs.CL · 2026-06-30 · unverdicted · none · ref 6
A semantic-layer-mediated NL2SQL agent using SMQ achieves 94.15% execution accuracy on the 547-task Spider2-snow benchmark with Gemini 3 Pro.
ACE-SQL: Adaptive Co-Optimization via Empirical Credit Assignment for Text-to-SQL cs.CL · 2026-06-04 · unverdicted · none · ref 32
ACE-SQL jointly optimizes schema linking and SQL generation via RL with empirical credit assignment from execution-correct rollouts, achieving 65.3% greedy execution accuracy on BIRD Dev using 0.93k output tokens.
EntSQL: A Benchmark for Grounding Text-to-SQL in Long-Context Enterprise Knowledge cs.CL · 2026-06-02 · unverdicted · none · ref 9
EntSQL is a new benchmark with 1,066 examples across five domains where top systems reach only 15.9% accuracy on English inputs when long-form enterprise documents are provided.
Residual Skill Optimization for Text-to-SQL Ensembles cs.CL · 2026-05-20 · unverdicted · none · ref 10
Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.
EXPO-SQL: Execution-based Clause-level Policy Optimization for Text-to-SQL cs.CL · 2026-04-29 · unverdicted · none · ref 18
EXPO-SQL improves Text-to-SQL by using clause-level rewards derived from execution error messages and incremental clause execution instead of uniform query-level rewards.
ROSE: An Intent-Centered Evaluation Metric for NL2SQL cs.DB · 2026-04-14 · unverdicted · none · ref 1
ROSE is an intent-centered NL2SQL metric using an adversarial Prover-Refuter cascade that achieves higher human-expert agreement than prior metrics on a new validation set.
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions cs.DB · 2026-04-13 · conditional · none · ref 10
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis cs.DB · 2026-03-03 · unverdicted · none · ref 13
The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.
Database Context Compression for Text-to-SQL on Real-World Large Databases cs.DB · 2026-06-26 · unverdicted · none · ref 31
DBCC applies SGCF-based offline compression and online purification to shrink database context by up to 100x while raising schema recall and execution accuracy 1.8-1.9% on Spider 2.0 and BIRD.
ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL cs.CL · 2026-06-06 · unverdicted · none · ref 25
ZAS-SQL distills rules from zero-shot Text-to-SQL failures to reach 87.2-88.6% execution accuracy on Spider, a new zero-shot SOTA that surpasses some GPT-4 few-shot and fine-tuned baselines.
EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL cs.CL · 2026-05-28 · unverdicted · none · ref 1
EviLink combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition, reporting 90.15% field-level recall and 123.30K average tokens on Spider2-Snow while improving downstream SQL generation.
EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement cs.DB · 2026-05-01 · unverdicted · none · ref 49
EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.
PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search cs.DB · 2026-04-29 · unverdicted · none · ref 24
PiLLar is the first LLM-guided Monte-Carlo Tree Search framework for joint schema-value matching on pivot tables, achieving 87.94% average accuracy on a new benchmark PTbench derived from real-world domains.
SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation cs.DB · 2026-04-15 · unverdicted · none · ref 14
A self-healing LLM pipeline for natural language to PostgreSQL translation achieves up to 9.3 percentage point accuracy gains on benchmarks through error diagnosis and anti-regression mechanisms.
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views cs.DB · 2026-04-08 · unverdicted · none · ref 9
AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.
Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation cs.CL · 2025-02-18 · unverdicted · none · ref 21
KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation cs.CL · 2026-05-21 · unverdicted · none · ref 4
RAS conditions each new Cypher query attempt on prior execution errors through ICL and reduces execution error rate by 41-50% at n=5 versus 32-38% for independent scaling across three Neo4j datasets and five models.
SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol cs.CR · 2026-05-06 · unverdicted · none · ref 23
SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL cs.CL · 2025-11-02 · unverdicted · none · ref 5
MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL cs.CL · 2025-07-07 · unverdicted · none · ref 18
XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.
CHESS: Contextual Harnessing for Efficient SQL Synthesis cs.LG · 2024-05-27 · conditional · none · ref 66
CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.
Are Diffusion Language Models Good Database Analysts? cs.DB · 2026-05-27 · unverdicted · none · ref 1
Introduces a standardized evaluation setup and SQL-D1 agent for diffusion language models on NL2SQL, claiming structural robustness advantages over autoregressive models.
