DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

Boyan Li; Chong Chen; Yinan Mei; Yuyu Luo; Zhujun Xue

arxiv: 2510.17586 · v3 · submitted 2025-10-20 · 💻 cs.DB


Pith reviewed 2026-05-18 06:05 UTC · model grok-4.3

classification 💻 cs.DB

keywords Text-to-SQL · Large language models · Software engineering · Orchestration · BIRD benchmark · Spider benchmark · Schema linking · N-version generation


The pith

Treating Text-to-SQL as structured software development reaches 89.8 percent accuracy on Spider with modest models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that existing Text-to-SQL systems fall short because they lack structured orchestration across the full workflow rather than because the underlying models are too small. It presents DeepEye-SQL as a framework that models the task after the software development life cycle, using four linked stages to ground the query, generate multiple candidate solutions, verify them before running, and pick the best one with execution feedback. With open-source mixture-of-experts models of roughly 30 billion total parameters and no fine-tuning, the system records 73.5 percent execution accuracy on the BIRD development set, 75.07 percent on the official BIRD test leaderboard, and 89.8 percent on Spider test. A sympathetic reader would see this as evidence that careful process design can deliver more reliable natural-language database access than simply using bigger models or more training data.
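For readers unfamiliar with the metric, execution accuracy can be sketched as follows. This is a minimal illustration and not the benchmarks' official scorer: it assumes a SQLite database, compares results as unordered sets of rows (a common convention that ignores row order and duplicate rows), and the helper names `rows_or_none`, `execution_match`, and `execution_accuracy` are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional, Set, Tuple


def rows_or_none(conn: sqlite3.Connection, sql: str) -> Optional[Set[tuple]]:
    """Run a query and return its rows as a set, or None if it fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True when the predicted query returns the same row set as the gold query."""
    gold = rows_or_none(conn, gold_sql)
    predicted = rows_or_none(conn, predicted_sql)
    return gold is not None and predicted == gold


def execution_accuracy(conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    pairs = list(pairs)
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

A predicted query that fails to execute simply counts as a miss, which is why pre-execution checks and repair can move this number.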

Core claim

DeepEye-SQL reframes Text-to-SQL as the development of a small software program executed through a verifiable process guided by the Software Development Life Cycle. The framework integrates four synergistic stages: robust schema linking that enforces relational closure, N-version SQL generation for fault tolerance, a Syntax-Logic-Quality toolchain that intercepts errors before execution, and confidence-aware selection that resolves ambiguity through execution-guided adjudication rather than simple majority voting. Using open-source MoE LLMs without any fine-tuning, it achieves 73.5 percent execution accuracy on BIRD-Dev, 75.07 percent on the official BIRD-Test leaderboard, and 89.8 percent on Spider-Test.
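The summary above does not define "relational closure" precisely. One plausible reading is that the linked schema must contain every bridge table needed to join the tables it already selected. The sketch below illustrates that reading only: the foreign-key edge list, the table names, and the functions `shortest_join_path` and `close_over_joins` are assumptions for illustration, not the paper's algorithm.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_join_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over an undirected foreign-key graph."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            path: List[str] = []
            node: Optional[str] = table
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(table, ())):
            if neighbor not in previous:
                previous[neighbor] = table
                queue.append(neighbor)
    return None  # the two tables are not connected by foreign keys


def close_over_joins(linked: Iterable[str], fk_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Add the bridge tables that connect the linked tables to each other."""
    graph: Dict[str, Set[str]] = {}
    for left, right in fk_edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    tables = sorted(set(linked))
    closed = set(tables)
    # Connect every linked table to the first one; unreachable tables are kept as-is.
    for table in tables[1:]:
        path = shortest_join_path(graph, tables[0], table)
        if path is not None:
            closed.update(path)
    return closed
```

For example, if a linker picks `singer` and `concert` but the only join route runs through `singer_in_concert`, the closure step adds that bridge table so a generator can write a valid JOIN.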

What carries the argument

The four synergistic stages modeled on the Software Development Life Cycle that together provide intent grounding, fault tolerance, pre-execution verification, and confidence-based selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged verification approach could be tested on other structured output tasks such as generating API calls or data transformation scripts.
Applying the framework to production databases with evolving schemas would show whether the schema-linking stage remains effective outside benchmark conditions.
If the verification rules were extended with domain-specific checks, accuracy on specialized enterprise databases might improve further.

Load-bearing premise

The Syntax-Logic-Quality toolchain reliably intercepts errors before execution and N-version generation plus confidence-aware selection together provide fault tolerance that exceeds simple majority voting.
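To make "intercepts errors before execution" concrete, here is a minimal sketch of a syntax-level check only. It assumes a SQLite target and uses SQLite's `EXPLAIN` prefix, which compiles a statement without running the query itself, against an empty copy of the schema. The function name `check_syntax` and the idea of returning the error text as a repair hint are illustrative assumptions; the paper's actual checkers are not specified in this summary.

```python
import sqlite3
from typing import Optional


def check_syntax(schema_ddl: str, sql: str) -> Optional[str]:
    """Return None if SQLite can compile `sql` against the schema, else the error text."""
    conn = sqlite3.connect(":memory:")  # schema only, no data
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {sql}")  # compiles the statement; the query is not run
        return None
    except sqlite3.Error as exc:
        return str(exc)  # could be passed back to the LLM as a repair directive
    finally:
        conn.close()
```

A check like this catches misspelled keywords and references to columns that do not exist, but not logical mistakes such as a wrong join condition, which is presumably what the Logic and Quality tiers are for.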

What would settle it

An independent run on the same BIRD or Spider test sets that finds many queries passing the toolchain yet failing at execution time, or that shows confidence-aware selection performing no better than majority voting, would falsify the performance advantage.
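A controlled comparison of this kind can be sketched in a few lines. The sketch assumes each question already has a fixed pool of candidates, represented only by a hashable key for each candidate's execution result, plus the gold result key. The names `majority_vote`, `selector_accuracy`, and `pool_upper_bound` are hypothetical; any alternative selector with the same signature can be plugged in.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Pool = Sequence[Hashable]  # one execution-result key per candidate


def majority_vote(pool: Pool) -> Hashable:
    """Baseline selector: the most frequent execution result (ties go to the first seen)."""
    return Counter(pool).most_common(1)[0][0]


def selector_accuracy(selector: Callable[[Pool], Hashable],
                      pools: Sequence[Pool], golds: Sequence[Hashable]) -> float:
    """Accuracy of a selector when every selector sees the same candidate pools."""
    hits = sum(selector(pool) == gold for pool, gold in zip(pools, golds))
    return hits / len(pools)


def pool_upper_bound(pools: Sequence[Pool], golds: Sequence[Hashable]) -> float:
    """Share of questions where at least one candidate is correct (oracle selection)."""
    return sum(gold in pool for pool, gold in zip(pools, golds)) / len(pools)
```

Reporting the baseline, the proposed selector, and the upper bound on identical pools separates what generation diversity contributes from what the selection logic contributes.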

Figures

Figures reproduced from arXiv: 2510.17586 by Boyan Li, Chong Chen, Yinan Mei, Yuyu Luo, Zhujun Xue.

**Figure 1.** Key Idea of DeepEye-SQL. … (LLMs) [42, 47, 52, 56] have substantially advanced Text-to-SQL, achieving strong results on benchmarks such as Spider [51] and BIRD [18]. For example, Alpha-SQL [15] leverages dynamic multistep reasoning, while XiYan-SQL [21] improves SQL generation and multi-candidate SQL selection through task-specific fine-tuning. Despite these advances, state-of-the-art performance on the BIRD…

**Figure 2.** DeepEye-SQL, a plug-and-play Text-to-SQL framework, consistently surpasses prior SOTA methods using ∼30B open-source LLMs without any task-specific fine-tuning. … JOIN, NULL/result) that trigger targeted LLM repair, ensuring verifiable correctness (addressing C3). Finally, in release, we introduce confidence-aware selection that clusters execution results to estimate confidence and then takes a high-confi…

**Figure 3.** DeepEye-SQL Overview. … (e.g., syntax, JOIN correctness). If a flaw is detected, the tool provides an explicit and actionable directive to the LLM for a targeted revision, mirroring a formal bug report and fix cycle. Phase-4: Confidence-aware SQL Selection. The final phase corresponds to the release stage, governed by a Quality Gate. Instead of simply choosing the most common answer, this stage arbitrates …

**Figure 6.** Confidence-Gated Selection. Based on the confidence of the top-ranked candidate S1, our framework follows one of two distinct paths, as detailed in Algorithm 5. High-Confidence Shortcut. If the confidence score Conf(S1) exceeds a predefined high-confidence threshold θ_conf, we conclude that there is overwhelming agreement among the generated candidates. In this scenario, we directly select S1 as the fina…

**Figure 5.** Correctness overlap analysis of three SQL genera…

**Figure 7.** Execution accuracy vs. Confidence Shortcut Threshold on BIRD-Dev dataset with Qwen3-Coder-30B-A3B model. … To answer this, we first analyze the relationship between voting confidence and correctness, and then present a direct comparison of our method against the standard self-consistency approach. The core motivation behind our Confidence-aware SQL Selection is the observation that the reliability of consist…
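The confidence gate described in the Figure 6 caption can be sketched roughly as follows. Several details are assumptions, since the captions are truncated: confidence is taken here as the share of all candidates that fall into the largest execution-result cluster, the low-confidence path is left as a caller-supplied `adjudicate` callback, and the names `result_key`, `cluster_by_result`, and `select_sql` and the default threshold are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def result_key(conn: sqlite3.Connection, sql: str) -> Optional[Tuple[str, ...]]:
    """Order-insensitive fingerprint of a query's result, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(repr(row) for row in rows))


def cluster_by_result(conn: sqlite3.Connection, candidates: Sequence[str]) -> List[List[str]]:
    """Group candidates that return the same result; largest cluster first."""
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for sql in candidates:
        key = result_key(conn, sql)
        if key is not None:
            clusters[key].append(sql)
    return sorted(clusters.values(), key=len, reverse=True)


def select_sql(conn: sqlite3.Connection, candidates: Sequence[str], theta_conf: float = 0.6,
               adjudicate: Optional[Callable[[List[List[str]]], str]] = None) -> Optional[str]:
    ranked = cluster_by_result(conn, candidates)
    if not ranked:
        return None  # every candidate failed to execute
    confidence = len(ranked[0]) / len(candidates)  # assumed definition of Conf(S1)
    if confidence > theta_conf or adjudicate is None:
        return ranked[0][0]  # high-confidence shortcut, or plain majority fallback
    return adjudicate(ranked)  # low-confidence path, left to the caller
```

With `adjudicate=None` the function reduces to majority voting over execution results, so the same code can serve as the baseline in a comparison against the gated variant.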

Original abstract

Large language models (LLMs) have advanced Text-to-SQL, yet existing solutions still fall short of system-level reliability. The limitation is not merely in individual modules (e.g., schema linking, reasoning, and verification) but more critically in the lack of structured orchestration that enforces correctness across the entire workflow. This gap motivates a paradigm shift: treating Text-to-SQL not as free-form language generation but as a software-engineering problem that demands structured, verifiable orchestration. We present DeepEye-SQL, a software-engineering-inspired framework that reframes Text-to-SQL as the development of a small software program, executed through a verifiable process guided by the Software Development Life Cycle (SDLC). DeepEye-SQL integrates four synergistic stages: it grounds user intent through robust schema linking, enforcing relational closure; enhances fault tolerance with N-version SQL generation; ensures deterministic verification via a "Syntax-Logic-Quality" tool-chain that intercepts errors pre-execution; and introduces confidence-aware selection that leverages execution-guided adjudication to resolve ambiguity beyond simple majority voting. Leveraging open-source MoE LLMs (~30B total, ~3B activated parameters) without any fine-tuning, DeepEye-SQL achieves 73.5% execution accuracy on BIRD-Dev, 75.07% on the official BIRD-Test leaderboard, and 89.8% on Spider-Test, outperforming state-of-the-art solutions that rely on larger models or extensive training. This highlights that principled orchestration, rather than LLM scaling alone, is key to achieving system-level reliability in Text-to-SQL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DeepEye-SQL gets competitive benchmark numbers on BIRD and Spider with a four-stage SDLC workflow on modest open models, but the evidence that the full orchestration beats simpler multi-generation plus voting is not yet there.


The main point is that this paper wraps Text-to-SQL in four explicit stages (relational-closure schema linking, N-version generation, a Syntax-Logic-Quality pre-execution toolchain, and confidence-aware selection) and reports 73.5% on BIRD-Dev, 75.07% on the official BIRD test, and 89.8% on Spider-Test using ~30B MoE models with no fine-tuning. Those numbers beat some larger-model baselines on public leaderboards, which is the practical takeaway.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DeepEye-SQL, a Text-to-SQL framework that reframes the task as a software-engineering problem guided by the Software Development Life Cycle (SDLC). It consists of four synergistic stages: robust schema linking to ground user intent and enforce relational closure, N-version SQL generation to enhance fault tolerance, a Syntax-Logic-Quality toolchain for deterministic pre-execution verification, and confidence-aware selection that uses execution-guided adjudication to resolve ambiguity beyond simple majority voting. Using open-source MoE LLMs (~30B total parameters) without fine-tuning, the framework reports execution accuracies of 73.5% on BIRD-Dev, 75.07% on the official BIRD-Test leaderboard, and 89.8% on Spider-Test, outperforming prior state-of-the-art methods that rely on larger models or extensive training.

Significance. If the central claims hold, the work would demonstrate that structured orchestration and verifiable stages can deliver competitive Text-to-SQL performance with smaller open-source models, shifting emphasis from LLM scaling to system-level design. This has potential implications for building reliable, production-grade database query interfaces and could encourage similar SDLC-inspired approaches in other LLM-driven data tasks.

major comments (2)

[Abstract and evaluation sections] The central claim that the Syntax-Logic-Quality toolchain plus confidence-aware selection deliver fault tolerance exceeding simple majority voting is load-bearing, yet no ablation study isolates the selection step while holding the N-version candidate pool fixed. Without this controlled comparison, it remains possible that reported gains derive primarily from multi-generation diversity rather than the orchestration's adjudication logic.
[Description of the four synergistic stages] The manuscript provides no quantitative breakdown or controlled experiments showing the incremental benefit of confidence-aware selection over majority voting on identical candidates, nor details on how execution-guided adjudication resolves ambiguity in practice. This weakens the argument that principled SDLC-style stages outperform basic ensembling.

minor comments (2)

[Abstract] The abstract states results on BIRD-Dev, BIRD-Test, and Spider-Test but provides no error bars, standard deviations across runs, or details on how post-hoc design choices in the stages affect final accuracies.
[Abstract] Implementation details for the open-source MoE LLMs (specific model names, exact parameter counts, and prompting templates) are referenced but not fully specified, limiting reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The concerns regarding the need for controlled ablations of the confidence-aware selection component are well-taken and point to an opportunity to strengthen the empirical support for our claims. We address each major comment below and commit to revisions that directly respond to the feedback.


Referee: [Abstract and evaluation sections] The central claim that the Syntax-Logic-Quality toolchain plus confidence-aware selection deliver fault tolerance exceeding simple majority voting is load-bearing, yet no ablation study isolates the selection step while holding the N-version candidate pool fixed. Without this controlled comparison, it remains possible that reported gains derive primarily from multi-generation diversity rather than the orchestration's adjudication logic.

Authors: We agree that an explicit ablation isolating the confidence-aware selection while holding the N-version candidate pool fixed would provide clearer evidence that the adjudication logic contributes beyond diversity from multi-generation alone. The manuscript reports overall framework results and comparisons to prior SOTA, but does not contain this specific controlled experiment. In the revised version we will add the requested ablation, reporting execution accuracy for majority voting versus execution-guided adjudication on identical candidate sets from the same N-version generation stage. This will be placed in the evaluation section and referenced from the abstract. revision: yes
Referee: [Description of the four synergistic stages] The manuscript provides no quantitative breakdown or controlled experiments showing the incremental benefit of confidence-aware selection over majority voting on identical candidates, nor details on how execution-guided adjudication resolves ambiguity in practice. This weakens the argument that principled SDLC-style stages outperform basic ensembling.

Authors: We acknowledge that the current description of the four stages would benefit from quantitative incremental analysis and concrete examples of ambiguity resolution. The manuscript explains the design of execution-guided adjudication but does not include side-by-side metrics on identical candidates or case studies. In revision we will add (1) a quantitative breakdown comparing confidence-aware selection to majority voting on fixed candidate pools and (2) practical examples illustrating cases where adjudication correctly selects the right SQL when majority voting fails. These additions will be incorporated into the section describing the synergistic stages. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper presents DeepEye-SQL as an engineering framework with four described stages (schema linking, N-version generation, Syntax-Logic-Quality toolchain, confidence-aware selection) and reports execution accuracies on external public benchmarks (BIRD-Dev, BIRD-Test, Spider-Test). No equations, first-principles derivations, fitted parameters, or self-referential definitions appear in the provided text. Performance results are measured against independent datasets rather than quantities defined internally by the framework itself. The central claims rest on empirical evaluation and orchestration description without any reduction of outputs to inputs by construction, self-citation load-bearing, or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that SDLC principles transfer directly to Text-to-SQL without introducing new free parameters or invented entities; no fitted constants or new particles/forces are postulated.

axioms (1)

domain assumption: Software Development Life Cycle principles can be mapped to Text-to-SQL to enforce correctness across the workflow.
Invoked in the motivation and framework overview as the basis for the four synergistic stages.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


DeepEye-SQL integrates four synergistic stages: ... Syntax-Logic-Quality tool-chain ... confidence-aware selection that clusters execution results
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery theorem unclear


N-version Programming for SQL Generation ... three distinct generators (skeleton, ICL, divide-and-conquer)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Harnessing Agentic Evolution
cs.AI 2026-05 unverdicted novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
Data-aware candidate selection in NL2SQL translation via small separating instances
cs.DB 2026-05 unverdicted novelty 6.0

A selection technique based on separating instances and provenance outperforms baselines for choosing among 2-3 NL2SQL candidates on a BIRD-DEV subset without consistency scores.
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
cs.CL 2026-05 unverdicted novelty 6.0

FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.
DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency
cs.DB 2026-04 unverdicted novelty 6.0

DPC selects correct text-to-SQL outputs by enforcing execution consistency between SQL and Python on an adversarially constructed minimal distinguishing database.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · cited by 4 Pith papers · 8 internal anchors

[1]

Gemini: A Family of Highly Capable Multimodal Models

Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Slav Petrov, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy P. Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michae... doi:10.48550/arxiv.2312.11805 (2023)
[2]

Jinheon Baek, Horst Samulowitz, Oktie Hassanzadeh, Dharmashankar Subramanian, Sola Shirai, Alfio Gliozzo, and Debarun Bhattacharjya. 2025. Knowledge Base Construction for Knowledge-Augmented Text-to-SQL. In Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, Wanxiang Che, Joyce Nabende, Ekaterin...
[3]

Zhenbiao Cao, Yuanlei Zheng, Zhihao Fan, Xiaojin Zhang, Wei Chen, and Xiang Bai. 2024. RSL-SQL: Robust Schema Linking in Text-to-SQL Generation. CoRR abs/2411.00073 (2024). arXiv:2411.00073 doi:10.48550/ARXIV.2411.00073
[4]

Liming Chen and Algirdas Avizienis. 1978. N-version programming: A fault-tolerance approach to reliability of software operation. In Proc. 8th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-8), Vol. 1. 3–9
[5]

chroma core. 2025. The AI-native open-source embedding database. https://github.com/chroma-core/chroma Accessed: 2025-10-17
[6]

Yeounoh Chung, Gaurav Tarlok Kakkar, Yu Gan, Brenton Milne, and Fatma Ozcan. 2025. Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL. Proc. VLDB Endow. 18, 8 (2025), 2735–2747
[8]

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. Proc. VLDB Endow. 17, 5 (2024), 1132–1145. doi:10.14778/3641204.3641221
[9]

Robert M. Hierons, Kirill Bogdanov, Jonathan P. Bowen, Rance Cleaveland, John Derrick, Jeremy Dick, Marian Gheorghe, Mark Harman, Kalpesh Kapoor, Paul J. Krause, Gerald Lüttgen, Anthony J. H. Simons, Sergiy A. Vilkomir, Martin R. Woodward, and Hussein Zedan. 2009. Using formal specifications to support testing. ACM Comput. Surv. 41, 2 (2009), 9:1–9:76. doi:10.1145/1459352.1459354
[10]

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, and Junyang Lin. 2024. Qwen2.5-Coder Technical Report. CoRR abs/2409.12186 (2024). arXiv:2409.12186 doi:10.48550/ARXIV.2409.12186
[11]

Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Madry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, Alexis Conneau,... (2024)
[12]

Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-Bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai ... Gemma 3 Technical Report. CoRR abs/2503.19786 (2025). arXiv:2503.19786 doi:10.48550/ARXIV.2503.19786
[14]

Janet L. Kolodner. 1993. Case-Based Reasoning. Morgan Kaufmann. doi:10.1016/C2009-0-27670-7
[15]

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP 2023, Koblenz, Germany, October 23-26, 2023, Jason Flinn, Margo I. Se... doi:10.1145/3600006.3613165
[16]

Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, and Nan Tang. 2024. The Dawn of Natural Language to SQL: Are We Fully Ready? [Experiment, Analysis & Benchmark]. Proc. VLDB Endow. 17, 11 (2024), 3318–3331. doi:10.14778/3681954.3682003
[17]

Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, and Yuyu Luo. 2025. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. In Forty-second International Conference on Machine Learning. OpenReview.net. https://openreview.net/forum?id=kGg1ndttmI
[18]

Haoyang Li, Shang Wu, Xiaokang Zhang, Xinmei Huang, Jing Zhang, Fuxin Jiang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Hong Chen, and Cuiping Li. 2025. OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale. Proc. VLDB Endow. 18, 11 (2025), 4695–4709. https://www.vldb.org/pvldb/vol18/p4695-li.pdf
[20]

Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, and Hong Chen. 2024. CodeS: Towards Building Open-source Language Models for Text-to-SQL. Proc. ACM Manag. Data 2, 3 (2024), 127. doi:10.1145/3654930
[21]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems 36 (2024)
[22]

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, and Yuyu Luo. 2025. A Survey of Text-to-SQL in the Era of LLMs: Where Are We, and Where Are We Going? IEEE Trans. Knowl. Data Eng. 37, 10 (2025), 5735–5754
[23]

Xinyu Liu, Shuyu Shen, Boyan Li, Nan Tang, and Yuyu Luo. 2025. NL2SQL-BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation. CoRR abs/2503.11984 (2025). arXiv:2503.11984 doi:10.48550/ARXIV.2503.11984
[24]

Yifu Liu, Yin Zhu, Yingqi Gao, Zhiling Luo, Xiaoxia Li, Xiaorong Shi, Yuntao Hong, Jinyang Gao, Yu Li, Bolin Ding, and Jingren Zhou. 2025. XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL. CoRR abs/2507.04701 (2025). arXiv:2507.04701 doi:10.48550/ARXIV.2507.04701
[25]

Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. 2025. nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning. arXiv preprint arXiv:2503.12880 (2025).
[26]

Yuyu Luo, Guoliang Li, Ju Fan, Chengliang Chai, and Nan Tang. 2025. Natural Language to SQL: State of the Art and Open Problems. Proc. VLDB Endow. 18, 12 (2025), 5466–5471
[27]

Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li. 2018. DeepEye: Towards Automatic Data Visualization. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. IEEE Computer Society, 101–112. doi:10.1109/ICDE.2018.00019
[28]

Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin. 2021. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 1235–1247. doi:10.1145/3448016.3457261
[30]

Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, and Xuedi Qin. 2022. Natural Language to Visualization by Neural Machine Translation. IEEE Trans. Vis. Comput. Graph. 28, 1 (2022), 217–226. doi:10.1109/TVCG.2021.3114848
[32]

Peixian Ma, Boyan Li, Runzhi Jiang, Ju Fan, Nan Tang, and Yuyu Luo. 2024. A Plug-and-Play Natural Language Rewriter for Natural Language to SQL. CoRR abs/2412.17068 (2024). arXiv:2412.17068 doi:10.48550/ARXIV.2412.17068
[33]

Karime Maamari, Fadhil Abubaker, Daniel Jaroslawicz, and Amine Mhedhbi. 2024. The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models. CoRR abs/2408.07702 (2024). arXiv:2408.07702 doi:10.48550/ARXIV.2408.07702
[35]

Yury A. Malkov and Dmitry A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 4 (2020), 824–836. doi:10.1109/TPAMI.2018.2889473
[36]

Joel Ossher, Sushil Krishna Bajracharya, and Cristina Videira Lopes. 2010. Automated dependency resolution for open source software. In Proceedings of the 7th International Working Conference on Mining Software Repositories, MSR 2010 (Co-located with ICSE), Cape Town, South Africa, May 2-3, 2010, Proceedings, Jim Whitehead and Thomas Zimmermann (Eds.). I... doi:10.1109/msr.2010.5463346
[37]

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, and Sercan Ö. Arik. 2025. CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=CvGqMD5OtX
[39]

Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Alice Oh, Tristan Naumann, Amir Globerson, Kate Sae...
[40]

Ge Qu, Jinyang Li, Bowen Qin, Xiaolong Li, Nan Huo, Chenhao Ma, and Reynold Cheng. 2025. SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025, Wanxiang Che, Joyce Na...
[41]

M. G. Rekoff. 1985. On reverse engineering. IEEE Trans. Syst. Man Cybern. 15, 2 (1985), 244–252. doi:10.1109/TSMC.1985.6313354
[42]

Per Runeson. 2006. A Survey of Unit Testing Practices. IEEE Softw. 23, 4 (2006), 22–29. doi:10.1109/MS.2006.91
[43]

Nayan B. Ruparelia. 2010. Software development lifecycle models. ACM SIGSOFT Softw. Eng. Notes 35, 3 (2010), 8–13
[44]

Joscha Schnell and Gunther Reinhart. 2016. Quality management for battery production: a quality gate concept. Procedia CIRP 57 (2016), 568–573
[45]

Philip Sedgwick. 2012. Pearson’s correlation coefficient. BMJ 345 (2012)
[46]

Lei Sheng and Shuai-Shuai Xu. 2025. CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning. CoRR abs/2505.13271 (2025). arXiv:2505.13271 doi:10.48550/ARXIV.2505.13271
[47]

Lei Sheng, Shuai-Shuai Xu, and Wei Xie. 2025. BASE-SQL: A powerful open source Text-To-SQL baseline approach. CoRR abs/2502.10739 (2025). arXiv:2502.10739 doi:10.48550/ARXIV.2502.10739
[48]

Vladislav Shkapenyuk, Divesh Srivastava, Theodore Johnson, and Parisa Ghane. 2025. Automatic Metadata Extraction for Text-to-SQL. CoRR abs/2505.19988 (2025). arXiv:2505.19988 doi:10.48550/ARXIV.2505.19988
[50]

Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, and Weikai Yang. 2025. DeepVIS: Bridging Natural Language and Data Visualization Through Step-wise Reasoning. CoRRabs/2508.01700 (2025). arXiv:2508.01700 doi:10.48550/ARXIV.2508.01700

[51]

Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, and Amin Saberi. 2024. CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRR abs/2405.16755 (2024). arXiv:2405.16755 doi:10.48550/ARXIV.2405.16755

[52]

Martyn Thomas and Frank E. McGarry. 1994. Top-Down vs. Bottom-Up Process Improvement. IEEE Softw. 11, 4 (1994), 12–13. doi:10.1109/52.300121

[53]

Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Wolfgang Schröder-Preikschat, and Reiner Schmid. 2012. Eliminating single points of failure in software-based redundancy. In 2012 Ninth European Dependable Computing Conference. IEEE, 49–60

[54]

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https://openreview.net/forum?id=1PL1NIMMrw

[55]

Yifan Wu, Jingze Shi, Bingheng Wu, Jiayi Zhang, Xiaotian Lin, Nan Tang, and Yuyu Luo. 2025. Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting. CoRR abs/2505.19716 (2025)

[56]

Xiangjin Xie, Guangwei Xu, Lingyan Zhao, and Ruijie Guo. 2025. OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment. Proc. ACM Manag. Data 3, 3 (2025), 194:1–194:24. doi:10.1145/3725331

[57]

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jian Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Liangha...

[58]

Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, and Chang Zhou. 2024. Synthesizing Text-to-SQL Data from Weak and Strong LLMs. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). A...

[59]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir R. Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In EMNLP. Association for Computational Linguistics, 3911–3921

[61]

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. 2025. AFlow: Automating Agentic Workflow Generation. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenRevi...

[62]

Qingjie Zhang, Di Wang, Haoting Qian, Yiming Li, Tianwei Zhang, Minlie Huang, Ke Xu, Hewu Li, Liu Yan, and Han Qiu. 2025. Understanding the Dark Side of LLMs’ Intrinsic Self-Correction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025, Wan...

[63]

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, and Guoliang Li. 2025. Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards. CoRR abs/2505.04671 (2025)

[64]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. CoRR abs/2506.05176 (2025). arXiv:2506.05176 doi:10.48550/ARXIV.2506.05176

[65]

Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, and Nan Tang. 2024. Are Large Language Models Good Statisticians? In NeurIPS

[66]

Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, and Yuyu Luo. 2025. EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing. CoRR abs/2503.22402 (2025)

