pith. machine review for the scientific record. sign in

arxiv: 2507.04701 · v2 · submitted 2025-07-07 · 💻 cs.CL

Recognition: unknown

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

Authors on Pith no claims yet
classification 💻 cs.CL
keywords multipleframeworkgenerationmulti-generatorxiyan-sqlensemblefine-tuninggenerating
0
0 comments X
read the original abstract

To leverage the advantages of LLM in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework effectively generating and utilizing multiple SQL candidates. It consists of three components: 1) a Schema Filter module filtering and obtaining multiple relevant schemas; 2) a multi-generator ensemble approach generating multiple highquality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy implemented to obtain the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to enhance the capabilities of SQL generation models for the intrinsic alignment between SQL and text, and construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Overall, XiYan-SQL achieves a new SOTA performance of 75.63% on the notable BIRD benchmark, surpassing all previous methods. It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

    cs.CL 2026-05 unverdicted novelty 7.0

    LEAF-SQL uses level-wise exploration with adaptive fine-graining and dual agents to generate diverse SQL skeletons, reaching 71.6% execution accuracy on the BIRD benchmark and outperforming prior search- and skeleton-...

  2. Data-aware candidate selection in NL2SQL translation via small separating instances

    cs.DB 2026-05 unverdicted novelty 6.0

    A selection technique based on separating instances and provenance outperforms baselines for choosing among 2-3 NL2SQL candidates on a BIRD-DEV subset without consistency scores.

  3. Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

    cs.CL 2026-04 unverdicted novelty 6.0

    TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency 2.2x on matched queries by extracting templates from historical pairs and enforcing them with constrained decoding.

  4. SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

    cs.AI 2026-04 unverdicted novelty 6.0

    SemanticAgent introduces a three-stage semantic analysis, synthesis, and verification process that produces higher-quality text-to-SQL training data than prior execution-only methods.