Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

Zheng Yuan , Hao Chen , Zijin Hong , Qinggang Zhang , Feiran Huang , Qing Li , Xiao Huang

Authors on Pith no claims yet

classification 💻 cs.CL cs.DB

keywords linkingschemaunderlinekaslamissingrelevantelementsgeneration

read the original abstract

Generating SQLs from user queries is a long-standing challenge, where the accuracy of initial schema linking significantly impacts subsequent SQL generation performance. However, current schema linking models still struggle with missing relevant schema elements or an excess of redundant ones. A crucial reason for this is that commonly used metrics, recall and precision, fail to capture relevant element missing and thus cannot reflect actual schema linking performance. Motivated by this, we propose enhanced schema linking metrics by introducing a \textbf{restricted missing indicator}. Accordingly, we introduce \textbf{\underline{K}n\underline{a}psack optimization-based \underline{S}chema \underline{L}inking \underline{A}pproach (KaSLA)}, a plug-in schema linking method designed to prevent the missing of relevant schema elements while minimizing the inclusion of redundant ones. KaSLA employs a hierarchical linking strategy that first identifies the optimal table linking and subsequently links columns within the selected table to reduce linking candidate space. In each linking process, it utilizes a knapsack optimization approach to link potentially relevant elements while accounting for a limited tolerance of potentially redundant ones. With this optimization, KaSLA-1.6B achieves superior schema linking results compared to large-scale LLMs, including DeepSeek-V3 with the state-of-the-art (SOTA) schema linking method. Extensive experiments on Spider and BIRD benchmarks verify that KaSLA can significantly improve the SQL generation performance of SOTA Text2SQL models by substituting their schema linking processes. The code is available at https://github.com/DEEP-PolyU/KaSLA.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval
cs.AI 2026-04 unverdicted novelty 6.0

EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...