arxiv: 2604.23477 · v2 · submitted 2026-04-26 · 💻 cs.DB

SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

Yin Lin, Tianjing Zeng, Zhongjun Ding, Rong Zhu, Bolin Ding, H. V. Jagadish, Jingren Zhou

Pith reviewed 2026-05-08 05:07 UTC · model grok-4.3

keywords SEMA-SQL · Hybrid Relational Algebra · LLM UDFs · semantic joins · natural language querying · query optimization · text-to-SQL · database semantics

The pith

SEMA-SQL generates efficient hybrid queries that combine relational operations with LLM semantic functions to answer natural language questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SEMA-SQL to answer natural language questions over databases by automatically generating queries that mix standard relational algebra with LLM-powered semantic operations. It defines Hybrid Relational Algebra as the unifying declarative model for these hybrid queries. The system automates query generation from natural language via in-context learning, applies cost-based optimization and rewriting, and uses execution algorithms that batch LLM calls to reduce invocations by 93 percent on average for semantic joins. This extends both text-to-SQL systems, which lack semantic reasoning, and manual semantic operator systems, which require complex user pipelines. Experiments on benchmarks and extensions show expanded query capabilities for tasks like semantic entity matching and unstructured text analysis.

Core claim

SEMA-SQL automates the generation of Hybrid Relational Algebra queries from natural language, optimizes them through cost-based transformations and UDF rewriting, and executes them efficiently with batching that reduces LLM invocations by an average of 93 percent in semantic joins, enabling reliable answers to questions that require both structured relational operations and semantic reasoning.

What carries the argument

Hybrid Relational Algebra (HRA), a declarative extension of relational algebra that incorporates LLM user-defined functions specified in natural language for semantic operations such as joins, mappings, and aggregations.
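
To make the idea concrete, here is a minimal Python sketch of a hybrid query: a relational selection followed by a semantic mapping whose behaviour is specified in natural language. The names `llm_udf` and `fake_llm` and the restaurant rows are hypothetical, and the stub stands in for a real model call; the paper's actual operator syntax is not shown in this summary.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): returns a canned answer."""
    canned = {"Luigi's": "Italian", "Sakura": "Japanese"}
    for name, cuisine in canned.items():
        if name in prompt:
            return cuisine
    return "unknown"

def llm_udf(instruction: str, columns: List[str],
            llm: Callable[[str], str]) -> Callable[[Row], str]:
    """Build a row-level UDF from a natural language instruction over chosen columns."""
    def apply(row: Row) -> str:
        values = ", ".join(f"{c}={row[c]}" for c in columns)
        return llm(f"{instruction}\nInput: {values}")
    return apply

restaurants = [
    {"name": "Luigi's", "rating": 4.6},
    {"name": "Sakura", "rating": 3.9},
    {"name": "Diner 24", "rating": 4.2},
]

# Relational selection first, then a semantic mapping adds a derived column.
cuisine_of = llm_udf("Return the cuisine served by this restaurant.", ["name"], fake_llm)
selected = [r for r in restaurants if r["rating"] >= 4.0]
result = [{**r, "cuisine": cuisine_of(r)} for r in selected]
```

The selection is ordinary relational work; only the `cuisine` column depends on the (stubbed) model, which is the split that a hybrid algebra is meant to express declaratively.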

If this is right

Hybrid queries can perform semantic joins and mappings across entities with inconsistent names or requiring external knowledge extraction.
Intelligent batching reduces the number of LLM invocations by an average of 93 percent during execution of semantic joins.
Natural language questions can be answered without users manually constructing pipelines of semantic operators.
Query capabilities expand beyond text-to-SQL systems by directly embedding LLM semantic reasoning into the algebra.
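
The batching point can be illustrated with a small sketch. It assumes a naive semantic join issues one LLM call per candidate pair, while a batched variant packs several pairs into one prompt. The function names, the batch size, and the string-matching stub are assumptions for illustration; the paper's actual batching algorithm and its 93 percent figure are not reproduced here.

```python
from itertools import product
from typing import List, Tuple

class CountingMatcher:
    """Stub for an LLM that judges whether two entity names match (assumption)."""
    def __init__(self) -> None:
        self.calls = 0

    def judge(self, pairs: List[Tuple[str, str]]) -> List[bool]:
        # One invocation answers every pair included in the prompt.
        self.calls += 1
        return [a.lower().replace(".", "") == b.lower().replace(".", "")
                for a, b in pairs]

def semantic_join(left, right, matcher, batch_size):
    """Join on 'same entity', sending candidate pairs to the matcher in batches."""
    pairs = list(product(left, right))
    matches = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        verdicts = matcher.judge(chunk)
        matches.extend(p for p, ok in zip(chunk, verdicts) if ok)
    return matches

left = ["I.B.M.", "Apple", "Intel", "Oracle"]
right = ["ibm", "apple", "amd", "oracle", "sap"]

naive, batched = CountingMatcher(), CountingMatcher()
naive_matches = semantic_join(left, right, naive, batch_size=1)
batched_matches = semantic_join(left, right, batched, batch_size=10)
reduction = 1 - batched.calls / naive.calls
```

With 20 candidate pairs and a batch size of 10, the stub is invoked 2 times instead of 20 while returning the same matches. In a real system the achievable batch size would be bounded by prompt length, and answer quality may change with batch size.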

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support more natural database interfaces for users who lack SQL expertise but need semantic analysis.
Batching strategies developed here might generalize to reduce costs in other LLM-augmented data processing pipelines.
Integration with fine-tuned models specialized for database semantics could further lower error rates in production settings.
Similar hybrid algebra designs could be explored for non-relational systems such as graph or document databases.

Load-bearing premise

Large language models can reliably execute semantic user-defined functions from natural language specifications via in-context learning, without errors or hallucinations frequent enough to invalidate the hybrid results.

What would settle it

Execution of benchmark queries where LLM semantic operations produce incorrect outputs that cause the overall hybrid results to deviate from ground truth beyond acceptable thresholds.
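
One hedged way to operationalize this test is to compare each hybrid query's result set with a ground-truth result and report the fraction that agree. The helper name `execution_accuracy`, the 0.9 threshold, and the sample data are hypothetical; this summary does not state which metric or threshold the paper uses.

```python
from typing import Hashable, Iterable, List

def execution_accuracy(predicted: List[Iterable[Hashable]],
                       expected: List[Iterable[Hashable]]) -> float:
    """Fraction of queries whose result equals the ground truth.

    Results are compared as sets, so row order and duplicate rows are ignored.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same queries")
    if not predicted:
        return 0.0
    hits = sum(set(p) == set(e) for p, e in zip(predicted, expected))
    return hits / len(predicted)

# Four toy queries; the third one returns a wrong value.
predicted = [[("IBM",)], [("Paris",), ("Lyon",)], [("42",)], [("Rome",)]]
expected = [[("IBM",)], [("Lyon",), ("Paris",)], [("41",)], [("Rome",)]]

accuracy = execution_accuracy(predicted, expected)
THRESHOLD = 0.9  # hypothetical acceptance threshold
passes = accuracy >= THRESHOLD
```

Here three of four queries agree with the ground truth, so the toy run falls below the assumed threshold.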

Figures

Figures reproduced from arXiv: 2604.23477 by Bolin Ding, H. V. Jagadish, Jingren Zhou, Rong Zhu, Tianjing Zeng, Yin Lin, Zhongjun Ding.

**Figure 1.** Motivating examples: extending relational querying with LLM capabilities

**Figure 2.** Overview of the Sema-SQL system, which operates in three phases: (1) Query Generation translates natural language questions into HRA queries; (2) Query Optimization optimizes query plans via a cost-based algorithm and UDF rewriting; (3) Query Execution executes optimized plans to produce final answers. … matching entities in the join columns; (b) a semantic mapping extracts missing information from parametr…

**Figure 4.** Examples of LLM UDFs in HRA for semantic operations. … content through semantic processing (e.g., FavoriteDish in Figure 4b). We formally define an LLM UDF as follows: Definition 2.1 (LLM User-Defined Function). Let 𝑇 be input relations and 𝐶 be a subset of columns from 𝑇, where 𝑇[𝐶] denotes the projection of 𝑇 onto columns 𝐶. An LLM-powered UDF 𝑈^𝑙_𝑀 leverages a language model 𝑀 to evaluate a natural languag…

**Figure 3.** Example query from the TAG benchmark: “How many test takers are there at the school/s in a county with population over 2 million?” (a) LOTUS: expert-written program with explicit execution logic. (b) HRA: declarative algebraic operators. 2 HYBRID RELATIONAL ALGEBRA: We introduce Hybrid Relational Algebra (HRA), which extends relational algebra with LLM-based semantic operations. HRA provides a declarativ…

**Figure 5.** Prompt template for HRA query generation. 3 QUERY GENERATION: Automatically synthesizing HRA queries from natural language poses three core technical challenges: (1) semantic-aware schema encoding: representing database schemas to enable accurate operator selection and target data identification, (2) compositional query decomposition: mapping natural language questions to reasoning steps that align with Sema…

**Figure 6.** Example: query optimization with lazy LLM evaluation. … costs and relational database operation costs, leveraging symbolic execution [50] to ensure plan equivalence across transformations. The optimization process first parses the HRA query into a logical plan, during which Sema-SQL’s parser validates syntax and verifies that all referenced tables and columns exist in the database. Definition 4.1 (Query Plan…
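
The lazy-evaluation idea named in the Figure 6 caption can be sketched as a plan rewrite: when a cheap relational predicate and an LLM predicate both filter the same rows, run the relational one first so the LLM sees fewer rows. The data, the predicates, and the counting stub below are assumptions, not the paper's example.

```python
class CountingLLM:
    """Stub LLM predicate that counts its invocations (assumption)."""
    def __init__(self):
        self.calls = 0

    def is_positive(self, text: str) -> bool:
        self.calls += 1
        return "great" in text or "love" in text

reviews = [
    {"id": 1, "year": 2019, "text": "great food"},
    {"id": 2, "year": 2023, "text": "love the service"},
    {"id": 3, "year": 2023, "text": "too salty"},
    {"id": 4, "year": 2018, "text": "love it"},
    {"id": 5, "year": 2024, "text": "great value"},
]

def eager_plan(rows, llm):
    # LLM predicate first, relational filter second.
    semantic = [r for r in rows if llm.is_positive(r["text"])]
    return [r["id"] for r in semantic if r["year"] >= 2023]

def lazy_plan(rows, llm):
    # Relational filter first; the LLM only sees the surviving rows.
    recent = [r for r in rows if r["year"] >= 2023]
    return [r["id"] for r in recent if llm.is_positive(r["text"])]

eager_llm, lazy_llm = CountingLLM(), CountingLLM()
eager_ids = eager_plan(reviews, eager_llm)
lazy_ids = lazy_plan(reviews, lazy_llm)
```

Both plans return the same rows because the two filters commute (assuming the LLM predicate is deterministic), but the lazy plan calls the stub 3 times instead of 5.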

**Figure 7.** Verification for plan equivalence. The worst-case time complexity of the algorithm is 𝑂(𝑘 · 2^𝑚), where 𝑘 is the number of operators in the query plan and 𝑚 is the number of semantic operators. At each node, for each of the 𝑚 semantic operators, we have two choices: either reposition it immediately under node 𝑣 (𝑆_𝑣) or leave it in the subtree(s) below. To determine whether the transformation produces an e…
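
The stated bound follows from a counting argument: two placement choices per semantic operator give 2^m combinations at a node, and visiting k nodes gives at most k · 2^m candidates. The short sketch below enumerates those combinations for a small hypothetical plan; the operator names are made up, and any pruning or equivalence checking the paper performs is not modeled.

```python
from itertools import product

def placements(semantic_ops):
    """All 'pull up' vs 'leave below' assignments for the operators at one node."""
    choices = ("pull_up", "leave_below")
    return [dict(zip(semantic_ops, combo))
            for combo in product(choices, repeat=len(semantic_ops))]

def worst_case_candidates(k, m):
    """Upper bound on candidates examined: k nodes, 2**m choices at each."""
    return k * 2 ** m

ops = ["sem_join", "sem_map", "sem_filter"]  # hypothetical operators, m = 3
per_node = placements(ops)
bound = worst_case_candidates(k=6, m=len(ops))
```

For m = 3 there are 8 assignments per node, so a plan with k = 6 operators has a worst-case bound of 48 candidates. The bound is exponential only in the number of semantic operators, not in the size of the whole plan.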

**Figure 9.** Ablation study of query generation components in execution accuracy. [Plot residue, apparently from a separate comparison of runs with Opt. and w/o Opt.: Execution Time (s), means 45.1 and 62.9; Token Usage, means 25.0k and 31.5k.]

**Figure 11.** Comparison for semantic join algorithms

Original abstract

Relational databases excel at structured data analysis, but real-world queries increasingly require capabilities beyond standard SQL, such as semantically matching entities across inconsistent names, extracting information not explicitly stored in schemas, and analyzing unstructured text. While text-to-SQL systems enable natural language querying, they remain limited to relational operations and cannot leverage the semantic reasoning capabilities of modern large language models (LLMs). Conversely, recent semantic operator systems extend relational algebra with LLM-powered operations (e.g., semantic joins, mappings, aggregations), but require users to manually construct complex query pipelines. To address this gap, we present SEMA-SQL, a system that automatically answers natural language questions by generating efficient queries that combine relational operations with LLM semantic reasoning. We formalize Hybrid Relational Algebra (HRA), a declarative abstraction unifying traditional relational operators with LLM user-defined functions (UDFs). The system automates three critical aspects: (1) query generation via in-context learning that produces HRA queries with precise natural language specifications for LLM UDFs, (2) query optimization via cost-based transformations and UDF rewriting, and (3) efficient execution algorithms that reduce LLM invocations by an average of 93% in semantic joins through intelligent batching. Extensive experiments with known benchmarks, and extensions thereof, demonstrate the significant query capability improvements possible with our design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SEMA-SQL automates a hybrid algebra that mixes standard SQL with LLM semantic UDFs and claims big efficiency wins through batching, but the abstract gives almost no experimental details to back the 93% reduction or accuracy numbers.

The core contribution is formalizing Hybrid Relational Algebra as a declarative layer on top of relational operators plus LLM UDFs, then building a full pipeline that turns natural language questions into optimized HRA queries and executes them with intelligent batching. That combination of automated generation, cost-based rewriting, and batched execution is what sets it apart from plain text-to-SQL tools or the earlier manual semantic-operator systems. The batching trick for cutting LLM calls on semantic joins is a practical engineering point that could matter for cost and latency in real deployments. The paper does a clean job stating the gap and showing how HRA lets users avoid hand-crafting complex pipelines.

The experimental claims are the soft spot. The abstract says extensive benchmarks and extensions demonstrate significant improvements and a 93% average drop in LLM invocations, yet supplies no tables, error rates, baseline comparisons, or discussion of hallucination handling. Without those numbers or controls, it is difficult to know whether the hybrid results stay reliable when the LLM UDFs misfire. The central assumption that in-context learning is enough to make the semantic functions trustworthy therefore sits untested in the summary.

This work is aimed at database researchers who want to extend query engines with semantic capabilities on messy data. A reader already following text-to-SQL or LLM-operator papers would find the HRA abstraction and the execution optimizations worth discussing. The paper deserves peer review because the architecture is coherent and the efficiency ideas are concrete enough to evaluate, even though the current write-up will need a much stronger experimental section before acceptance.

Referee Report

2 major / 2 minor

Summary. The paper presents SEMA-SQL, a system that answers natural language questions over relational databases by automatically generating queries in a Hybrid Relational Algebra (HRA). HRA extends traditional relational algebra with LLM-powered user-defined functions (UDFs) for semantic operations such as joins, mappings, and aggregations on unstructured or inconsistently named data. The system automates three aspects: (1) query generation via in-context learning to produce HRA expressions with natural-language specifications for the LLM UDFs, (2) cost-based query optimization and UDF rewriting, and (3) efficient execution algorithms that batch LLM calls to achieve an average 93% reduction in invocations for semantic joins. Experiments on standard benchmarks and extensions thereof are claimed to show substantial gains in query capability over both pure SQL and text-to-SQL baselines.

Significance. If the experimental claims hold under rigorous validation, the work would provide a useful declarative bridge between structured relational processing and LLM semantic reasoning, reducing the need for manual construction of hybrid pipelines. The reported 93% reduction via batching represents a concrete engineering contribution to execution efficiency. The formalization of HRA offers a clean abstraction that could serve as a foundation for future hybrid query languages. Credit is due for the focus on automation of generation, optimization, and execution rather than leaving these to the user.

major comments (2)

[Experimental Evaluation] Experimental Evaluation section: the central claim of 'significant query capability improvements' and the 93% average reduction in LLM invocations rests on benchmark results, yet the section provides insufficient detail on error rates, hallucination frequency, or accuracy of the LLM UDF outputs against ground truth. Without these metrics and controls (e.g., comparison of hybrid results to manually verified answers), it is impossible to determine whether the reported capability gains are offset by unacceptable semantic errors.
[Query Optimization] Query Optimization and Execution sections: the cost model used for deciding when to apply UDF rewriting and batching is not specified in sufficient detail. In particular, it is unclear how the optimizer estimates the latency, monetary cost, or failure probability of LLM UDF calls relative to relational operators; this estimation is load-bearing for the claim that the generated plans are both correct and efficient.
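
To make the second request concrete, the following sketch shows one shape such a cost model could take: plan cost as relational work plus expected LLM spend, where failed calls are retried until they succeed. Every constant and name here is hypothetical, and nothing in this report indicates that SEMA-SQL's optimizer uses this formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlanEstimate:
    relational_rows: int   # rows touched by relational operators
    llm_calls: int         # LLM UDF invocations before retries
    failure_prob: float    # chance that a single call fails and is retried

def expected_cost(plan: PlanEstimate,
                  cost_per_row: float = 1e-6,
                  cost_per_call: float = 0.002) -> float:
    """Expected cost in arbitrary units (all constants are assumptions).

    With independent failures and retry-until-success, the number of attempts
    per call is geometric with mean 1 / (1 - failure_prob).
    """
    if not 0.0 <= plan.failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    expected_attempts = plan.llm_calls / (1.0 - plan.failure_prob)
    return plan.relational_rows * cost_per_row + expected_attempts * cost_per_call

# Two candidate plans over the same data: one calls the LLM on every row,
# the other filters relationally first.
eager = PlanEstimate(relational_rows=10_000, llm_calls=10_000, failure_prob=0.05)
lazy = PlanEstimate(relational_rows=10_000, llm_calls=500, failure_prob=0.05)
cheaper = min((eager, lazy), key=expected_cost)
```

Under these assumed constants the LLM term dominates, so the plan with fewer invocations is selected. A real model would also need calibrated latency and per-token pricing, which is the detail the comment asks the authors to supply.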

minor comments (2)

[Related Work] The related-work discussion of prior semantic-operator systems could be expanded with explicit side-by-side comparison of query expressiveness and automation level.
[Hybrid Relational Algebra] Notation for HRA operators is introduced without a compact summary table; adding one would improve readability when the algebra is referenced in later sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of our experimental results and the details of our cost model.

Referee: [Experimental Evaluation] Experimental Evaluation section: the central claim of 'significant query capability improvements' and the 93% average reduction in LLM invocations rests on benchmark results, yet the section provides insufficient detail on error rates, hallucination frequency, or accuracy of the LLM UDF outputs against ground truth. Without these metrics and controls (e.g., comparison of hybrid results to manually verified answers), it is impossible to determine whether the reported capability gains are offset by unacceptable semantic errors.

Authors: We agree that additional quantitative metrics on accuracy, error rates, and hallucination frequency are needed to fully substantiate the capability claims. In the revised manuscript, we will expand the Experimental Evaluation section to include accuracy rates of LLM UDF outputs measured against ground truth, observed hallucination frequencies across benchmarks, and results from manual verification of sampled hybrid query results. These additions will allow readers to evaluate whether semantic errors offset the reported gains.
revision: yes

Referee: [Query Optimization] Query Optimization and Execution sections: the cost model used for deciding when to apply UDF rewriting and batching is not specified in sufficient detail. In particular, it is unclear how the optimizer estimates the latency, monetary cost, or failure probability of LLM UDF calls relative to relational operators; this estimation is load-bearing for the claim that the generated plans are both correct and efficient.

Authors: We acknowledge that the cost model description requires more detail. We will revise the Query Optimization section to fully specify the cost model, including explicit formulas and methods for estimating latency, monetary costs, and failure probabilities of LLM UDF calls relative to traditional relational operators. We will also describe how these estimates are calibrated and used in plan selection.
revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes a system architecture (SEMA-SQL) that formalizes Hybrid Relational Algebra (HRA) and automates query generation, optimization, and execution with LLM UDFs. It reports empirical improvements on benchmarks without any mathematical derivation chain, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations. The abstract and high-level claims rest on a proposed design and experimental results that are independent of the inputs by construction. No equations or reductions to prior fitted quantities appear.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that LLMs can serve as reliable semantic UDFs when prompted in natural language and that in-context learning suffices to generate correct HRA queries; no free parameters or invented entities beyond HRA itself are stated in the abstract.

axioms (1)

domain assumption: Hybrid Relational Algebra extends standard relational algebra by allowing LLM-powered user-defined functions whose semantics are specified in natural language.
This is the core formalization invoked to unify relational and semantic operations.

invented entities (1)

Hybrid Relational Algebra (HRA) · no independent evidence
purpose: Declarative abstraction that unifies traditional relational operators with LLM UDFs
New abstraction introduced to enable automated hybrid query generation and optimization.
