Recognition: no theorem link
Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis
Pith reviewed 2026-05-15 17:33 UTC · model grok-4.3
The pith
A unified taxonomy places LLM-enhanced relational operators into five categories, and a benchmark of 350 queries identifies concrete design best practices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-Enhanced Relational Operators are defined as LLM calls that keep a strict relational input-output interface. The taxonomy sorts them into Select, Match, Impute, Cluster, and Order, each with listed operands and implementation variants. Evaluation on the 290 single-operator and 60 multi-operator queries of LROBench yields measurable performance differences across variants, from which the authors extract empirical best practices; these practices are then used to build an LRO suite that is compared directly against existing multi-operator systems on complex semantic workloads.
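The interface constraint that defines an LRO can be made concrete with a minimal sketch (the names `lro_select` and `stub_judge` and the sample rows are illustrative assumptions, not from the paper): a semantic Select consumes rows and emits a subset of rows, so the LLM call stays hidden behind a relational operator boundary.

```python
from typing import Callable

Row = dict  # a relation is modeled as a list of rows

def lro_select(rows: list[Row], predicate: str,
               llm_judge: Callable[[str, Row], bool]) -> list[Row]:
    """Semantic Select: rows in, rows out; the LLM is consulted per row
    but the relational schema never changes."""
    return [r for r in rows if llm_judge(predicate, r)]

# Stub standing in for a real LLM call (hypothetical).
def stub_judge(predicate: str, row: Row) -> bool:
    return "hiking" in row.get("review", "").lower()

rows = [{"id": 1, "review": "Great for hiking trips"},
        {"id": 2, "review": "Broke after a week"}]
print(lro_select(rows, "mentions outdoor use", stub_judge))
```

The same shape carries over to the other four categories: Match takes two relations and returns pairs, Impute fills cells, Cluster and Order return relabeled or reordered rows.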
What carries the argument
The LRO taxonomy that aligns operators into the Select, Match, Impute, Cluster, and Order categories, together with their operand and implementation variants, supported by the LROBench benchmark of 290 single-LRO and 60 multi-LRO queries across 27 databases.
If this is right
- Designers can select LRO implementations according to the measured performance differences for better accuracy or speed on individual tasks.
- Instantiating a multi-LRO system with the identified best practices yields higher end-to-end effectiveness on complex semantic queries than current systems.
- The five-category taxonomy supplies a consistent language for comparing and extending operators across different research efforts.
- Releasing the full benchmark data and evaluation code allows direct reproduction and extension of the reported measurements.
Where Pith is reading between the lines
- Adoption of the taxonomy could encourage database engines to expose standard interfaces for semantic operators rather than ad-hoc LLM calls.
- The same evaluation approach might be applied to measure how these operators behave under streaming or distributed execution settings.
- Hybrid query planners could treat the five LRO types as first-class algebraic operators when optimizing mixed relational and semantic workloads.
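The planner idea in the last bullet can be sketched with the classic rank-based predicate-ordering heuristic applied to a mixed plan (operator names and cost numbers are hypothetical, not from the paper): because an LLM-backed filter may cost orders of magnitude more per row than a native predicate, cheap selective filters should run first.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    cost_per_row: float   # relative per-row cost (LLM calls dominate)
    selectivity: float    # fraction of rows the filter keeps

def order_filters(ops: list[Op]) -> list[Op]:
    """Order commuting filters by cost / (1 - selectivity), the textbook
    rank metric: cheap, highly selective predicates run first so fewer
    rows ever reach the expensive LLM-backed operator."""
    return sorted(ops, key=lambda o: o.cost_per_row / (1 - o.selectivity))

plan = [Op("sem_select(llm)", cost_per_row=100.0, selectivity=0.5),
        Op("price < 20", cost_per_row=0.01, selectivity=0.1)]
print([o.name for o in order_filters(plan)])
```

This is only a sketch of the direction the bullet points at; a real optimizer would also need cardinality estimates for semantic predicates, which is an open problem.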
Load-bearing premise
The 290 single and 60 multi queries chosen for the benchmark adequately represent the full range of real-world semantic query needs and operating logics.
What would settle it
A fresh collection of semantic queries drawn from an additional domain that produces materially different performance rankings or best-practice recommendations would falsify the claim that the current benchmark results generalize.
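One way to operationalize that test: rank the implementation variants on the existing benchmark and again on the fresh domain, then measure rank agreement with Kendall's tau; a low correlation would indicate the rankings do not generalize. The variant names below are placeholders, not the paper's actual variants.

```python
from itertools import combinations

def kendall_tau(rank_a: dict, rank_b: dict) -> float:
    """Kendall rank correlation between two rankings of the same items
    (no ties assumed): +1 means identical order, -1 means reversed."""
    concordant = discordant = 0
    for x, y in combinations(list(rank_a), 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical variant rankings (1 = best) on LROBench vs. a new domain.
lrobench = {"zero-shot": 3, "few-shot": 2, "fine-tuned": 1}
new_domain = {"zero-shot": 3, "few-shot": 1, "fine-tuned": 2}
print(kendall_tau(lrobench, new_domain))
```

A threshold for "materially different" would still have to be chosen in advance for the test to be falsifying rather than descriptive.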
Original abstract
With the development of large language models (LLMs), numerous studies integrate LLMs through operator-like components to enhance relational data processing tasks, e.g., filters with semantic predicates, knowledge-augmented table imputation, reasoning-driven entity matching and more challenging semantic query processing. These components invoke LLMs while preserving a relational input/output interface, which we refer to as LLM-Enhanced Relational Operators (LROs). From an operator perspective, unfortunately, these existing LROs suffer from fragmented definition, various implementation strategies and inadequate evaluation benchmarks. To this end, in this paper, we first establish a unified LRO taxonomy to align existing LROs, and categorize them into: Select, Match, Impute, Cluster and Order, along with their operands and implementation variants. Second, we design LROBench, a comprehensive benchmark featuring 290 single-LRO queries and 60 multi-LRO queries, spanning 27 databases across more than 10 domains. LROBench covers all operating logics and operand granularities in its single-LRO workload, and provides challenging multi-LRO queries stratified by query complexity. Based on these, we evaluate individual LROs under various implementations, deriving practical insights into LRO design choices and summarizing our empirical best practices. We further compare the end-to-end performance of existing multi-LRO systems against an LRO suite instantiated with these best practices, in order to investigate how to design an effective LRO set for multi-LRO systems targeting complex semantic queries. Last, to facilitate future work, we outline promising future directions and open-source all benchmark data and evaluation code, available at https://github.com/LROBench/LROBench/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a unified taxonomy for LLM-Enhanced Relational Operators (LROs), categorizing them into Select, Match, Impute, Cluster, and Order along with their operands and implementation variants. It introduces LROBench, a benchmark with 290 single-LRO queries and 60 multi-LRO queries across 27 databases in more than 10 domains, which is asserted to cover all operating logics and operand granularities for single-LRO workloads. The work evaluates individual LRO implementations, derives empirical best practices, compares end-to-end performance of multi-LRO systems against an LRO suite using those practices, and open-sources the benchmark data and code.
Significance. If the benchmark coverage and evaluation results hold, the taxonomy and LROBench could standardize fragmented research on LLM-augmented relational operators, enabling reproducible comparisons and guiding design choices for semantic query processing. The open-sourcing of data, queries, and code is a clear strength that supports future work in this emerging area.
Major comments (1)
- [Abstract] Abstract and LROBench description: the claim that the 290 single-LRO queries 'cover all operating logics and operand granularities' is load-bearing for the taxonomy alignment and downstream best-practice recommendations, yet no formal enumeration of the logic space, sampling frame, or completeness argument is provided; queries are described as manually curated and stratified only by complexity, leaving open the possibility that sub-classes (e.g., multi-hop semantic predicates or domain-specific constraints) are underrepresented.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the benchmark construction. We address the major comment below and have prepared revisions to strengthen the presentation of LROBench coverage.
Point-by-point responses
- Referee: [Abstract] Abstract and LROBench description: the claim that the 290 single-LRO queries 'cover all operating logics and operand granularities' is load-bearing for the taxonomy alignment and downstream best-practice recommendations, yet no formal enumeration of the logic space, sampling frame, or completeness argument is provided; queries are described as manually curated and stratified only by complexity, leaving open the possibility that sub-classes (e.g., multi-hop semantic predicates or domain-specific constraints) are underrepresented.
Authors: We agree that the original wording of the coverage claim was insufficiently justified. Section 3 defines the taxonomy by operator categories (Select, Match, Impute, Cluster, Order) together with operand types and granularities. The 290 queries were manually constructed by first enumerating the Cartesian product of these taxonomy dimensions and then instantiating each cell with representative examples drawn from the 27 databases. Stratification occurred along both operator type and complexity, not complexity alone. While a formal completeness proof is not possible for an open semantic space, we will add a new subsection (4.2) that explicitly lists the enumerated logic combinations, provides the sampling rationale, and includes a table mapping each taxonomy cell to the number of queries. We have also incorporated additional multi-hop predicate and domain-constraint examples into the released benchmark. The abstract claim will be revised to state that LROBench covers the operating logics and operand granularities defined by the taxonomy. These changes preserve the empirical results while addressing the concern directly. Revision: partial
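The construction described in the rebuttal (enumerating the Cartesian product of taxonomy dimensions, then mapping each cell to a query count) can be sketched as follows; the dimension values and query tags are illustrative assumptions, not the paper's actual enumeration.

```python
from itertools import product
from collections import Counter

# Illustrative dimension values; the paper's enumeration may differ.
operators = ["Select", "Match", "Impute", "Cluster", "Order"]
granularities = ["cell", "row", "table"]

cells = list(product(operators, granularities))

# Hypothetical curated queries tagged with their taxonomy cell.
queries = [("Select", "row"), ("Select", "row"), ("Match", "table"),
           ("Impute", "cell"), ("Order", "row")]
coverage = Counter(queries)

# Empty cells are exactly the coverage gap the referee is worried about.
empty = [c for c in cells if coverage[c] == 0]
print(f"{len(cells)} cells, {len(empty)} uncovered")
```

The promised Section 4.2 table would be the `coverage` mapping made explicit, which is what turns the abstract's coverage claim into a checkable statement.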
Circularity Check
No circularity in taxonomy or benchmark construction
Full rationale
The paper surveys existing LLM-enhanced operators from the literature to establish a taxonomy (Select/Match/Impute/Cluster/Order), then manually curates 290 single-LRO and 60 multi-LRO queries stratified by complexity across 27 databases. No equations, parameter fitting, predictions, or self-citation chains appear in the central claims. The coverage assertion follows directly from the taxonomy categories rather than reducing to a self-definitional loop or fitted input. All benchmark data and code are open-sourced, making the work externally verifiable without internal circularity.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLMs invoked via operator-like components preserve a relational input/output interface while adding semantic capabilities
Invented entities (1)
- LRO (LLM-Enhanced Relational Operator): no independent evidence
Forward citations
Cited by 1 Pith paper
- OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data. OmniTQA integrates LLM semantic reasoning as a first-class query operator with classical relational operators in a cost-aware planner for hybrid structured and semi-structured data.