arxiv: 2606.29151 · v1 · pith:WYBVDUGMnew · submitted 2026-06-28 · 💻 cs.DB

CADENZA: Compiling Natural-Language Intent into Task-Specific Operator DAGs for Semantic Query Processing

Pith reviewed 2026-06-30 02:46 UTC · model grok-4.3

classification 💻 cs.DB
keywords semantic query processing · query optimization · task DAGs · relational algebra extension · Bayesian optimization · natural language intent · multi-objective optimization · model inference routing

The pith

CADENZA compiles natural-language intents into typed task DAGs using task-extended relational algebra so semantic queries can be filtered, reordered, routed, and jointly tuned for quality-latency-cost trade-offs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that semantic query processing engines can treat each semantic operator instance as a relational optimization object by compiling its natural-language intent into a plan space of typed task DAGs. This exposes intermediate task outputs for filtering, reordering, routing, thresholding, and joint tuning, which existing optimizers cannot do. A logical planner synthesizes seed plans in TxRA, applies dependency-checked structural rewrites, and generates semantics-guided alternatives. A physical planner then routes each task operator over heterogeneous backends and uses Bayesian optimization to tune cutpoints, parameters, and thresholds. On SemBench the paper reports improvements in scenario-level averages of quality, latency, and cost of up to +0.49, 165.7x, and 310.3x, respectively, relative to the state of the art.

Core claim

CADENZA compiles each semantic operator instance—a template bound to a natural-language intent—into an intent-specific plan space of typed task DAGs and selects an executable plan under user-specified quality-latency-cost trade-offs. It introduces TxRA, a conservative extension of relational algebra with task-specific operators. The logical planner synthesizes seed TxRA plans, applies structural rewrites whose safety conditions are checked from operator dependencies, and enumerates semantics-guided alternatives from alternative-generation templates. The physical planner compiles each task-specific operator into a router over heterogeneous backends and jointly tunes routing cutpoints, backend parameters, and relational thresholds with Bayesian optimization.
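
To make the routing idea concrete, here is a minimal Python sketch of a router that sends each input to one of several backends based on cutpoints over a cheap input feature. The `Router` class, the three stand-in backends, and the cutpoint values are hypothetical illustrations and are not taken from the paper; in CADENZA the cutpoints are described as tuned by Bayesian optimization rather than set by hand.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Router:
    """Hypothetical router: picks a backend from sorted cutpoints on a cheap feature."""
    feature: Callable[[str], float]        # cheap input feature, e.g. text length
    cutpoints: Sequence[float]             # sorted ascending, one fewer than backends
    backends: List[Callable[[str], str]]   # ordered from cheapest to most expensive

    def __post_init__(self) -> None:
        if len(self.cutpoints) != len(self.backends) - 1:
            raise ValueError("need exactly one fewer cutpoint than backends")
        if list(self.cutpoints) != sorted(self.cutpoints):
            raise ValueError("cutpoints must be sorted ascending")

    def pick(self, text: str) -> int:
        # bisect_right counts the cutpoints that are <= the feature value,
        # which is exactly the index of the bucket the input falls into.
        return bisect_right(self.cutpoints, self.feature(text))

    def __call__(self, text: str) -> str:
        return self.backends[self.pick(text)](text)


# Stand-in backends (assumptions); real ones would call models.
def symbolic_match(text: str) -> str:
    return "symbolic"


def distilled_qa(text: str) -> str:
    return "distilled"


def general_llm(text: str) -> str:
    return "llm"


router = Router(
    feature=len,
    cutpoints=[20, 200],  # illustrative values, not tuned
    backends=[symbolic_match, distilled_qa, general_llm],
)
```

With this layout the tunable parameters are just the cutpoint vector, which is what makes the router a natural target for a black-box optimizer.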

What carries the argument

Task-extended relational algebra (TxRA), a conservative extension of relational algebra that adds task-specific operators so intermediate inference outputs become first-class relational objects for rewriting and routing.
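
As a rough illustration of what "dependency-checked" rewriting could look like, the Python sketch below models each operator by the columns it reads and writes and allows two adjacent operators to swap only when neither touches the other's outputs. This is an assumption about the general shape of such a check, not the paper's actual safety conditions; the operator and column names are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Op:
    """Hypothetical plan operator described only by its column dependencies."""
    name: str
    reads: FrozenSet[str]    # columns the operator consumes
    writes: FrozenSet[str]   # columns the operator produces


def can_swap(a: Op, b: Op) -> bool:
    # Two adjacent operators commute structurally when neither reads a column
    # the other writes and they do not write the same column.
    return (
        not (a.writes & b.reads)
        and not (b.writes & a.reads)
        and not (a.writes & b.writes)
    )


def swap_adjacent(plan: List[Op], i: int) -> List[Op]:
    """Return a new plan with plan[i] and plan[i + 1] exchanged, if that is safe."""
    a, b = plan[i], plan[i + 1]
    if not can_swap(a, b):
        raise ValueError(f"cannot swap {a.name} and {b.name}: dependency conflict")
    return plan[:i] + [b, a] + plan[i + 2:]


apply_txt_qa = Op("ApplyTxtQA", frozenset({"txt"}), frozenset({"answer", "score"}))
filter_price = Op("FilterPrice", frozenset({"price"}), frozenset())
filter_answer = Op("FilterAnswer", frozenset({"answer", "score"}), frozenset())

seed_plan = [apply_txt_qa, filter_price, filter_answer]
# The cheap relational filter does not depend on the task output, so it can run first.
rewritten = swap_adjacent(seed_plan, 0)
```

In this sketch the filter on task outputs cannot move ahead of the task operator, because it reads the `answer` and `score` columns that the task operator writes.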

If this is right

  • Semantic operators become subject to the same algebraic rewrites and cost-based planning as relational operators.
  • User-specified quality-latency-cost preferences can be met by a single joint optimization pass rather than separate tuning loops.
  • Intermediate task outputs can be materialized, filtered, or reused across multiple semantic operators in one query.
  • Heterogeneous backends for the same task can be selected per instance rather than fixed at the operator level.
  • Alternative-generation templates allow the planner to explore semantically equivalent but structurally different task DAGs.
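
One simple way to picture selecting a plan under user-specified quality-latency-cost preferences is a weighted utility over measured plan statistics. The Python sketch below is an assumption for illustration only: the weighted-sum form, the normalization by the worst observed latency and cost, and all plan names and numbers are hypothetical, and this summary does not state the utility the paper actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlanStats:
    name: str
    quality: float     # higher is better, assumed to lie in [0, 1]
    latency_s: float   # lower is better
    cost_usd: float    # lower is better


Weights = Tuple[float, float, float]  # (quality, latency, cost) preference weights


def utility(p: PlanStats, w: Weights, max_latency: float, max_cost: float) -> float:
    # Reward quality; penalize latency and cost after scaling them to [0, 1].
    wq, wl, wc = w
    return wq * p.quality - wl * (p.latency_s / max_latency) - wc * (p.cost_usd / max_cost)


def select_plan(plans: List[PlanStats], w: Weights) -> PlanStats:
    # Normalize against the worst observed values; fall back to 1.0 to avoid dividing by zero.
    max_latency = max(p.latency_s for p in plans) or 1.0
    max_cost = max(p.cost_usd for p in plans) or 1.0
    return max(plans, key=lambda p: utility(p, w, max_latency, max_cost))


# Made-up measurements for three candidate plans.
plans = [
    PlanStats("llm_only", quality=0.90, latency_s=100.0, cost_usd=10.0),
    PlanStats("routed", quality=0.85, latency_s=10.0, cost_usd=1.0),
    PlanStats("symbolic_only", quality=0.50, latency_s=1.0, cost_usd=0.01),
]

quality_first = select_plan(plans, (1.0, 0.0, 0.0))
balanced = select_plan(plans, (1 / 3, 1 / 3, 1 / 3))
cost_first = select_plan(plans, (0.1, 0.1, 0.8))
```

Changing only the weight vector changes which plan wins, which is the behavior the bullet on user-specified preferences describes.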

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same compilation approach could be applied to non-relational dataflows that already contain model calls, such as data pipelines in scientific computing.
  • If task outputs are exposed as relations, standard provenance or explanation techniques developed for relational queries become directly applicable to semantic results.
  • The router-plus-Bayesian-tuning layer could be reused as a standalone component for any system that must choose among multiple inference backends under multi-objective constraints.
  • Extending TxRA with additional task operators for new modalities would require only new alternative-generation templates rather than a full redesign of the optimizer.

Load-bearing premise

The paper assumes that safety conditions for structural rewrites can be checked from operator dependencies alone, and that Bayesian optimization can jointly tune routing, backends, and thresholds effectively under user trade-offs.

What would settle it

A workload where a dependency-checked structural rewrite produces incorrect final answers on the same inputs that the original plan answered correctly, or where Bayesian optimization returns a plan whose measured quality-latency-cost point lies outside the Pareto surface found by exhaustive search.
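
The second test can be phrased as a dominance check: compute the Pareto front from an exhaustive search and ask whether the optimizer's chosen point is dominated by any point on it. The Python sketch below assumes points are (quality, latency, cost) triples with quality maximized and the other two minimized, and reads "outside the Pareto surface" as "dominated by the exhaustive front"; the sample numbers are invented.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (quality, latency, cost)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    # Keep every point that no other point dominates (quadratic, fine for small sets).
    return [p for p in points if not any(dominates(q, p) for q in points)]


def is_dominated_by_front(candidate: Point, front: List[Point]) -> bool:
    return any(dominates(q, candidate) for q in front)


# Invented measurements standing in for an exhaustive sweep over plans.
exhaustive = [
    (0.90, 100.0, 10.0),
    (0.85, 10.0, 1.0),
    (0.50, 1.0, 0.01),
    (0.80, 20.0, 2.0),
]
front = pareto_front(exhaustive)

optimizer_choice_good: Point = (0.85, 10.0, 1.0)
optimizer_choice_bad: Point = (0.80, 20.0, 2.0)
```

A choice that is itself on the front is never reported as dominated, because dominance requires a strict improvement on at least one objective.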

Figures

Figures reproduced from arXiv: 2606.29151 by Jaehyun Ha, Wook-Shin Han, Yongjoo Park.

Figure 1: A semantic operator instance embedded in SQL.
Figure 2: Overview of the CADENZA architecture illustrating an end-to-end workflow example with representative baseline plans. For clarity, implementation-level optimizations that apply uniformly across plans (e.g., LLM batching) are not shown.
Figure 3: The workflow of CADENZA's logical planner: seed synthesis (gray) and plan exploration (green).
Figure 4: End-to-end per-query Q/L/C across systems for each scenario.
Figure 5: Sensitivity to Bayesian optimization trials.
Figure 6: End-to-end scaling with data size.
Figure 7: Mean utility per (scenario, system, weight). Queries a system fails to run contribute zero to its mean (zero-fill). Higher…
Figure 8: BO trials and hypervolume over the Pareto set.
Figure 9: Top-3 logical plans for a representative E-Commerce query.
Figure 10: Quality–Cost trade-off on Movie. Up-and-left is better.
Figure 11: Quality–Cost trade-off on Wildlife. Up-and-left is better.
Figure 12: Quality–Cost trade-off on MMQA. Up-and-left is better.
Figure 13: Quality–Cost trade-off on Cars. Up-and-left is better.
Figure 14: Quality–Cost trade-off on E-Commerce. Up-and-left is better.

Original abstract

Semantic query processing engines (SQPEs) extend relational query processing with semantic operators that are executed via model inference over unstructured data. Optimizing such queries is inherently multi-objective: model inference dominates latency and monetary cost, and outputs are stochastic and backend-dependent, so quality must be optimized alongside efficiency. Existing SQPE optimizers do not expose each semantic operator instance's intermediate task outputs as a relational optimization object, leaving optimization unable to filter, reorder, route, threshold, or jointly tune them. We present CADENZA, which compiles each semantic operator instance--a template bound to a natural-language intent--into an intent-specific plan space of typed task DAGs and selects an executable plan under user-specified quality-latency-cost trade-offs. CADENZA introduces task-extended relational algebra (TxRA), a conservative extension of relational algebra with task-specific operators. The logical planner synthesizes seed TxRA plans, applies structural rewrites whose safety conditions are checked from operator dependencies, and enumerates semantics-guided alternatives from alternative-generation templates. The physical planner compiles each task-specific operator into a router over heterogeneous backends and jointly tunes routing cutpoints, backend parameters, and relational thresholds with Bayesian optimization. On SemBench, CADENZA improves the scenario-level averages of quality, latency, and cost by up to +0.49, 165.7x, and 310.3x, respectively, relative to state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CADENZA, a system for semantic query processing that compiles each natural-language intent bound to a semantic operator into an intent-specific plan space of typed task DAGs expressed in task-extended relational algebra (TxRA). The logical planner synthesizes seed TxRA plans, applies structural rewrites whose safety is checked from operator dependencies, and enumerates alternatives; the physical planner uses routers over heterogeneous backends and Bayesian optimization to jointly tune routing, backend parameters, and thresholds under quality-latency-cost trade-offs. On SemBench the system reports scenario-level average improvements of up to +0.49 in quality, 165.7x in latency, and 310.3x in cost relative to prior SQPE optimizers.

Significance. If the empirical results and the two core mechanisms (dependency-based rewrite safety and effective multi-objective BO tuning) hold under scrutiny, the work would constitute a meaningful advance in SQPE optimization by exposing intermediate semantic task outputs as first-class relational objects and providing a concrete compilation and tuning pipeline. The conservative extension TxRA and the separation of logical enumeration from physical Bayesian tuning are technically clean contributions that could be adopted more broadly.

major comments (2)
  1. [Logical planner] Logical planner (description of structural rewrites): the safety conditions for rewrites are stated to be checkable from operator dependencies alone, yet semantic operators produce stochastic, backend-dependent outputs; dependency graphs alone may miss cases where reordering or thresholding changes quality distributions or cost in non-obvious ways. This assumption is load-bearing for the claim that the enumerated TxRA plans remain correct.
  2. [Physical planner] Physical planner (Bayesian optimization paragraph): the joint tuning of routers, backend parameters, and relational thresholds via BO is presented as effective under user trade-offs, but the manuscript supplies no convergence diagnostics, search-space characterization, or ablation isolating the contribution of the multi-objective optimizer. The headline gains (+0.49 / 165.7x / 310.3x) rest directly on this component working as claimed.
minor comments (2)
  1. [Abstract / Evaluation] The abstract and experimental claims would benefit from explicit statement of the number of scenarios, variance across runs, and the precise state-of-the-art baselines used for the reported averages.
  2. [TxRA definition] Notation for TxRA operators and the alternative-generation templates could be introduced with a small example early in the paper to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below, with planned revisions where the manuscript can be strengthened.

  1. Referee: [Logical planner] Logical planner (description of structural rewrites): the safety conditions for rewrites are stated to be checkable from operator dependencies alone, yet semantic operators produce stochastic, backend-dependent outputs; dependency graphs alone may miss cases where reordering or thresholding changes quality distributions or cost in non-obvious ways. This assumption is load-bearing for the claim that the enumerated TxRA plans remain correct.

    Authors: TxRA safety conditions verify structural equivalence of task outputs based on declared types and dependencies, ensuring rewrites preserve the computation graph independently of specific realizations. Stochasticity and backend effects are addressed downstream via threshold tuning and routing in the physical planner. We will revise the logical planner section to add an explicit paragraph separating logical structural safety from physical quality optimization. revision: yes

  2. Referee: [Physical planner] Physical planner (Bayesian optimization paragraph): the joint tuning of routers, backend parameters, and relational thresholds via BO is presented as effective under user trade-offs, but the manuscript supplies no convergence diagnostics, search-space characterization, or ablation isolating the contribution of the multi-objective optimizer. The headline gains (+0.49 / 165.7x / 310.3x) rest directly on this component working as claimed.

    Authors: We agree that additional diagnostics would strengthen the claims. In revision we will add convergence diagnostics for the BO runs, a description of the search space, and an ablation isolating the multi-objective optimizer's contribution to the reported gains. revision: yes
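
    One convergence diagnostic commonly used for multi-objective optimizers is the hypervolume dominated by the trials seen so far, tracked after each trial. The Python sketch below computes it for two objectives only (quality maximized, cost minimized) against a reference point; it is a generic illustration with invented trial values, not the diagnostic the authors commit to, and the restriction to two objectives is a simplifying assumption.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (quality, cost): quality is maximized, cost is minimized


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by the points, bounded by the reference point (q_ref, c_ref)."""
    q_ref, c_ref = ref
    # Only points strictly better than the reference on both objectives contribute.
    better = [(q, c) for q, c in points if q > q_ref and c < c_ref]
    # Sweep from highest quality down; ties are ordered by lower cost first.
    better.sort(key=lambda p: (-p[0], p[1]))
    area = 0.0
    lowest_cost = c_ref
    for q, c in better:
        if c < lowest_cost:
            # New strip: costs in [c, lowest_cost) are now covered up to quality q.
            area += (q - q_ref) * (lowest_cost - c)
            lowest_cost = c
    return area


def hypervolume_trace(trials: List[Point], ref: Point) -> List[float]:
    # Hypervolume after each trial; a flat tail suggests the search has stalled or converged.
    return [hypervolume_2d(trials[: i + 1], ref) for i in range(len(trials))]


# Invented trial outcomes in the order an optimizer might have produced them.
trials = [(0.2, 2.0), (0.8, 8.0), (0.4, 9.0), (0.5, 4.0)]
trace = hypervolume_trace(trials, ref=(0.0, 10.0))
```

    The trace never decreases, and a dominated trial such as the third one here leaves it unchanged.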

Circularity Check

0 steps flagged

No circularity: system architecture paper with empirical results only

The manuscript describes a new query compilation system (CADENZA) that introduces TxRA, synthesizes plans via logical and physical planners, and reports measured improvements (+0.49 quality, 165.7x latency, 310.3x cost) on SemBench. No equations, first-principles derivations, or predictions appear that reduce by construction to fitted parameters, self-citations, or renamed inputs. Bayesian optimization is used as a tuning mechanism whose outputs are evaluated empirically rather than asserted as forced identities. The design is presented as a self-contained engineering contribution without load-bearing self-citation chains or self-definitional steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; all such details are absent from the provided text.

pith-pipeline@v0.9.1-grok · 5820 in / 987 out tokens · 70848 ms · 2026-06-30T02:46:50.240030+00:00 · methodology

