Pith · machine review for the scientific record

arxiv: 2604.28079 · v1 · submitted 2026-04-30 · 💻 cs.DB


Tailwind: A Practical Framework for Query Accelerators

Geoffrey X. Yu , Ryan Marcus , Tim Kraska


Pith reviewed 2026-05-07 05:37 UTC · model grok-4.3

classification 💻 cs.DB
keywords: query accelerators · database optimization · abstract logical plans · neural network query planning · query rewriting · TPC-H benchmark · Redshift · DuckDB

The pith

Tailwind lets any relational database use custom accelerators without modifying its engine or optimizer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to bring workload-specific speedups to standard RDBMSes by placing an external planner on top that rewrites queries at runtime. Users express accelerators as abstract logical plans: mostly declarative descriptions of relational operators built on regular tree patterns. Tailwind automatically trains a neural network for each accelerator to predict whether it will help a given query, and applies the accelerator only when the model says the gain outweighs the cost; otherwise the system falls back to the original database engine. On TPC-H queries run against both Redshift and DuckDB, a library of four accelerators yields a 1.38× average speedup, with some queries improving by as much as 29×.

Core claim

Tailwind is an external query planner that sits atop any RDBMS supporting data import and export. Users define accelerators using abstract logical plans (ALPs), a new mostly-declarative abstraction over relational operators built on regular tree expressions. These ALPs let Tailwind automatically construct customized neural network models that estimate when a particular accelerator is beneficial. At runtime the planner transparently rewrites queries to execute across one or more accelerators when the models predict a net gain, otherwise falling back to the underlying RDBMS.
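The runtime decision described here can be sketched in a few lines. All names below (`Accelerator`, `choose_plan`, the `overhead` field) are illustrative stand-ins, not Tailwind's actual interfaces; the point is the shape of the logic: match the ALP, predict the benefit, compare against cost, otherwise fall back.

```python
# Hypothetical sketch of the external planner's decision loop. Names and
# interfaces are invented for illustration; the paper's real system differs.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Accelerator:
    name: str
    match: Callable[[str], Optional[dict]]   # ALP pattern match -> instance or None
    predict_gain: Callable[[dict], float]    # trained model's benefit estimate (s)
    overhead: float                          # planning + data-movement cost (s)

def choose_plan(query: str, accelerators: list[Accelerator]) -> str:
    """Pick the accelerator with the largest predicted net gain, or fall back."""
    best_name, best_net = "rdbms-fallback", 0.0
    for acc in accelerators:
        instance = acc.match(query)
        if instance is None:
            continue
        net = acc.predict_gain(instance) - acc.overhead
        if net > best_net:                   # only accept a predicted net win
            best_name, best_net = acc.name, net
    return best_name

# Toy accelerator that only matches GROUP BY queries.
group_by_acc = Accelerator(
    name="known-group-by",
    match=lambda q: {"query": q} if "GROUP BY" in q else None,
    predict_gain=lambda inst: 2.0,
    overhead=0.5,
)
```

A real planner would also have to price data import/export into `overhead`, since Tailwind moves intermediate results across the RDBMS boundary.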

What carries the argument

Abstract logical plans (ALPs): mostly-declarative tree expressions over relational operators that enable automated training of per-accelerator neural network models for benefit prediction and transparent query rewriting.
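To make "tree expressions over relational operators" concrete, here is a minimal matcher over plan trees. The two-form pattern syntax below is invented for illustration and is far weaker than the paper's regular tree expressions; it only shows the flavor of declarative pattern matching against a logical plan.

```python
# Minimal tree-pattern matching over logical plans (illustrative only; not
# the paper's ALP language). A plan node is (operator, [child plans]).
# A pattern node is (operator, [child patterns]); the operator "*" matches
# any operator, and child patterns of "..." match any subtree list.

def matches(pattern, plan):
    p_op, p_children = pattern
    op, children = plan
    if p_op != "*" and p_op != op:
        return False            # operator mismatch
    if p_children == "...":
        return True             # wildcard: accept any subtree
    if len(p_children) != len(children):
        return False            # arity mismatch
    return all(matches(pc, c) for pc, c in zip(p_children, children))

# A plan for "GROUP BY over a filter over a scan":
plan = ("GroupBy", [("Filter", [("Scan", [])])])
# Pattern: a GroupBy whose single input may be any subtree.
pattern = ("GroupBy", [("*", "...")])
```

An accelerator's ALP would play the role of `pattern`; any query plan it matches is a candidate for rewriting.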

If this is right

  • Accelerators can be added, tested, and removed without any changes to the database engine or its optimizer.
  • The same accelerator library works across different RDBMSes such as Redshift and DuckDB as long as they support data import and export.
  • Execution can mix accelerated and non-accelerated paths within a single query when multiple accelerators are available.
  • Engineering cost for specialized optimizations drops because the work of integration and correctness is handled outside the core system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Database administrators could maintain private accelerator libraries tuned to their own data patterns without waiting for vendor support.
  • If ALPs prove expressive enough, the approach could later support accelerators for more complex operations such as multi-way joins or window functions.
  • Cloud environments where data movement between storage and compute is already cheap would be natural places to deploy this style of transparent acceleration.

Load-bearing premise

Users can write effective accelerators as abstract logical plans and the resulting neural network models can predict when an accelerator helps without adding unacceptable planning cost or prediction errors.
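The premise can be framed as a simple expected-value inequality (our framing, not the paper's): acceleration pays off only when gains on correct firings exceed losses on mispredictions plus the fixed planning cost.

```python
# Back-of-envelope model of the load-bearing premise (assumptions ours).
# p_fire: fraction of queries where the model applies an accelerator;
# p_correct: fraction of those firings where the accelerator actually helps.

def expected_net_gain(p_fire, p_correct, gain_when_right, loss_when_wrong, overhead):
    """Expected per-query time saved (seconds); negative means net slowdown."""
    wins = p_fire * p_correct * gain_when_right
    losses = p_fire * (1 - p_correct) * loss_when_wrong
    return wins - losses - overhead
```

Plugging in a high-precision predictor with small overhead gives a net win; a predictor that is wrong half the time with large misprediction penalties flips the sign, which is exactly the failure mode the stress test below describes.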

What would settle it

A workload of queries in which the neural models mispredict accelerator value often enough, or planning overhead exceeds gains often enough, that overall query times become slower than the unmodified RDBMS baseline.

Figures

Figures reproduced from arXiv: 2604.28079 by Geoffrey X. Yu, Ryan Marcus, Tim Kraska.

  • Figure 1: We can accelerate TPC-H Q13 by running the …
  • Figure 2: An overview of the Tailwind system and its workflow.
  • Figure 3: The domain negation ALP template (simplified).
  • Figure 4: The domain negation accelerator’s featurization.
  • Figure 5: Tailwind accelerates TPC-H (SF = 100) on both Redshift and DuckDB by 1.32× and 1.38× respectively (geomean).
  • Figure 6: Tailwind accelerates our TPC-H+ workload on Redshift and DuckDB by 1.37× and 1.76× respectively (geomean).
  • Figure 7: The ALP neural network’s prediction accuracy compared to simpler modeling baselines (lower is better).
  • Figure 9: Workload speedup for varied space budgets.
  • Figure 10: Robustness to query run time prediction error.
  • Figure 11: Impact of simulated bulk writes (e.g., ETLs) on …
Original abstract

Relational database management systems (RDBMSes) can process general-purpose queries, but often have lower performance compared to custom-built solutions for specific queries. For example, consider a group-by query over a few known groups (e.g., grouping by country). While an RDBMS would likely use a hash map to do the grouping, a faster method could hard-code the expected groups into the query executor. But such workload-specific techniques, which we call query accelerators, are not widely used in practice because the engineering effort (optimizer and engine changes, potential bugs) does not justify the isolated performance gains (speedup on a single specific query). We propose Tailwind: an external query planner that brings accelerators into any RDBMS that supports data import/export. Users define their accelerators using abstract logical plans (ALPs): a new mostly-declarative abstraction over relational operators built on regular tree expressions. ALPs allow Tailwind to automatically build customized neural network models to estimate when using a particular accelerator is beneficial. At runtime, Tailwind sits atop an RDBMS and transparently rewrites queries to run across one or more accelerators when predicted to be beneficial, falling back to the underlying RDBMS when not. On Redshift and DuckDB with a library of four diverse accelerators, Tailwind accelerates TPC-H queries by 1.38x on average (up to 29x).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper presents Tailwind, an external query planner that integrates workload-specific query accelerators into general-purpose RDBMSes (e.g., Redshift, DuckDB) supporting data import/export. Users define accelerators via Abstract Logical Plans (ALPs), a declarative abstraction over relational operators using regular tree expressions. Tailwind automatically generates neural-network models to predict accelerator benefit at runtime, transparently rewrites queries to use one or more accelerators when beneficial, and falls back to the RDBMS otherwise. With a library of four diverse accelerators, it reports 1.38x average (up to 29x) speedup on TPC-H queries.

Significance. If the empirical claims hold under rigorous controls, Tailwind would meaningfully reduce the engineering cost of deploying specialized accelerators by automating both the definition interface and the runtime decision logic. The ALP abstraction and automated NN construction are practical contributions that could influence optimizer design in production systems. The evaluation on real engines and a standard benchmark is a strength, though the absence of detailed methodology, predictor metrics, and overhead measurements in the abstract limits assessment of whether the net speedups are robust.

major comments (3)
  1. [Abstract] The central empirical claim of a 1.38× average TPC-H speedup (and 29× peak) is presented without any description of experimental methodology, baselines, number of runs, variance, or controls for confounds. This information is load-bearing for verifying the reported gains and must be supplied as concrete numbers and tables.
  2. [Evaluation] No numbers are given for the neural-network predictors' precision, recall, training-set size, false-positive rate, or isolated inference latency. Without these, it is impossible to confirm that mispredictions or predictor overhead do not erode or reverse the claimed speedups, which is precisely the failure mode identified under "What would settle it".
  3. [§4] ALP and model construction: the claim that ALPs allow automatic, effective NN model building for arbitrary accelerators lacks concrete examples of ALP definitions, the resulting model architectures, or validation that the models generalize beyond the training workload. This is central to the "practical framework" contribution.
minor comments (3)
  1. [§3] Clarify the exact interface and overhead of the transparent rewrite layer when multiple accelerators are available; a small diagram or pseudocode would help.
  2. [Abstract] The abstract states 'four diverse accelerators' but does not list them or their target query patterns; add this enumeration early in the paper.
  3. [Evaluation] Ensure all figures in the evaluation section include error bars or explicit variance reporting to support the average speedup numbers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies opportunities to strengthen the verifiability and concreteness of our presentation. We respond to each major comment below and will revise the manuscript accordingly to address the concerns while preserving the core contributions.

Point-by-point responses
  1. Referee: [Abstract] The central empirical claim of a 1.38× average TPC-H speedup (and 29× peak) is presented without any description of experimental methodology, baselines, number of runs, variance, or controls for confounds. This information is load-bearing for verifying the reported gains and must be supplied as concrete numbers and tables.

    Authors: We agree that the abstract would be improved by including a concise reference to the experimental methodology to support the reported speedups. The body of the manuscript (Section 5) already details the TPC-H benchmark, Redshift and DuckDB baselines, multiple runs per query to compute averages and variance, and controls such as isolating accelerator execution from data import/export overhead. We will revise the abstract to briefly note the setup (TPC-H on two engines, native RDBMS baselines, averaged results) and will ensure the evaluation section expands any tables with explicit variance and confound controls. revision: yes

  2. Referee: [Evaluation] No numbers are given for the neural-network predictors' precision, recall, training-set size, false-positive rate, or isolated inference latency. Without these, it is impossible to confirm that mispredictions or predictor overhead do not erode or reverse the claimed speedups, which is precisely the failure mode identified under "What would settle it".

    Authors: We acknowledge that explicit predictor metrics would directly address potential concerns about misprediction impact and overhead. While the manuscript describes the NN construction process and reports end-to-end gains, it does not isolate these metrics. In revision we will add a dedicated subsection to the Evaluation section reporting training-set sizes (generated from TPC-H query templates), precision/recall/false-positive rates on held-out queries, and measured inference latency (sub-millisecond and negligible relative to query execution). This will confirm that predictor overhead does not reverse the net speedups. revision: yes

  3. Referee: [§4] ALP and model construction: the claim that ALPs allow automatic, effective NN model building for arbitrary accelerators lacks concrete examples of ALP definitions, the resulting model architectures, or validation that the models generalize beyond the training workload. This is central to the "practical framework" contribution.

    Authors: We agree that concrete examples would better substantiate the practicality of ALPs for automated NN construction. The current §4 presents the general ALP syntax and model-building pipeline but omits specific instances. We will revise §4 to include the exact ALP definitions (as regular tree expressions) for each of the four accelerators, the derived NN architectures (feature vectors from ALP nodes and layer configurations), and additional generalization results (e.g., accuracy on queries withheld from training and across scale factors). These additions will illustrate how ALPs enable effective, workload-agnostic model building. revision: yes
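The held-out metrics promised in response 2 are standard confusion-matrix quantities. For reference, a minimal computation (nothing here is Tailwind-specific):

```python
# Standard predictor metrics over a held-out query set. Each entry is a
# boolean: True means "the accelerator helps this query" (predicted / actual).

def predictor_metrics(predicted, actual):
    tp = sum(p and a for p, a in zip(predicted, actual))       # correct firings
    fp = sum(p and not a for p, a in zip(predicted, actual))   # harmful firings
    fn = sum(a and not p for p, a in zip(predicted, actual))   # missed wins
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

For a system like this, precision and the false-positive rate matter most: a false positive pays the accelerator's overhead on a query it slows down, while a false negative merely forfeits a speedup.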

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

Full rationale

The paper is a system description of Tailwind, an external query planner using ALPs for accelerator definition and automatically trained neural networks for runtime benefit prediction. The primary result (1.38x average TPC-H speedup on Redshift/DuckDB) is an empirical measurement from benchmark runs, not a first-principles derivation or equation that reduces to its inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the load-bearing claims. The NN training and prediction mechanism is a standard supervised learning setup whose accuracy is evaluated externally via the reported speedups rather than being tautological. This is a normal non-circular empirical systems paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the new ALP abstraction being expressive enough for real accelerators and on the trained neural models being accurate predictors. No machine-checked proofs or parameter-free derivations are described.

free parameters (1)
  • Neural network weights and hyperparameters
    Models are trained on data to estimate accelerator benefit; their parameters are fitted rather than derived.
axioms (1)
  • Domain assumption: ALPs can capture performance-relevant structure of accelerators
    The framework assumes users can express useful accelerators in the ALP language so that the planner can reason about them.
invented entities (2)
  • Abstract Logical Plan (ALP) · no independent evidence
    purpose: Mostly-declarative representation of accelerators using regular tree expressions over relational operators.
    New abstraction introduced by the paper; no independent evidence outside the system itself.
  • Neural network models for accelerator benefit estimation · no independent evidence
    purpose: Predict whether an accelerator will outperform the baseline RDBMS for a given query.
    Trained models whose accuracy is central to the runtime decision; no external validation mentioned.

pith-pipeline@v0.9.0 · 5542 in / 1402 out tokens · 57969 ms · 2026-05-07T05:37:51.578292+00:00 · methodology

