Recognition: unknown
An Extensible and Verifiable Language for Query Rewrite Rules
Pith reviewed 2026-05-08 03:43 UTC · model grok-4.3
The pith
Rulescript is an engine-agnostic DSL for writing formally verified query rewrite rules that deploy to multiple database engines via lightweight adapters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rulescript enables developers to define query rewrite rules using a core set of relational algebra operators for pattern matching and plan transformation, with all rules formally verified for equivalence. Custom operators can be added for engine-specific features, and integration with any target system requires only implementing a mapping adapter. The language was used to reimplement 33 rules from Apache Calcite and successfully deployed to two different engines.
What carries the argument
Rulescript, a domain-specific language with a relational algebra core that decomposes rewrite rules into explicit matching and transformation phases, supported by automatic formal verification and extensible custom operators.
If this is right
- Rewrite rules written in Rulescript can be deployed to new database engines without rewriting the logic.
- Automatic formal verification ensures that each rule produces semantically equivalent query plans.
- Custom operators allow capturing semantics unique to specific engines while staying within the verified framework.
- Porting requires only a lightweight adapter, as demonstrated by deployments to CockroachDB and Apache Data Fusion.
- Existing rules from systems like Apache Calcite can be reimplemented once and reused.
Where Pith is reading between the lines
- Database engine developers could maintain a shared repository of verified rules to accelerate new system creation.
- Similar DSL approaches might apply to other optimization tasks beyond query planning, such as in compilers or dataflow systems.
- Reducing reimplementation could lower the barrier for experimenting with new database architectures.
Load-bearing premise
The lightweight adapter correctly translates Rulescript operators to the target engine without semantic errors or breaking the verification guarantees.
What would settle it
A case where a Rulescript-verified rule, after adapter mapping, produces a query result or performance that differs from the original rule in the target engine.
Figures
read the original abstract
Logical query plan rewriting transforms a relational database query into an equivalent but more efficient form and is crucial to the performance of database-backed applications. In existing systems, rewrite rules are typically implemented manually, tightly coupled to specific execution engines, and often lack formal correctness guarantees. Consequently, developing a new engine requires reimplementing both legacy and new rules, incurring significant engineering cost, limiting portability, and every new implementation is an opportunity for introducing new bugs. We introduce Rulescript, an engine-agnostic domain-specific language (DSL) for developing query rewrite rules. Rulescript separates rule definition from execution infrastructure via a relational algebra-inspired core language and an explicit decomposition of rules into matching and transformation phases. Developers express rewrites by pattern-matching query plans using Rulescript's core operators and constructing semantically equivalent transformed plans, with all rewrites automatically verified formally to ensure correctness. Rulescript is extensible: users can define custom operators in terms of the core language to capture engine-specific semantics. To integrate with an existing system, developers need only implement a lightweight adapter that maps Rulescript's core and custom operators to the operators implemented in the target engine. We evaluate Rulescript by reimplementing 33 rewrite rules from Apache Calcite and extending the language with several custom operators. To demonstrate portability, we automatically deploy these rules to CockroachDB and Apache Data Fusion, two engines with substantially different backends. Our results show that Rulescript enables "write once, deploy everywhere" paradigm for query plan rewriting, with minimal effort required to deploy previously written rules on a new data engine.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Rulescript, an engine-agnostic DSL for query rewrite rules. Rules are expressed via decomposition into pattern-matching and transformation phases over a relational algebra core, with automatic formal verification of correctness. The language is extensible via custom operators defined in terms of the core; integration with a target engine requires only a lightweight adapter that maps core and custom operators. The evaluation reimplements 33 rules from Apache Calcite (with custom operators) and automatically deploys them to CockroachDB and Apache Data Fusion, claiming a 'write once, deploy everywhere' paradigm with minimal porting effort.
Significance. If the central claims hold, the work could meaningfully reduce engineering cost and bug risk when porting or developing rewrite rules for new database engines, while providing formal correctness guarantees that are rare in this domain. The separation of rule logic from engine-specific adapters and the emphasis on verification address a genuine pain point in query optimizer development. However, the absence of concrete evidence on verification methods, adapter semantics preservation, and quantitative outcomes limits the assessed significance in its current form.
major comments (3)
- [Evaluation] Evaluation section: the claim of successful deployment of 33 rules to CockroachDB and Data Fusion is presented without any quantitative correctness results (e.g., differential testing outcomes, equivalence checks post-adaptation, or failure rates), which is load-bearing for the portability and 'minimal effort' assertions.
- [§3] §3 (Language and Verification): the assertion that 'all rewrites [are] automatically verified formally' provides no description of the verification technique, the properties proven (e.g., equivalence of match+transform), or the handling of custom operators, leaving the verifiable claim without supporting detail.
- [Adapter description] Adapter description (near §4): the central 'lightweight adapter' mechanism is stated to preserve semantics and verification guarantees, yet no evidence, proof sketch, or testing is given that adapters avoid semantic mismatches; this directly affects whether Rulescript-level verification extends to deployed rules on target engines.
minor comments (1)
- [Abstract] Abstract: 'several custom operators' is mentioned without enumeration or examples; adding a brief list would improve clarity without altering the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which identify key gaps in evidence and detail that weaken the central claims. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of verification, adapters, and evaluation results.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: the claim of successful deployment of 33 rules to CockroachDB and Data Fusion is presented without any quantitative correctness results (e.g., differential testing outcomes, equivalence checks post-adaptation, or failure rates), which is load-bearing for the portability and 'minimal effort' assertions.
Authors: We agree that the evaluation lacks quantitative correctness metrics. The current text reports only that the 33 rules were reimplemented and deployed, without reporting equivalence checks, differential testing results, or failure rates across the target engines. In the revised version we will add a dedicated subsection (or table) that presents the outcomes of systematic equivalence verification for the deployed rules, including any discrepancies encountered and the effort required for adapter implementation. This will directly support the portability claims. revision: yes
-
Referee: [§3] §3 (Language and Verification): the assertion that 'all rewrites [are] automatically verified formally' provides no description of the verification technique, the properties proven (e.g., equivalence of match+transform), or the handling of custom operators, leaving the verifiable claim without supporting detail.
Authors: The manuscript asserts automatic formal verification but does not describe the underlying technique, the precise properties established (e.g., semantic equivalence of the match-and-transform phases), or how custom operators are incorporated into the verification process. We will expand §3 with a clear account of the verification method, the properties proven, and the reduction of custom operators to the core language for verification purposes. revision: yes
-
Referee: [Adapter description] Adapter description (near §4): the central 'lightweight adapter' mechanism is stated to preserve semantics and verification guarantees, yet no evidence, proof sketch, or testing is given that adapters avoid semantic mismatches; this directly affects whether Rulescript-level verification extends to deployed rules on target engines.
Authors: We acknowledge that the paper describes adapters as lightweight and semantics-preserving without providing a proof sketch, formal argument, or empirical validation that the mappings avoid semantic mismatches. In the revision we will augment the adapter section with a proof sketch showing preservation of core and custom operator semantics, together with a brief report on the concrete adapter implementations and any testing performed for CockroachDB and Data Fusion. revision: yes
Circularity Check
No circularity: Rulescript claims rest on new DSL design and reimplementation
full rationale
The paper introduces Rulescript as a new engine-agnostic DSL with explicit core algebra, matching/transformation decomposition, and formal verification of rewrites at the abstract level. Evaluation reimplements 33 existing Calcite rules and ports them via adapters to CockroachDB and Data Fusion. No derivation reduces to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations; the verification and portability claims are supported by the language definition and empirical porting rather than circular reduction to prior results or inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Relational algebra provides a suitable core language for expressing query plans and rewrite rules across engines
invented entities (1)
-
Rulescript DSL with matching/transformation decomposition
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Astrahan, Mike W
Morton M. Astrahan, Mike W. Blasgen, Donald D. Chamberlin, Kapali P. Eswaran, Jim N Gray, Patricia P. Griffiths, W Frank King, Raymond A. Lorie, Paul R. McJones, James W. Mehl, et al. 1976. System R: Relational approach to database management.ACM Transactions on Database Systems (TODS)1, 2 (1976), 97–137
1976
-
[2]
Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J Mior, and Daniel Lemire. 2018. Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources. InProceedings of the 2018 International Conference on Management of Data. 221–230
2018
-
[3]
M.-S. Chen and P.S. Yu. 1993. Combining joint and semi-join operations for dis- tributed query processing.IEEE Transactions on Knowledge and Data Engineering 5, 3 (1993), 534–542. https://doi.org/10.1109/69.224205
-
[4]
Shumo Chu, Chenglong Wang, Konstantin Weitz, and Alvin Cheung. 2017. Cosette: An Automated Prover for SQL.. InCIDR. 1–7
2017
-
[5]
Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. 2017. HoTTSQL: Proving query rewrites with univalent SQL semantics.ACM SIGPLAN Notices 52, 6 (2017), 510–524
2017
-
[6]
Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, et al. 2016. The snowflake elastic data warehouse. InProceed- ings of the 2016 International Conference on Management of Data. 215–226
2016
-
[7]
2022.Transpose Semi-join Aggregation Rule
Apache Doris. 2022.Transpose Semi-join Aggregation Rule. https://github.com/ apache/doris/blob/8cb5a4210f794abca49b9aa0355120877aba2f32/fe/fe-core/src/ main/java/org/apache/doris/nereids/rules/rewrite/TransposeSemiJoinAgg.java
2022
-
[8]
Goetz Graefe. 1995. The cascades framework for query optimization.IEEE Data Eng. Bull.18, 3 (1995), 19–29
1995
-
[9]
Goetz Graefe and William J McKenna. 1993. The volcano optimizer genera- tor: Extensibility and efficient search. InProceedings of IEEE 9th international conference on data engineering. IEEE, 209–218
1993
-
[10]
Yannis E Ioannidis and Raghu Ramakrishnan. 1995. Containment of conjunctive queries: Beyond relations as sets.ACM Transactions on Database Systems (TODS) 20, 3 (1995), 288–324
1995
-
[11]
2024.MySQL Bug #114435: Incorrect query results caused by the subquery optimization
MySQL. 2024.MySQL Bug #114435: Incorrect query results caused by the subquery optimization. https://bugs.mysql.com/bug.php?id=114435
2024
-
[12]
Hamid Pirahesh, Joseph M Hellerstein, and Waqar Hasan. 1992. Extensible/rule based query rewrite optimization in Starburst.ACM Sigmod Record21, 2 (1992), 39–48
1992
-
[13]
2015.PostgreSQL Bug #13592: Optimizer throws out join constraint causing incorrect result
PostgreSQL. 2015.PostgreSQL Bug #13592: Optimizer throws out join constraint causing incorrect result. https://www.postgresql.org/message-id/20150826195031. 2091.40681%40wrigleys.postgresql.org
-
[14]
Manuel Rigger and Zhendong Su. 2020. Detecting optimization bugs in data- base engines via non-optimizing reference engine construction(ESEC/FSE 2020). Association for Computing Machinery, New York, NY, USA, 1140–1152. https://doi.org/10.1145/3368089.3409710
-
[15]
P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. 1979. Access path selection in a relational database management system. InProceedings of the 1979 ACM SIGMOD international conference on Management of data. 23–34
1979
-
[16]
2022.Push Down Left Semi/Anti-join Rule
Apache Spark. 2022.Push Down Left Semi/Anti-join Rule. https: //github.com/apache/spark/blob/168e5cf57147eed3e789b6d609a60396750094e4/ sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ PushDownLeftSemiAntiJoin.scala
2022
-
[17]
2025.Optimizer incorrectly push down aggregations to subqueries with distinct and union all.https://sqlite.org/forum/forumpost/a860f5fb2e
SQLite. 2025.Optimizer incorrectly push down aggregations to subqueries with distinct and union all.https://sqlite.org/forum/forumpost/a860f5fb2e
2025
-
[18]
Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, et al. 2020. Cockroachdb: The resilient geo-distributed sql database. InProceedings of the 2020 ACM SIGMOD international conference on management of data. 1493–1509
2020
-
[19]
The Apache Calcite Team. 2026. Apache Calcite: RelRule. https: //github.com/apache/calcite/blob/80e717ab2e5f9f6fb23995ab995264f2ec940882/ core/src/main/java/org/apache/calcite/plan/RelRule.java. [Accessed 02-27-2026]
2026
-
[20]
The Apache Datafusion Team. 2026. Apache DataFusion. https://datafusion. apache.org/. [Accessed 02-27-2026]
2026
-
[21]
The Apache Datafusion Team. 2026. Apache Datafu- sion: LogicalPlan. https://github.com/apache/datafusion/blob/ 6713439497561fa74a94177e5b8632322fb7cea5/datafusion/expr/src/logical_plan/ plan.rs. [Accessed 02-27-2026]
2026
-
[22]
The CockroachDB Team. 2026. CockroachDB: Optgen. https://github.com/ cockroachdb/cockroach/tree/58e75c8e68cafc9a34482eb75f05ef51e6d8ca9a/pkg/ sql/opt/optgen. [Accessed 02-27-2026]
2026
-
[23]
BA Trahtenbrot. 1963. Impossibility of an algorithm for the decision problem in finite classes.Nine Papers on Logic and Quantum Electrodynamics(1963), 1–5
1963
-
[24]
Transaction Processing Performance Council (TPC). 2020. TPC-H Benchmark Specification, Version 2.17.1. https://www.tpc.org/tpc_documents_current_ versions/pdf/tpc-h_v2.17.1.pdf. [Accessed 30-01-2026]
2020
-
[25]
Patrick Valduriez and Georges Gardarin. 1984. Join and Semijoin Algorithms for a Multiprocessor Database Machine.ACM Trans. Database Syst.9, 1 (March 1984), 133–161. https://doi.org/10.1145/348.318590
-
[26]
Shuxian Wang, Sicheng Pan, and Alvin Cheung. 2024. QED: A Powerful Query Equivalence Decider for SQL.Proc. VLDB Endow.17, 11 (2024), 3602–3614
2024
-
[27]
Qi Zhou, Joy Arulraj, Shamkant B Navathe, William Harris, and Jinpeng Wu
-
[28]
In2022 IEEE 38th International Conference on Data Engineering (ICDE)
SPES: A symbolic approach to proving query equivalence under bag semantics. In2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2735–2748. 13
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.