pith. sign in

arxiv: 2511.15000 · v3 · pith:RXPXUD66new · submitted 2025-11-19 · 💻 cs.PL · cs.DB

Bonsai: Compiling Queries to Pruned Tree Traversals

Pith reviewed 2026-05-17 21:24 UTC · model grok-4.3

classification 💻 cs.PL cs.DB
keywords query compilationtree traversalpruning optimizationsymbolic interval analysisgeometric predicatesdatabase queriesindex joinscompiler optimization
0
0 comments X

The pith

A compiler derives pruning conditions for tree queries via symbolic interval analysis to generate traversals matching expert code and outperforming linear scans for complex predicates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to automatically turn high-level queries over tree-structured data into efficient single-pass traversals. It does this by extending symbolic interval analysis with rules that handle geometric predicates such as intersection and containment, allowing the system to figure out from node metadata when entire subtrees can be skipped or accepted wholesale. The compiler also fuses compound operations like filters followed by reductions and supports generalized joins beyond simple equality or range checks. A sympathetic reader would care because this removes the need for programmers to hand-write query-specific pruning logic, which existing systems require when standard optimizations do not apply. If correct, the approach makes tree indexes practical for a much wider range of queries without sacrificing performance.

Core claim

The central claim is that symbolic interval analysis, extended with new rules for geometric predicates, can derive correct pruning and inclusion conditions expressed in terms of node metadata. These conditions are then used to compile queries into fused tree traversals. The resulting code matches the behavior of expert-written query-specific implementations and asymptotically outperforms the linear scans and nested-loop joins that systems fall back to when no hand-written case applies.

What carries the argument

Symbolic interval analysis extended with rules for geometric predicates such as intersection and containment, which computes the conditions for pruning or including subtrees from the query and the metadata stored at each tree node.

Load-bearing premise

The extended symbolic interval analysis must derive correct and complete pruning conditions for the supported queries without missing valid cases or introducing incorrect simplifications.

What would settle it

A concrete geometric query predicate for which the generated traversal either visits a subtree that a correct expert implementation would prune or skips a subtree that must be visited, producing either wrong results or worse performance than the hand-written baseline.

Figures

Figures reproduced from arXiv: 2511.15000 by Alexander J Root, Andrew Adams, Christophe Gyurgyik, Fredrik Kjolstad, Jonathan Ragan-Kelley, Kayvon Fatahalian, Purvi Goel.

Figure 1
Figure 1. Figure 1: Set operations supported by Bonsai. 𝑇 and 𝑆 are primitive types (e.g., integers, floats, or product types). filter, any, and all accept search predicates; min, max, argmin, and argmax accept a metric to optimize over; and reduce accepts a commutative and associative operator. Predicates. A core component of a Bonsai query is a predicate, which maps a set element to a boolean value. For scalars, Bonsai supp… view at source ↗
Figure 2
Figure 2. Figure 2: Geometric operations in Bonsai. distmin and distmax return reals, the rest return booleans. contains(A, B) disjoint(A, B) intersects(A, B) touches(A, B) A ≤x B A <y B A B A B A B A B A A B B [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Examples of lowered filters on the ExampleTree. before yielding or iterating; interior nodes scan when 𝑃 is always true, recurse when 𝑃 may be true, and prune otherwise. Figure 4a shows the lowered filter for 𝑃 on ExampleTree. This rewrite naturally fuses chained filters: each adds local conditions to yield, iter, and scan. Figure 4b shows the fused filter for predicates 𝑃 and𝑄, which scans only when both … view at source ↗
Figure 5
Figure 5. Figure 5: Dual-tree traversal specialization. The generic dual traversal [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Fused, final lowered C++ of a coiterating two trees, and counting collisions over leaf and interior [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of Bonsai versus SotA closest point queries and ray tracing in FCPW [68], and collision detection versus FCL [54]. Lower is better for runtimes (a) and (b), and higher is better for speedups (c) [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Runtime comparisons with state of the art (SotA) database management systems. Lower is better. [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Speed-up plots over default linear scans for uniformly sampled data in [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Ablation of count-filter fusion. 9.4.2 Fusion Ablation. Fusion is particularly ben￾eficial when filters are not very selective, as fu￾sion of a reduction on a filter both removes the need for an expensive intermediate data structure, and leverages reduction metadata available in the tree. To highlight these benefits, [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Torus join results, with join predicate √︁ (𝑥0 − 𝑥1) 2 + (𝑦0 − 𝑦1) 2 ∈ [10, 20]. Both tables are uniformly randomly sampled within circles of varying radii, to illustrate the impact of data distribution on join choice [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗
read the original abstract

Trees can accelerate queries that search or aggregate values over large collections. They achieve this by storing metadata that enables quick pruning (or inclusion) of subtrees when predicates on that metadata can prove that none (or all) of the data in a subtree affect the query result. Existing systems implement this pruning logic manually for each query predicate and data structure. We generalize and mechanize this class of optimization. Our method derives conditions for when subtrees can be pruned (or included wholesale), expressed in terms of the metadata available at each node. We efficiently generate these conditions using symbolic interval analysis, extended with new rules to handle geometric predicates (e.g., intersection, containment). Additionally, our compiler fuses compound queries (e.g., reductions on filters) into a single tree traversal. These techniques enable the automatic derivation of generalized single-index and dual-index tree joins that support a wide class of join predicates beyond standard equality and range predicates. The generated traversals match the behavior of expert-written code that implements query-specific traversals, and can asymptotically outperform the linear scans and nested-loop joins that existing systems fall back to when hand-written cases do not apply.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Bonsai, a compiler that translates queries over tree-structured data into optimized single-pass traversals. It derives subtree pruning (and inclusion) conditions from node metadata via symbolic interval analysis extended with new rules for geometric predicates such as intersection and containment. The system also fuses compound queries (e.g., filters followed by reductions) into a single traversal and generalizes the approach to single-index and dual-index tree joins for predicates beyond equality and range joins. The generated traversals are claimed to match the behavior of expert-written query-specific code while asymptotically outperforming linear scans and nested-loop joins used by existing systems when hand-written cases do not apply.

Significance. If the soundness and completeness claims for the extended interval analysis hold, the work would automate a practically important class of index-specific optimizations that are currently implemented by hand in spatial and tree-based database systems. The ability to derive pruning conditions mechanically and to fuse queries could reduce engineering effort and enable efficient execution for a broader set of predicates than current hand-tuned implementations support.

major comments (2)
  1. [§4.3] §4.3 (Geometric Predicate Rules): The central claim that generated traversals match expert implementations and achieve asymptotic improvements rests on the soundness and completeness of the new rules for intersection and containment predicates. The manuscript provides the rules but does not include a formal soundness argument or an enumeration of edge cases (e.g., degenerate intervals, open vs. closed bounds). Without such an argument, it is impossible to verify that the derived pruning conditions never incorrectly discard a subtree that must be visited.
  2. [§5.2] §5.2 (Experimental Methodology): The abstract asserts that the generated code outperforms linear scans and nested-loop joins. However, the evaluation section does not report how the expert-written baselines were constructed, whether they were allowed the same metadata, or the precise query workloads and data distributions used to demonstrate asymptotic gains. These details are load-bearing for the performance claim.
minor comments (2)
  1. [§3] Notation for interval bounds is introduced in §3 but used inconsistently in later figures; a single table of symbols would improve readability.
  2. The paper cites prior work on symbolic analysis but does not discuss how the new geometric rules differ from existing interval-arithmetic extensions in the literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of Bonsai to automate index-specific optimizations. We address each major comment below and will incorporate revisions to strengthen the soundness presentation and experimental details.

read point-by-point responses
  1. Referee: [§4.3] §4.3 (Geometric Predicate Rules): The central claim that generated traversals match expert implementations and achieve asymptotic improvements rests on the soundness and completeness of the new rules for intersection and containment predicates. The manuscript provides the rules but does not include a formal soundness argument or an enumeration of edge cases (e.g., degenerate intervals, open vs. closed bounds). Without such an argument, it is impossible to verify that the derived pruning conditions never incorrectly discard a subtree that must be visited.

    Authors: We agree that a formal soundness argument and explicit treatment of edge cases would improve verifiability. The geometric rules extend standard symbolic interval analysis by conservatively over-approximating intersection and containment using interval bounds on node metadata; this design ensures pruning conditions are sound (never discarding a subtree that must be visited) but may be incomplete in some degenerate cases. In the revised manuscript we will add a proof sketch in §4.3 (or an appendix) that establishes soundness via induction on the interval operations, together with a discussion of edge cases including zero-width intervals, open/closed bounds, and empty subtrees. revision: yes

  2. Referee: [§5.2] §5.2 (Experimental Methodology): The abstract asserts that the generated code outperforms linear scans and nested-loop joins. However, the evaluation section does not report how the expert-written baselines were constructed, whether they were allowed the same metadata, or the precise query workloads and data distributions used to demonstrate asymptotic gains. These details are load-bearing for the performance claim.

    Authors: We acknowledge that additional methodological detail is needed for reproducibility. The expert baselines were hand-written by the authors with identical access to the same node metadata used by the compiler, and the workloads were selected to expose asymptotic differences (synthetic uniform and clustered distributions plus real spatial datasets). We will expand §5.2 with explicit descriptions of baseline construction, the exact query sets, data cardinalities and distributions, and the measurement protocol for asymptotic gains versus linear and nested-loop alternatives. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation from query/metadata via symbolic analysis is self-contained

full rationale

The paper presents a mechanized compiler that derives subtree pruning conditions directly from the input query predicates and per-node metadata using symbolic interval analysis extended by new rules for geometric predicates (intersection, containment). The abstract and description frame this as a forward derivation that produces traversals whose behavior matches expert-written code and can outperform linear scans. No equations, fitted parameters, or self-citations are shown reducing the claimed output (pruning conditions or generated traversals) to the inputs by construction. The extension for geometric predicates is introduced as an addition to the analysis rather than a renaming or self-referential fit. The method is therefore independent of the circularity patterns; any questions of soundness or completeness fall under correctness rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that interval analysis can be soundly extended to geometric predicates and that queries admit a representation suitable for this symbolic treatment; no free parameters or new entities are mentioned.

axioms (1)
  • domain assumption Symbolic interval analysis can be extended with new rules to correctly handle geometric predicates such as intersection and containment for pruning decisions.
    This extension is presented as the key enabler for generalizing beyond standard range and equality predicates.

pith-pipeline@v0.9.0 · 5520 in / 1273 out tokens · 37788 ms · 2026-05-17T21:24:04.700100+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Decoupling Data Layouts from Bounding Volume Hierarchies

    cs.PL 2025-11 unverdicted novelty 7.0

    Scion is a new DSL and compiler that decouples BVH data layouts from traversal algorithms, enabling architecture-agnostic layout optimizations and a novel Pareto-optimal ray tracing layout.