pith. sign in

arxiv: 2606.09361 · v1 · pith:NTCUE5XAnew · submitted 2026-06-08 · 💻 cs.DB

Bespoke-Card: Why Tune When You Can Generate? Synthesizing Workload-Specific Cardinality Estimators

Pith reviewed 2026-06-27 14:21 UTC · model grok-4.3

classification 💻 cs.DB
keywords cardinality estimationquery optimizationagent-based code synthesisworkload-specific estimatorsPostgreSQLJoin Order Benchmarkq-error feedback
0
0 comments X

The pith

An agent system synthesizes workload-specific cardinality estimators as executable code, cutting PostgreSQL runtime on the JOB benchmark by 33%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional cardinality estimators must support arbitrary schemas and workloads, so they rely on generic statistics that often produce large errors and poor query plans. Bespoke-Card instead directs AI agents to generate estimators tailored to one known workload and schema by writing and refining executable estimation code. A planning agent outlines the strategy, a coding agent produces the implementation, and a validator scores outputs against true cardinalities using q-error feedback, regression analysis, outlier subplans, and a curriculum that isolates different error types. When the resulting code is injected into PostgreSQL, total runtime on the Join Order Benchmark drops 33% and median q-error across all subplans falls 41%. The entire synthesis finishes in under an hour at a cost below $10, offering a practical alternative to both generic estimators and learned models.

Core claim

Bespoke-Card synthesizes workload-specific cardinality estimators directly as executable code. A planning agent designs the estimation approach, a coding agent writes the implementation, and a validator scores the code against ground-truth cardinalities and PostgreSQL estimates. Structured feedback from q-error, regression analysis, concrete outlier subplans, and a curriculum covering join-only, filter-only, and full-subplan cases guides selection of the best implementation. When these estimators replace the default inside the optimizer, total PostgreSQL runtime on the JOB benchmark falls by 33% and median q-error over all subplans drops by 41%.

What carries the argument

The multi-agent synthesis harness of planning, coding, and validation agents that iteratively generates, tests, and archives executable cardinality estimation code using q-error and curriculum feedback.

If this is right

  • Workload-specific estimators can be created on demand without manual tuning or large training sets.
  • Query optimizers gain access to estimates that match the actual schema and query patterns of a given database.
  • Synthesis can be repeated cheaply whenever the workload changes, producing fresh estimators each time.
  • The approach provides an alternative path for cardinality estimation that sits beside both classical statistics and learned models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent harness could be tested on synthesizing other optimizer components such as cost models or join-order heuristics.
  • If the validator remains reliable, the method might reduce dependence on fixed learned architectures that require offline training data collection.
  • Applying the synthesis loop to production workloads with slowly drifting query patterns would reveal how often regeneration is needed to maintain gains.

Load-bearing premise

The agents can produce correct, executable estimation code that integrates safely with the optimizer and generalizes beyond the queries used for validation.

What would settle it

Integrating the synthesized code into PostgreSQL and measuring no reduction in total runtime or median q-error on the JOB benchmark would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.09361 by Anton Winter, Carsten Binnig, Johannes Wehrstein, Timo Eckmann.

Figure 1
Figure 1. Figure 1: Bespoke-Card reduces total PostgreSQL runtime by [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Bespoke-Card synthesizes a cardinality estimator given a dataset and tunes it for a given workload. Two agents, a [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Quality of cardinality estimates for multi-join queries compared to the true cardinalities. Bespoke-Card [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Q-error distribution of cardinality estimates com [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Q-error distribution over all subplans of the JOB [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
read the original abstract

Cardinality estimators are built to support arbitrary schemas and workloads, forcing them to rely on generic statistics even when the schema and workload is known in advance, leaving optimizers prone to large errors and poor plans. We present Bespoke-Card, an agent-driven system that synthesizes workload-specific cardinality estimators as executable code: a planning agent designs the estimators strategies, a coding agent implements them, and a validator scores the estimates against true cardinalities and PostgreSQL estimates, forming a robust and deterministic harness. Going beyond naive prompting, Bespoke-Card uses structured q-error feedback, regression analysis, concrete outlier subplans, a curriculum isolating join-only, filter-only, and full-subplan errors, and archival selection of the best implementation. Injecting its estimates into the optimizer cuts total PostgreSQL runtime on JOB by 33% and reduces median q-error over all JOB subplans by 41%, while synthesizing a strong estimator in under one hour for less than $10. Bespoke-Card is opening a new avenue for cardinality estimation next to classical generic estimators and learned estimator architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents Bespoke-Card, an agent-driven system that synthesizes workload-specific cardinality estimators as executable code via a planning agent, coding agent, and validator that scores against ground-truth cardinalities using structured q-error feedback, regression analysis, outlier subplans, and a curriculum over join/filter/full subplans. On the JOB benchmark, the best synthesized estimator reduces total PostgreSQL runtime by 33% and median q-error over all subplans by 41% when injected into the optimizer, at a cost of under one hour and $10.

Significance. If the reported gains reflect genuine generalization, the work would be significant for demonstrating a practical, low-cost alternative to both classical generic estimators and learned models, enabling bespoke estimators tailored to known schemas and workloads.

major comments (1)
  1. [Evaluation on JOB benchmark] The 41% median q-error reduction and 33% runtime improvement are reported over all JOB subplans, but the validator, structured feedback, outlier analysis, curriculum, and archival selection all operate on these same subplans. No held-out test partition of subplans or queries (excluded from all feedback and selection) is described, so the gains may reflect overfitting to the synthesis set rather than a generalizable workload-specific estimator. This directly undermines the central empirical claim.
minor comments (2)
  1. [System overview] The abstract and system description would benefit from explicit pseudocode or a diagram of the full agent loop, including how the validator's scores are converted into prompts for the coding agent.
  2. [Implementation details] Clarify the exact mechanism and safety guarantees for injecting the synthesized estimator into PostgreSQL (e.g., which hooks or extensions are used and how it avoids side effects on other queries).

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting a methodological concern in our evaluation. We address the comment directly below and commit to revisions that strengthen the empirical claims.

read point-by-point responses
  1. Referee: [Evaluation on JOB benchmark] The 41% median q-error reduction and 33% runtime improvement are reported over all JOB subplans, but the validator, structured feedback, outlier analysis, curriculum, and archival selection all operate on these same subplans. No held-out test partition of subplans or queries (excluded from all feedback and selection) is described, so the gains may reflect overfitting to the synthesis set rather than a generalizable workload-specific estimator. This directly undermines the central empirical claim.

    Authors: We agree that the absence of an explicitly held-out partition of subplans or queries (never seen during planning, coding, validation, feedback, or archival selection) leaves open the possibility that the reported q-error reductions partly reflect fitting to the synthesis set rather than a robust workload-specific estimator. The runtime improvement provides some downstream evidence of utility, but it does not fully isolate generalization of the cardinality function itself. We will revise the manuscript to (1) partition the JOB subplans into synthesis and held-out sets, (2) report q-error and runtime results on the held-out portion, and (3) add a discussion of how the curriculum and outlier analysis interact with generalization. These changes will be reflected in the next version. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are empirical measurements on external benchmark

full rationale

The paper describes an agent-based synthesis system for workload-specific cardinality estimators and reports concrete runtime and q-error improvements measured on the external JOB benchmark. No equations, fitted parameters, or self-citation chains are presented that reduce any claimed result to the inputs by construction. The central claims rest on direct experimental outcomes rather than any self-definitional, fitted-input, or uniqueness-imported derivation. The absence of an explicit held-out partition is an evaluation-validity concern, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the unverified effectiveness of the agent synthesis process and the availability of ground-truth cardinalities for validation.

axioms (1)
  • domain assumption AI agents can reliably generate correct and effective cardinality estimation code for given workloads
    The reported performance gains depend on the agents producing usable estimators that pass validation.

pith-pipeline@v0.9.1-grok · 5735 in / 1216 out tokens · 30046 ms · 2026-06-27T14:21:51.916385+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references

  1. [1]

    Gibbons, Viswanath Poosala, and Sridhar Ra- maswamy

    Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ra- maswamy. 1999. Join Synopses for Approximate Query Answering. InSIGMOD. 275–286

  2. [2]

    Rico Bergmann, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. 2025. An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL.SIGMOD3, 1 (2025), 9:1–9:28

  3. [3]

    Garofalakis, Peter J

    Graham Cormode, Minos N. Garofalakis, Peter J. Haas, and Chris Jermaine

  4. [4]

    Foundations and Trends in Databases4, 1-3 (2012), 1–294

    Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Foundations and Trends in Databases4, 1-3 (2012), 1–294

  5. [5]

    Timo Eckmann, Matthias Jasny, Johannes Wehrstein, and Carsten Binnig. 2026. The Future Is Bespoke: Synthesizing One-Size-Fits-One DBMSs with LLM Coding Agents.IEEE Data Engineering Bulletin50, 1 (2026), 88–103

  6. [6]

    Ullman, and Jennifer Widom

    Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2009.Database Systems - The Complete Book (2. ed.)

  7. [7]

    Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kris- tian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries!VLDB13, 7 (2020), 992–1005

  8. [8]

    Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. InCIDR

  9. [9]

    Oleksii Kliukin. 2014. PgTune – Tuning PostgreSQL Config by Your Hardware

  10. [10]

    Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really?VLDB9, 3 (2015), 204–215

  11. [11]

    Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. InCIDR

  12. [12]

    Yao Lu, Srikanth Kandula, Arnd Christian König, and Surajit Chaudhuri. 2021. Pre-training Summarization Models of Structured Datasets for Cardinality Esti- mation.VLDB15, 3 (2021), 414–426

  13. [13]

    Ioannidis

    Viswanath Poosala and Yannis E. Ioannidis. 1997. Selectivity Estimation Without the Attribute Value Independence Assumption. InVLDB. 486–495

  14. [14]

    Ioannidis, Peter J

    Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, and Eugene J. Shekita

  15. [15]

    In SIGMOD

    Improved Histograms for Selectivity Estimation of Range Predicates. In SIGMOD. 294–305

  16. [16]

    Selinger, Morton M

    Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1979. Access Path Selection in a Relational Database Management System. InSIGMOD. 23–34

  17. [17]

    One Size Fits All

    Michael Stonebraker and Ugur Çetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone (Abstract). InICDE. 2–11

  18. [18]

    Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet.VLDB17, 11 (2024), 3694–3706

  19. [19]

    Johannes Wehrstein, Carsten Binnig, Fatma Özcan, Shobha Vasudevan, Yu Gan, and Yawen Wang. 2025. Towards Foundation Database Models. InCIDR

  20. [20]

    Johannes Wehrstein, Timo Eckmann, Roman Heinrich, and Carsten Binnig

  21. [21]

    JOB-Complex: A Challenging Benchmark for Traditional & Learned Query Optimization.VLDB(2025)

  22. [22]

    Johannes Wehrstein, Timo Eckmann, Matthias Jasny, and Carsten Binnig. 2026. Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database En- gines.arXiv preprint arXiv:2603.02001(2026)

  23. [23]

    Peizhi Wu and Gao Cong. 2021. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. InSIGMOD. 2009–2022

  24. [24]

    Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables.VLDB14, 1 (2020), 61–73

  25. [25]

    Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, and Jingren Zhou. 2024. PRICE: A Pretrained Model for Cross-Database Cardinality Estimation.VLDB18, 3 (2024), 637–650. 10