pith.

arxiv: 2606.10187 · v1 · pith:BSATLVUU · submitted 2026-06-08 · 📊 stat.ML · cs.LG

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

Pith reviewed 2026-06-27 14:24 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords conformal prediction · pacing decisions · streaming advertising · decision calibration · uncertainty quantification · policy sensitivity · advertising optimization

The pith

The decision-calibrated conformal score for pacing is the smallest valid uncertainty measure that protects every deployable policy, and split conformal calibration gives it finite-sample coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a decision-calibrated conformal method for handling uncertainty in streaming advertising pacing. Pacing choices depend on uncertain inventory, demand, response, and member-experience factors, so the method scores forecast errors by their largest effect on the policies that could actually be deployed rather than by raw prediction error. The central theorem establishes that this score is the smallest valid uncertainty measure that protects all such policies uniformly. The score is the support function of the signed policy sensitivity set, and split conformal calibration gives it finite-sample coverage. On pacing replays calibrated from the public Criteo Uplift and KuaiRand datasets, the decision-calibrated radii are far smaller than generic residual radii. On Criteo this is enough to certify a pacing policy, while on KuaiRand the choice remains unresolved.

Core claim

The proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions. A robust pacing result combines inventory, response, and experience uncertainty.
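The split conformal step can be sketched in a few lines. This is a minimal illustration that assumes nonnegative scores that are exchangeable between calibration and test points. The function name `split_conformal_radius` is hypothetical and does not come from the paper's code.

```python
import math

def split_conformal_radius(cal_scores, alpha):
    """Finite-sample split-conformal radius for nonnegative scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score, which
    covers a new exchangeable score with probability at least 1 - alpha.
    Returns math.inf when n is too small for the requested level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]

# Hypothetical calibration scores (any nonnegative score works, including the
# decision-calibrated one).
scores = [0.2, 1.5, 0.7, 3.1, 0.9, 2.2, 0.4, 1.1, 0.8]
print(split_conformal_radius(scores, alpha=0.25))
```

With nine calibration points and alpha = 0.25, the rule takes the 8th smallest score. At alpha = 0.05 it would need a 10th score that does not exist, so the radius is infinite. This finite-sample rounding is what makes the guarantee exact rather than asymptotic.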

What carries the argument

The decision-calibrated score, defined as the support function of the signed policy sensitivity set, which measures the worst-case impact of a forecast error on deployable policies.
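Here is a minimal sketch of the score under a hypothetical linearization: each deployable policy's exposure to a forecast error e is <g, e> for some sensitivity vector g. The support function of the signed set {+g, -g} is then the maximum over policies of |<g, e>|, so error components orthogonal to every g contribute nothing. All names and numbers are illustrative.

```python
from typing import List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def decision_score(error: Sequence[float], sensitivities: List[Sequence[float]]) -> float:
    """Support function of the signed set {+g, -g : g in sensitivities}, evaluated at `error`."""
    signed_set = list(sensitivities) + [[-a for a in g] for g in sensitivities]
    return max(dot(s, error) for s in signed_set)

def residual_score(error: Sequence[float]) -> float:
    """Generic residual score: Euclidean norm of the whole forecast error."""
    return sum(a * a for a in error) ** 0.5

# Two hypothetical policy-sensitivity vectors in a 4-dimensional forecast-error space.
# Coordinates 3 and 4 are nuisance inventory directions that neither policy depends on.
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0]]
e = [0.5, -1.0, 30.0, -40.0]
print(decision_score(e, G), residual_score(e))
```

The large nuisance components dominate the residual score (about 50), while the decision score only registers the worst signed policy impact (2.0).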

If this is right

  • Traditional residual-based calibration incurs unnecessary conservatism from high-dimensional inventory uncertainties that do not affect policies.
  • The method produces separate uncertainty margins for value, delivery, budget, and member-experience load.
  • On Criteo replays, it reduces the held-out any-violation rate from 16.7% to 3.3%, with no budget or member-load violations.
  • On Criteo, it certifies a less aggressive pacing policy than the point-forecast baseline; on KuaiRand, the pacing choice remains unresolved.
  • Relative to generic residual calibration, uncertainty radii drop from 7236.7 to 18.4 on Criteo and from 4629.4 to 278.6 on KuaiRand.
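Taking the reported radii at face value (generic residual radius divided by decision-calibrated radius), the reductions are roughly:

```latex
\frac{7236.7}{18.4} \approx 393.3 \quad (\text{Criteo}), \qquad
\frac{4629.4}{278.6} \approx 16.6 \quad (\text{KuaiRand})
```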

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This calibration strategy could apply to other sequential decisions where only certain policies are feasible, such as in resource allocation or bidding.
  • If the policy sensitivity set changes over time, online updates to the set might be needed to keep the score tight.
  • The geometric interpretation suggests connections to robust optimization where sensitivity sets define the uncertainty.
  • Further work could test whether this leads to better overall revenue or user experience in live systems.

Load-bearing premise

The set of deployable pacing policies (the pacing catalog) must be specified accurately, because the signed policy sensitivity set, and therefore the score, is computed from it.

What would settle it

Add nuisance inventory coordinates that are orthogonal to every policy-sensitivity direction. Then check whether generic residual radii grow without bound as the number of such coordinates increases, while the decision-calibrated radius stays fixed.
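A synthetic version of this check is easy to run. The sketch below makes its own assumptions: Gaussian errors, a single policy-sensitivity direction along the first coordinate, 200 calibration draws, and alpha = 0.1. It illustrates the idea and does not reproduce the paper's experiment.

```python
import math
import random

def conformal_radius(scores, alpha):
    """Split-conformal radius: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

def separation_demo(num_nuisance, n_cal=200, alpha=0.1, seed=0):
    """Calibrated radii when errors gain `num_nuisance` coordinates that no policy depends on."""
    rng_relevant = random.Random(seed)       # drives the decision-relevant coordinate
    rng_nuisance = random.Random(seed + 1)   # drives the nuisance inventory coordinates
    decision_scores, residual_scores = [], []
    for _ in range(n_cal):
        relevant = rng_relevant.gauss(0.0, 1.0)
        nuisance = [rng_nuisance.gauss(0.0, 1.0) for _ in range(num_nuisance)]
        # Single sensitivity direction e_1: the support function of {+e_1, -e_1} is |relevant|.
        decision_scores.append(abs(relevant))
        # Generic residual score: Euclidean norm of the full error vector.
        residual_scores.append(math.sqrt(relevant ** 2 + sum(x * x for x in nuisance)))
    return conformal_radius(decision_scores, alpha), conformal_radius(residual_scores, alpha)

results = {d: separation_demo(d) for d in (0, 10, 100, 1000)}
for d, (dec, res) in results.items():
    print(f"nuisance dims={d:5d}  decision radius={dec:.3f}  residual radius={res:.3f}")
```

Separate random generators keep the decision-relevant draws identical across runs, so any change in the residual radius comes only from the added nuisance dimensions.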

Figures

Figures reproduced from arXiv: 2606.10187 by Caroline Howard, Prashant Shekhar.

Figure 1: Decision-calibrated conformal pacing. Generic residual calibration uses a radius that covers all forecast errors, including irrelevant inventory directions. The decision-calibrated score projects forecast errors onto value and constraint directions used by the pacing catalog, calibrates the resulting policy-impact radius, and returns either a certified policy or an unresolved decision. methods typically ca…
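One plausible reading of the "certified policy or unresolved decision" step can be sketched under hypothetical assumptions. Each catalog policy has a point-forecast value and point-forecast constraint slacks. A policy is certified only if every slack stays nonnegative after subtracting that constraint's calibrated radius. The selector returns None (unresolved) when no policy survives. The paper's actual selection rule may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Policy:
    name: str
    value: float              # point-forecast value of deploying this policy
    slacks: Dict[str, float]  # point-forecast constraint slack per component (>= 0 means satisfied)

def select_certified(catalog: List[Policy], value_radius: float,
                     constraint_radii: Dict[str, float]) -> Optional[Tuple[Policy, float]]:
    """Return (policy, certified value lower bound), or None if the decision is unresolved."""
    survivors = [
        p for p in catalog
        if all(p.slacks[c] - r >= 0.0 for c, r in constraint_radii.items())
    ]
    if not survivors:
        return None  # no policy keeps every constraint margin after calibration
    best = max(survivors, key=lambda p: p.value)
    return best, best.value - value_radius

# Hypothetical three-policy catalog.
catalog = [
    Policy("aggressive", 10.0, {"budget": 1.0, "member_load": 0.5}),
    Policy("moderate", 8.0, {"budget": 5.0, "member_load": 3.0}),
    Policy("conservative", 5.0, {"budget": 9.0, "member_load": 6.0}),
]
print(select_certified(catalog, 0.5, {"budget": 2.0, "member_load": 1.0}))
```

With zero radii, the rule reduces to point-forecast selection and picks "aggressive". With the hypothetical calibrated radii, it falls back to "moderate". This mirrors, in toy form, the less aggressive certified policy reported on Criteo. Large enough radii leave the decision unresolved.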
Figure 2: Deployment-facing replay comparison. Point-forecast pacing ignores forecast uncertainty. Generic-residual conformal pacing uses an unweighted residual radius, while the decision-calibrated selector uses component radii built from catalog sensitivities. Criteo has enough margin for a certified policy, while KuaiRand remains unresolved under repeated-exposure and member-load uncertainty. The KuaiRand case is…
Figure 3: Forecasting inputs evaluated by downstream pacing uncertainty and certification outcome. The top-left panel reports Criteo decision-calibrated radii by forecaster; smaller radii mean less uncertainty in the value and constraint directions used by the pacing selector, and all Criteo forecasters certify a policy. The bottom-left panel reports the same quantity for KuaiRand, where every forecaster remains unr…
Figure 4: Response-estimation inputs to the conformal pacing selector. The left panel compares the response scale supplied to the pacing problem under three modes: (i) a model-predicted response, (ii) a context-adjusted doubly robust estimate when treatment contrast is available, and (iii) a robust response input that subtracts the response uncertainty radius. Criteo has enough randomized treatment contrast for the …
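Modes (ii) and (iii) can be illustrated with the standard augmented inverse-propensity-weighted (AIPW) doubly robust estimator under a known randomization probability, followed by subtracting a response radius. Whether the paper's context-adjusted estimator takes exactly this form is an assumption. The outcome models below are hypothetical constants.

```python
from typing import Callable, Sequence

def aipw_effect(contexts: Sequence[float], treated: Sequence[int], outcomes: Sequence[float],
                propensity: float,
                mu1: Callable[[float], float], mu0: Callable[[float], float]) -> float:
    """Augmented IPW (doubly robust) estimate of the average incremental response.

    `propensity` is the known probability of treatment under randomization;
    `mu1` and `mu0` are outcome models for treated and control units.
    """
    total = 0.0
    for x, t, y in zip(contexts, treated, outcomes):
        m1, m0 = mu1(x), mu0(x)
        # Model-based contrast plus inverse-propensity-weighted residual corrections.
        total += m1 - m0 + t * (y - m1) / propensity - (1 - t) * (y - m0) / (1 - propensity)
    return total / len(outcomes)

def robust_response(estimate: float, response_radius: float) -> float:
    """Mode (iii): a pessimistic response input that subtracts the calibrated radius."""
    return estimate - response_radius

contexts = [0.0, 0.0, 0.0, 0.0]
treated = [1, 1, 0, 0]
outcomes = [3.0, 1.0, 1.0, 0.0]
dr = aipw_effect(contexts, treated, outcomes, 0.5, lambda x: 2.0, lambda x: 0.5)
print(dr, robust_response(dr, 0.4))
```

AIPW is consistent if either the outcome models or the propensity is correct. Under randomization with a known propensity, it is unbiased whatever outcome models are plugged in. With zero outcome models, it reduces to plain inverse-propensity weighting, which gives the same 1.5 on this toy data.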
Figure 5: Geometry diagnostic for Theorem 4.1. This is a synthetic theorem-level diagnostic rather than a Criteo or KuaiRand experiment. The implementation constructs 8 orthogonal policy-sensitivity directions in a 16-dimensional forecast-error space and evaluates 500 random forecast-error draws plus targeted active-direction errors. The left panel reports each candidate certificate’s mean size relative to the minim…
Figure 6: Theory-level separation and coverage diagnostics. These are synthetic diagnostics for Theorem 5.2 and the split-conformal coverage statement in Theorem 4.1. The left panel constructs forecast-error vectors with one decision-relevant coordinate and an increasing number of nuisance inventory coordinates that are orthogonal to every policy-sensitivity direction. The decision-calibrated radius stays fixed at 1…
Figure 7: Calibration sample-size diagnostic for Proposition 5.1. The experiment builds longer 480-block Criteo and KuaiRand streaming cases, treats the full 120-block calibration split as the reference calibration set, then repeatedly subsamples smaller calibration sets and reruns the same component-radius calibration and robust selector. The left panel reports the median maximum component quantile error across val…
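A simplified, single-component version of this subsampling diagnostic can be sketched as follows. It uses synthetic |Gaussian| scores as a stand-in for a 120-block calibration split and alpha = 0.1. The paper reports the maximum over several component radii on its own replay data, so this is only an illustration of the procedure.

```python
import math
import random
import statistics

def conformal_radius(scores, alpha):
    """k-th smallest score with k = ceil((n + 1)(1 - alpha)); +inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

def median_radius_error(reference, subsample_size, alpha=0.1, repeats=200, seed=0):
    """Median |radius(subsample) - radius(reference)| over repeated subsamples."""
    rng = random.Random(seed)
    target = conformal_radius(reference, alpha)
    errors = []
    for _ in range(repeats):
        subsample = rng.sample(reference, subsample_size)   # drawn without replacement
        errors.append(abs(conformal_radius(subsample, alpha) - target))
    return statistics.median(errors)

# Synthetic stand-in for a 120-block calibration split of nonnegative scores.
score_rng = random.Random(1)
reference = [abs(score_rng.gauss(0.0, 1.0)) for _ in range(120)]
for m in (20, 40, 80, 120):
    print(m, round(median_radius_error(reference, m), 3))
```

At the full size of 120, every "subsample" is a permutation of the reference set, so the error is exactly zero. Smaller calibration sets show how much the radius wobbles.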
Original abstract

We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative because it pays for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from the Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved, with residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6, respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline and reduces the held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In short, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a decision-calibrated conformal framework for pacing decisions in streaming advertising. Uncertainty is defined geometrically as the support function of the signed policy sensitivity set rather than a generic forecast residual; the main theorem establishes that this score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Split conformal calibration is claimed to deliver finite-sample coverage. Additional results include a high-dimensional separation theorem showing that residual-based calibration can be arbitrarily conservative, and a robust combination of inventory, response, and experience uncertainties. Experiments on Criteo Uplift and KuaiRand replays report large reductions in uncertainty radii (from 7236.7 to 18.4 on Criteo; 4629.4 to 278.6 on KuaiRand) and, on Criteo, a lower held-out any-violation rate.

Significance. If the central guarantees hold, the work provides a principled way to calibrate uncertainty directly to the downstream pacing decision rather than to nuisance dimensions, yielding less conservative yet still protected policies. The geometric characterization via support functions and the explicit comparison to traditional conformal methods are clear strengths; the empirical reductions in radii and violation rates on public-data replays illustrate practical impact.

major comments (2)
  1. [Main Theorem] Theorem 4.1 (coverage claim for the support-function score): the finite-sample coverage guarantee is obtained via standard split conformal and therefore requires exchangeability between calibration and test points. The streaming setting (sequential inventory, demand, response, and member-experience realizations) introduces temporal dependence that neither the high-dimensional separation result nor the robust pacing combination addresses. No argument is given that exchangeability holds, and no time-series conformal variant is used. This directly affects the uniform protection claim over all deployable policies.
  2. [High-dimensional separation theorem] Theorem 5.2 (nuisance inventory dimensions): the claim that traditional residual calibration can be arbitrarily more conservative is load-bearing for the motivation, yet the quantitative separation is shown only under the paper's geometric construction. It is unclear whether the same separation holds once temporal dependence is acknowledged, which would weaken the comparative advantage.
minor comments (2)
  1. Notation for the signed policy sensitivity set and its support function should be introduced with an explicit definition before the main theorem to improve readability.
  2. [Experiments] The experimental section reports radii and violation rates but does not include confidence intervals or statistical tests on the held-out any-violation reduction (16.7% to 3.3% on Criteo); adding these would strengthen the empirical claims.
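A minimal way to attach uncertainty to these rates is a Wilson score interval for each rate. The sketch below uses hypothetical held-out counts, since the page reports only percentages. The counts 5/30 and 1/30 merely reproduce 16.7% and 3.3% and are not taken from the paper.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (approximately 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts only: the page reports rates, not the number of held-out blocks.
baseline = wilson_interval(5, 30)
decision = wilson_interval(1, 30)
print(baseline, decision)
```

With counts this small, the two intervals overlap. Both methods are evaluated on the same held-out blocks, so a paired test such as McNemar's test on blocks where exactly one method violates would be the sharper comparison.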

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the exchangeability requirement in the main theorem and its implications for the high-dimensional separation result in the streaming setting. We address each major comment below.

read point-by-point responses
  1. Referee: [Main Theorem] Theorem 4.1 (coverage claim for the support-function score): the finite-sample coverage guarantee is obtained via standard split conformal and therefore requires exchangeability between calibration and test points. The streaming setting (sequential inventory, demand, response, and member-experience realizations) introduces temporal dependence that neither the high-dimensional separation result nor the robust pacing combination addresses. No argument is given that exchangeability holds, and no time-series conformal variant is used. This directly affects the uniform protection claim over all deployable policies.

    Authors: We agree that the finite-sample coverage guarantee relies on the standard exchangeability assumption between calibration and test points for split conformal, which is not restored by the separation result or robust combination in the manuscript. The streaming nature of the data introduces temporal dependence that is not addressed. We will revise the paper to explicitly state the exchangeability assumption as a modeling choice and discuss it as a limitation, including potential extensions to time-series conformal variants. revision: yes

  2. Referee: [High-dimensional separation theorem] Theorem 5.2 (nuisance inventory dimensions): the claim that traditional residual calibration can be arbitrarily more conservative is load-bearing for the motivation, yet the quantitative separation is shown only under the paper's geometric construction. It is unclear whether the same separation holds once temporal dependence is acknowledged, which would weaken the comparative advantage.

    Authors: The separation theorem is established under the geometric construction and the assumptions supporting the main theorem, including exchangeability. We acknowledge that temporal dependence could affect whether the same quantitative separation holds. We will revise to add a discussion noting this caveat and its potential impact on the comparative advantage over residual-based methods. revision: yes
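The time-series extension mentioned in the rebuttal could take the form of adaptive conformal inference (Gibbs and Candès, 2021). That method updates the miscoverage level online as alpha_{t+1} = alpha_t + gamma * (alpha - err_t). The sketch below is minimal and assumes a stream of nonnegative decision-calibrated scores and a rolling window of past scores. The window length and step size are illustrative choices, not values from the paper. For any score sequence, the long-run miss rate stays within (max(alpha, 1 - alpha) + gamma) / (gamma * T) of alpha. It does not, however, recover the per-step finite-sample guarantee of split conformal.

```python
import math
import random
from typing import List, Tuple

def _conformal_radius(history: List[float], level: float) -> float:
    """k-th smallest past score with k = ceil((n + 1) * level); +inf / -inf at the extremes."""
    n = len(history)
    k = math.ceil((n + 1) * level)
    if k > n:
        return math.inf    # alpha_t <= 0 (or too little history): accept every score
    if k < 1:
        return -math.inf   # alpha_t >= 1: reject every score
    return sorted(history)[k - 1]

def adaptive_conformal(scores: List[float], alpha: float = 0.1, gamma: float = 0.01,
                       window: int = 100) -> Tuple[List[float], float]:
    """Online radii and overall miss rate under adaptive conformal inference."""
    alpha_t = alpha
    history: List[float] = []
    radii: List[float] = []
    misses = 0
    for s in scores:
        radius = _conformal_radius(history[-window:], 1.0 - alpha_t)
        err = 1 if s > radius else 0
        alpha_t += gamma * (alpha - err)   # raise alpha_t after coverage, lower it after a miss
        radii.append(radius)
        misses += err
        history.append(s)
    return radii, misses / len(scores)

# Drifting stream: the score scale grows over time, which breaks exchangeability.
rng = random.Random(0)
stream = [abs(rng.gauss(0.0, 1.0 + t / 1000.0)) for t in range(5000)]
radii, miss_rate = adaptive_conformal(stream)
print(round(miss_rate, 4))
```

With T = 5000 and gamma = 0.01, the bound gives |miss rate - 0.1| <= 0.0182 regardless of how the stream drifts.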

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines the proposed score geometrically as the support function of the signed policy sensitivity set and separately invokes standard split conformal calibration to obtain finite-sample coverage. The main theorem asserts this score is the smallest valid uncertainty measure protecting all deployable policies, but this follows from the geometric construction and conformal properties without reducing to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. No ansatz smuggling, uniqueness theorems from prior author work, or renaming of known results is present in the derivation chain. The framework remains self-contained with independent content from external conformal theory and empirical replays.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard conformal prediction assumptions and the definition of the policy sensitivity set.

axioms (1)
  • Domain assumption: split conformal calibration provides finite-sample coverage for the decision-calibrated score, which requires calibration and test points to be exchangeable.
    Invoked for the main coverage guarantee.

pith-pipeline@v0.9.1-grok · 5844 in / 997 out tokens · 16840 ms · 2026-06-27T14:24:10.200699+00:00 · methodology

