Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising
Pith reviewed 2026-06-27 14:24 UTC · model grok-4.3
The pith
The decision-calibrated conformal score is the smallest valid uncertainty measure that uniformly protects every deployable pacing policy, and split conformal calibration gives it finite-sample coverage.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions. A robust pacing result combines inventory, response, and experience uncertainty.
What carries the argument
The argument rests on the decision-calibrated score, defined as the support function of the signed policy sensitivity set. It measures the largest impact a forecast error can have on any deployable policy.
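A minimal sketch of how such a score could be computed and calibrated, assuming the signed policy sensitivity set is a finite collection of vectors. The names `decision_score`, `split_conformal_radius`, and the example matrix `G` are hypothetical and do not come from the paper; the paper's own construction may differ.

```python
import math
import numpy as np

def decision_score(errors, sensitivities):
    """Support function of a finite sensitivity set, evaluated at each error.

    errors:        (n, d) forecast errors.
    sensitivities: (m, d) signed policy sensitivity vectors (hypothetical).
    Returns an (n,) array holding max over g of <g, e> for each error e.
    """
    errors = np.atleast_2d(np.asarray(errors, dtype=float))
    return (errors @ np.asarray(sensitivities, dtype=float).T).max(axis=1)

def split_conformal_radius(cal_scores, alpha):
    """Split conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns +inf when the calibration set is too small for the requested level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(np.asarray(cal_scores, dtype=float))[k - 1])

# Illustrative inputs only: three made-up sensitivity vectors in 3 dimensions.
G = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
rng = np.random.default_rng(0)
cal_errors = rng.normal(size=(200, 3))  # synthetic calibration errors
radius = split_conformal_radius(decision_score(cal_errors, G), alpha=0.1)
```

Because every vector in `G` is zero on the third coordinate, errors in that coordinate never enter the score, which is the mechanism the review describes for ignoring nuisance dimensions.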
If this is right
- Traditional residual-based calibration incurs unnecessary conservatism from high-dimensional inventory uncertainties that do not affect policies.
- The method produces separate uncertainty margins for value, delivery, budget, and member-experience load.
- On the Criteo replay, it reduces the held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations.
- On Criteo, it certifies a less aggressive pacing policy than the point-forecast baseline; on KuaiRand, the choice remains unresolved.
- Uncertainty radii drop from 7236.7 to 18.4 on Criteo and from 4629.4 to 278.6 on KuaiRand.
Where Pith is reading between the lines
- This calibration strategy could apply to other sequential decisions where only certain policies are feasible, such as in resource allocation or bidding.
- If the policy sensitivity set changes over time, online updates to the set might be needed to keep the score tight.
- The geometric interpretation suggests connections to robust optimization where sensitivity sets define the uncertainty.
- Further work could test whether this leads to better overall revenue or user experience in live systems.
Load-bearing premise
The set of deployable policies must be described accurately, because the sensitivity set, and therefore the score, is computed from it.
What would settle it
Observe whether adding a high-dimensional inventory variable that does not affect any policy causes traditional residual radii to increase arbitrarily while the decision-calibrated radii stay fixed.
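The test described above can be sketched on synthetic data. This is an illustration under stated assumptions, not the paper's experiment: errors are Gaussian, only the first coordinate affects any policy, the residual score is the Euclidean norm of the full error vector, and the function names are hypothetical.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else float(np.sort(scores)[k - 1])

def radii_with_nuisance(n_nuisance, n_cal=500, alpha=0.1, seed=0):
    """Return (residual_radius, decision_radius) for a synthetic error model.

    Coordinate 0 is the only one the (hypothetical) policies depend on;
    the other n_nuisance coordinates are pure-noise inventory dimensions.
    """
    rng = np.random.default_rng(seed)
    relevant = rng.normal(size=(n_cal, 1))          # drawn first, so identical across runs
    nuisance = rng.normal(size=(n_cal, n_nuisance))
    errors = np.hstack([relevant, nuisance])

    # Hypothetical signed sensitivity set: +e_0 and -e_0, zero elsewhere.
    G = np.zeros((2, 1 + n_nuisance))
    G[0, 0], G[1, 0] = 1.0, -1.0

    residual_scores = np.linalg.norm(errors, axis=1)  # pays for every dimension
    decision_scores = (errors @ G.T).max(axis=1)      # ignores nuisance dimensions
    return (conformal_quantile(residual_scores, alpha),
            conformal_quantile(decision_scores, alpha))

for d in (0, 10, 100, 1000):
    residual, decision = radii_with_nuisance(d)
    print(f"nuisance dims={d:5d}  residual radius={residual:8.3f}  decision radius={decision:6.3f}")
```

In this toy setup the residual radius grows with the number of nuisance dimensions while the decision radius does not move, which is the pattern the separation theorem predicts.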
Original abstract
We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved with high residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6 respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline, and reduces held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In a nutshell, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a decision-calibrated conformal framework for pacing decisions in streaming advertising. Uncertainty is defined geometrically as the support function of the signed policy sensitivity set rather than a generic forecast residual; the main theorem establishes that this score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Split conformal calibration is claimed to deliver finite-sample coverage. Additional results include a high-dimensional separation theorem showing that residual-based calibration can be arbitrarily conservative and a robust combination of inventory, response, and experience uncertainties. Experiments on Criteo Uplift and KuaiRand replays report large reductions in uncertainty radii (from 7236.7 to 18.4 on Criteo; 4629.4 to 278.6 on KuaiRand) and lower any-violation rates.
Significance. If the central guarantees hold, the work provides a principled way to calibrate uncertainty directly to the downstream pacing decision rather than to nuisance dimensions, yielding less conservative yet still protected policies. The geometric characterization via support functions and the explicit comparison to traditional conformal methods are clear strengths; the empirical reductions in radii and violation rates on public-data replays illustrate practical impact.
major comments (2)
- [Main Theorem] Main Theorem (section containing the coverage claim for the support function): the finite-sample coverage guarantee is obtained via standard split conformal and therefore requires exchangeability between calibration and test points. The streaming setting (sequential inventory, demand, response, and member-experience realizations) introduces temporal dependence that is not addressed by the high-dimensional separation result or the robust pacing combination; no argument is given that these restore exchangeability or that a time-series conformal variant is used. This directly affects the uniform protection claim over all deployable policies.
- [High-dimensional separation theorem] High-dimensional separation theorem (section stating the result on nuisance inventory dimensions): the claim that traditional residual calibration can be arbitrarily more conservative is load-bearing for the motivation, yet the quantitative separation is shown only under the paper's geometric construction; it is unclear whether the same separation holds once temporal dependence is acknowledged, which would weaken the comparative advantage.
minor comments (2)
- Notation for the signed policy sensitivity set and its support function should be introduced with an explicit definition before the main theorem to improve readability.
- [Experiments] The experimental section reports radii and violation rates but does not include confidence intervals or statistical tests on the held-out any-violation reduction (16.7% to 3.3% on Criteo); adding these would strengthen the empirical claims.
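As one way to act on the last minor comment, a Wilson score interval could accompany each reported violation rate. The sketch below uses only the standard library. The counts are HYPOTHETICAL: the review reports rates only, and 5 of 30 and 1 of 30 are used here merely because they round to 16.7% and 3.3%; the paper's actual held-out sample size is not stated in this review.

```python
import math

def wilson_interval(violations, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = violations / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts chosen only to match the reported percentages.
baseline = wilson_interval(5, 30)   # 5/30 rounds to 16.7%
proposed = wilson_interval(1, 30)   # 1/30 rounds to 3.3%
print("baseline interval:", baseline)
print("proposed interval:", proposed)
```

With small held-out sets, the two intervals can overlap substantially, which is why the referee's request for intervals or a formal test matters for interpreting the reduction.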
Simulated Author's Rebuttal
We thank the referee for highlighting the exchangeability requirement in the main theorem and its implications for the high-dimensional separation result in the streaming setting. We address each major comment below.
- Referee: [Main Theorem] Main Theorem (section containing the coverage claim for the support function): the finite-sample coverage guarantee is obtained via standard split conformal and therefore requires exchangeability between calibration and test points. The streaming setting (sequential inventory, demand, response, and member-experience realizations) introduces temporal dependence that is not addressed by the high-dimensional separation result or the robust pacing combination; no argument is given that these restore exchangeability or that a time-series conformal variant is used. This directly affects the uniform protection claim over all deployable policies.
Authors: We agree that the finite-sample coverage guarantee relies on the standard exchangeability assumption between calibration and test points for split conformal, which is not restored by the separation result or robust combination in the manuscript. The streaming nature of the data introduces temporal dependence that is not addressed. We will revise the paper to explicitly state the exchangeability assumption as a modeling choice and discuss it as a limitation, including potential extensions to time-series conformal variants. revision: yes
- Referee: [High-dimensional separation theorem] High-dimensional separation theorem (section stating the result on nuisance inventory dimensions): the claim that traditional residual calibration can be arbitrarily more conservative is load-bearing for the motivation, yet the quantitative separation is shown only under the paper's geometric construction; it is unclear whether the same separation holds once temporal dependence is acknowledged, which would weaken the comparative advantage.
Authors: The separation theorem is established under the geometric construction and the assumptions supporting the main theorem, including exchangeability. We acknowledge that temporal dependence could affect whether the same quantitative separation holds. We will revise to add a discussion noting this caveat and its potential impact on the comparative advantage over residual-based methods. revision: yes
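The first response mentions possible extensions to time-series conformal variants without naming one. As an illustration only, the sketch below uses adaptive conformal inference, which adjusts the working miscoverage level online via alpha_{t+1} = alpha_t + step * (target_alpha - err_t). This is a swapped-in technique, not the paper's method, and `run_aci`, `empirical_quantile`, and the drifting synthetic scores are hypothetical.

```python
import numpy as np

def empirical_quantile(past_scores, level):
    """Quantile of past scores; levels outside (0, 1) map to -inf or +inf."""
    if level >= 1.0:
        return np.inf
    if level <= 0.0:
        return -np.inf
    return float(np.quantile(past_scores, level))

def run_aci(scores, target_alpha=0.1, step=0.05, warmup=50):
    """Stream over scores, calibrating on the past and adapting alpha online.

    Returns (realized violation rate after warmup, final working alpha).
    """
    scores = np.asarray(scores, dtype=float)
    alpha_t = target_alpha
    errors = []
    for t in range(warmup, len(scores)):
        radius = empirical_quantile(scores[:t], 1.0 - alpha_t)
        violated = float(scores[t] > radius)   # err_t in the update rule
        errors.append(violated)
        # Raise alpha after a covered step, lower it after a violation.
        alpha_t += step * (target_alpha - violated)
    return float(np.mean(errors)), alpha_t

# Synthetic, non-exchangeable scores: Gaussian noise plus an upward drift.
rng = np.random.default_rng(0)
drifting_scores = rng.normal(size=2050) + np.linspace(0.0, 5.0, 2050)
rate, final_alpha = run_aci(drifting_scores)
print(f"realized violation rate: {rate:.3f}, final alpha: {final_alpha:.3f}")
```

The trade-off is that this kind of update targets long-run average coverage rather than the per-step finite-sample guarantee of split conformal, so it would change the form of the uniform protection claim rather than restore it as stated.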
Circularity Check
No significant circularity detected
The paper defines the proposed score geometrically as the support function of the signed policy sensitivity set and separately invokes standard split conformal calibration to obtain finite-sample coverage. The main theorem asserts this score is the smallest valid uncertainty measure protecting all deployable policies, but this follows from the geometric construction and conformal properties without reducing to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. No ansatz smuggling, uniqueness theorems from prior author work, or renaming of known results is present in the derivation chain. The framework remains self-contained with independent content from external conformal theory and empirical replays.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Split conformal calibration provides finite-sample coverage for the decision-calibrated score.