arXiv: 2605.06479 · v1 · submitted 2026-05-07 · stat.ML · cs.LG · math.ST · stat.TH

Recognition: unknown

Risk-Controlled Post-Processing of Decision Policies

Edgar Dobriban, Hamed Hassani, Sunay Joshi, Tao Wang

Pith reviewed 2026-05-08 04:45 UTC · model grok-4.3

classification stat.ML · cs.LG · math.ST · stat.TH

keywords post-processing · risk control · decision policies · threshold structure · chance constraints · excess risk · calibration · exchangeability

The pith

A threshold-based post-processing step adjusts any baseline decision policy to meet a risk constraint while maximizing agreement with the original.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to post-process an existing decision policy so that it satisfies a user-specified risk constraint on some loss function while staying as close as possible to the original policy. At the population level, the optimal adjustment has a simple threshold structure: switch to a fallback policy only in contexts where the reduction in violation risk is large enough. For finite samples, the authors give an algorithm that selects the threshold from calibration data and prove that, under regularity conditions in the i.i.d. setting, the expected excess risk is O(log n/n). In the special case where an exact-safe fallback is available, the algorithm achieves precise expected risk control under exchangeability. This matters because many deployed systems run policies that stakeholders do not want to replace entirely but that need to be made safer without resorting to score-blind random mixing.

Core claim

Given a baseline policy and a fallback policy, the risk-controlled post-processed policy follows the baseline except on contexts where switching to the oracle fallback yields a large reduction in conditional violation risk. The finite-sample algorithm selects a threshold from calibration data, achieving expected excess risk O(log n/n) under regularity conditions in the i.i.d. setting, and precise risk control under exchangeability when an exact-safe fallback is available.
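The core claim can be written compactly. The notation below is introduced here for illustration and is an assumption, not the paper's own: $\pi_0$ is the baseline policy, $\pi_{\mathrm{f}}$ the fallback, $\ell$ the loss, $c$ a loss level that defines a violation, $\alpha$ the risk budget, $r_\pi(x)$ the conditional violation risk of policy $\pi$ in context $x$, and $s(x)$ the risk reduction from switching. Reading the chance constraint as a bound on the probability that the loss exceeds $c$ is also an assumption, and the treatment of ties at the threshold is left out.

```latex
\begin{aligned}
&\max_{\pi}\ \Pr\{\pi(X) = \pi_0(X)\}
\quad\text{subject to}\quad
\Pr\{\ell(\pi(X), Y) > c\} \le \alpha, \\[4pt]
&r_{\pi}(x) = \Pr\{\ell(\pi(x), Y) > c \mid X = x\}, \qquad
s(x) = r_{\pi_0}(x) - r_{\pi_{\mathrm{f}}}(x), \\[4pt]
&\pi_{\tau}(x) =
\begin{cases}
\pi_{\mathrm{f}}(x), & s(x) > \tau, \\
\pi_0(x), & s(x) \le \tau.
\end{cases}
\end{aligned}
```

In this reading, a larger $\tau$ means fewer switches and therefore more agreement with the baseline, so the task reduces to finding the largest $\tau$ that still meets the budget $\alpha$.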

What carries the argument

The threshold structure of the optimal policy: it switches to the oracle fallback policy only in contexts where the resulting reduction in conditional violation risk exceeds a threshold.
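A minimal sketch of such a thresholded policy, assuming a scalar score per context that stands in for the risk reduction from switching. The function names `post_process` and `agreement_rate`, the toy contexts, and the action labels are all hypothetical and do not come from the paper.

```python
def post_process(baseline_action, fallback_action, score, tau):
    """Return (action, switched): use the fallback only when score exceeds tau."""
    switched = score > tau
    action = fallback_action if switched else baseline_action
    return action, switched


def agreement_rate(contexts, baseline, fallback, score_fn, tau):
    """Fraction of contexts where the post-processed action equals the baseline action."""
    agree = 0
    for x in contexts:
        action, _ = post_process(baseline(x), fallback(x), score_fn(x), tau)
        agree += action == baseline(x)
    return agree / len(contexts)


# Toy usage; the contexts, actions, and score below are hypothetical.
contexts = [0.1, 0.4, 0.6, 0.9]
baseline = lambda x: "treat"
fallback = lambda x: "refer"
score_fn = lambda x: x  # pretend larger x means a larger risk reduction from switching
rate = agreement_rate(contexts, baseline, fallback, score_fn, tau=0.5)
print(rate)
```

Raising `tau` in this sketch can only shrink the set of contexts that switch, which is why agreement with the baseline is traded off against risk through a single scalar.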

If this is right

The post-processed policy maximizes agreement with the baseline under the risk constraint.
The expected excess risk is O(log n/n) under i.i.d. sampling and regularity conditions.
With an exact-safe fallback, precise expected risk control is achieved under exchangeability.
High-probability near-optimality guarantees hold in the exact-safe case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This threshold approach might extend to other constrained policy optimization settings beyond risk control.
Practitioners could apply this to existing ML systems in healthcare or AI routing without full retraining.
The guarantees assume that the fallback policy and score are fitted on data separate from the calibration set; when data are limited, this split may need to be made more data-efficient.

Load-bearing premise

The loss and score functions satisfy regularity conditions, the data is i.i.d. or exchangeable, and the fallback policy and score are pre-fitted on separate data.

What would settle it

An experiment where the post-processed policy's risk exceeds the allowed budget by more than the O(log n/n) term on i.i.d. data with regular losses would falsify the finite-sample guarantees.
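A rough sketch of how such a check could be set up on synthetic data. Everything here is an assumption made for illustration: the toy model (context X uniform on [0, 1], score equal to X, baseline violating with probability X, fallback never violating), the plain empirical threshold rule without any finite-sample correction, and the helper names. Because the constant hidden in O(log n/n) is unknown, comparing against log(n)/n only indicates a scaling trend and cannot by itself falsify the bound.

```python
import math
import random


def select_threshold(scores, violations, alpha):
    """Largest observed-score threshold with empirical risk <= alpha,
    assuming the fallback never violates. Scores are assumed distinct."""
    n = len(scores)
    remaining = sum(violations)          # violations if nothing is switched
    if remaining / n <= alpha:
        return math.inf                  # baseline already meets the budget
    for i in sorted(range(n), key=lambda j: scores[j], reverse=True):
        if remaining / n <= alpha:
            return scores[i]             # only points with larger scores are switched
        remaining -= violations[i]       # switch point i before trying a smaller tau
    return -math.inf                     # switch everywhere


def population_risk(tau):
    """Exact risk in the toy model: E[X * 1{X <= tau}] = tau^2 / 2 on [0, 1]."""
    t = min(max(tau, 0.0), 1.0)
    return t * t / 2.0


def mean_excess_risk(n, alpha, trials, seed=0):
    """Average of (population risk - alpha) over independent calibration draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        viol = [1 if rng.random() < x else 0 for x in xs]
        total += population_risk(select_threshold(xs, viol, alpha)) - alpha
    return total / trials


# Print n, the simulated mean excess risk, and log(n)/n as a reference scale.
for n in (100, 400, 1600):
    print(n, mean_excess_risk(n, alpha=0.125, trials=200), math.log(n) / n)
```

The averaged quantity can be negative when the selected threshold is conservative; only a positive gap that fails to shrink with n would be a warning sign in this toy setting.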

Figures

Figures reproduced from arXiv: 2605.06479 by Edgar Dobriban, Hamed Hassani, Sunay Joshi, Tao Wang.

**Figure 1.** Workflow for risk-controlled post-processing. A fitted fallback policy and score …

**Figure 2.** Synthetic multiclass experiment. The panels show violation risk, switch rate, …

Original abstract

Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk. At the finite-sample level, given a fitted fallback policy and score, we develop a post-processing algorithm that uses calibration data to select a threshold. Leveraging tools from algorithmic stability and stochastic processes, we show that under regularity conditions, in the i.i.d. setting, the expected excess risk of the post-processed policy is $O(\log n/n)$. In the special case when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability. In this setting, we also give high-probability near-optimality guarantees on the post-processed policy. Experiments on a COVID-19 radiograph diagnosis task, an LLM routing problem, and a synthetic multiclass decision task show that targeted post-processing can meet or nearly meet risk budgets while preserving substantially more agreement with the baseline than score-blind random mixing.
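A small worked example may help show why targeted switching can preserve more agreement than score-blind random mixing. The toy model is an assumption made here and does not come from the paper: the context $X$ is uniform on $[0,1]$, the baseline violates with conditional probability $x$, the fallback never violates and always differs from the baseline action, and the budget is $\alpha \le 1/2$.

```latex
\begin{aligned}
\text{Threshold rule (switch when } x > \tau\text{):}\quad
&\text{risk} = \int_0^{\tau} x \, dx = \frac{\tau^2}{2} = \alpha
\;\Rightarrow\; \tau = \sqrt{2\alpha},
\quad \text{agreement} = \Pr\{X \le \tau\} = \sqrt{2\alpha}, \\[4pt]
\text{Random mixing (switch w.p. } q\text{):}\quad
&\text{risk} = (1-q)\,\mathbb{E}[X] = \frac{1-q}{2} = \alpha
\;\Rightarrow\; 1 - q = 2\alpha,
\quad \text{agreement} = 2\alpha, \\[4pt]
\alpha = 0.125:\quad
&\sqrt{2\alpha} = 0.5 \quad\text{versus}\quad 2\alpha = 0.25 .
\end{aligned}
```

Since $\sqrt{2\alpha} \ge 2\alpha$ whenever $2\alpha \le 1$, the threshold rule keeps at least as much agreement as random mixing at the same risk level in this toy model.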

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper studies risk-controlled post-processing of decision policies: given a deterministic baseline policy, select a new policy maximizing agreement with the baseline subject to a user-specified chance constraint on loss. At the population level, the optimal policy has a threshold structure, following the baseline except where switching to an oracle fallback yields large conditional risk reduction. For finite samples, given pre-fitted fallback and score, a calibration-based threshold selection algorithm is proposed; under regularity conditions on loss/score and i.i.d. sampling, the expected excess risk is O(log n/n). When an exact-safe fallback is available, the method achieves precise expected risk control under exchangeability, plus high-probability near-optimality. Experiments on COVID-19 radiograph diagnosis, LLM routing, and synthetic multiclass tasks show risk budgets met or nearly met while preserving substantially more agreement with the baseline than score-blind random mixing.

Significance. If the results hold, the work provides a principled, minimally invasive way to enforce risk constraints on deployed policies while maximizing fidelity to an existing baseline—an important practical need in high-stakes settings such as medical diagnosis and LLM routing. The population-level threshold characterization is clean and intuitive, directly following from the chance-constrained objective. The finite-sample O(log n/n) excess-risk bound and the exact-control result under exchangeability are non-trivial and leverage standard stability tools in a targeted way. The three experiments supply concrete evidence of practical utility. These elements together advance the literature on safe policy deployment with theoretical guarantees.

major comments (1)

[finite-sample analysis (around the statement of the O(log n/n) bound)] The O(log n/n) excess-risk bound is derived via algorithmic stability applied to threshold selection on calibration data. Please expand the key steps showing how the stability parameter of the threshold rule translates into this specific rate (including any dependence on the number of calibration points and the regularity conditions on the score function).

minor comments (2)

[problem setup and algorithm description] Clarify whether the fallback policy and score are required to be fitted on completely held-out data or whether any overlap with the calibration set is permitted; the current statement leaves this boundary condition implicit.
[special-case guarantees] In the exchangeability special case, the high-probability near-optimality guarantee should explicitly state the failure probability and its dependence on n.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the work and for the constructive suggestion regarding the finite-sample analysis. We address the major comment below and will revise the manuscript accordingly.

Point-by-point responses

Referee: [finite-sample analysis (around the statement of the O(log n/n) bound)] The O(log n/n) excess-risk bound is derived via algorithmic stability applied to threshold selection on calibration data. Please expand the key steps showing how the stability parameter of the threshold rule translates into this specific rate (including any dependence on the number of calibration points and the regularity conditions on the score function).

Authors: We agree that additional detail on the stability argument would improve clarity. In the revised version we will expand the proof of the O(log n/n) bound (currently in Section 4.2) with the following steps. The threshold rule selects the largest tau such that the empirical violation probability on the n calibration points is at most the target level alpha. Under the regularity condition that the score function admits a density bounded above and below by positive constants in a neighborhood of the population-optimal threshold (Assumption 2), a change of one calibration point perturbs the empirical risk curve by at most 1/n. Because the density is bounded away from zero, the induced shift in the selected threshold is at most O(log n/n) in expectation; this follows from a standard concentration argument on the number of calibration scores falling into an interval of width O(log n/n) around the threshold. The policy risk is Lipschitz continuous in the threshold (by the bounded-loss assumption), so the algorithmic-stability lemma directly yields an expected excess-risk bound of O(beta), where beta = O(log n/n) is the stability parameter. The i.i.d. assumption enters only through the concentration inequalities used to control beta. We will insert a short auxiliary lemma stating these relations explicitly and will make the dependence on n and on the density bounds transparent.

Revision: yes
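The threshold rule described in the response (largest tau whose empirical violation probability on the calibration points is at most alpha) might be sketched as follows. This is a plain empirical version under stated assumptions: candidates are restricted to the observed scores plus plus and minus infinity, the per-point violation indicators for the baseline and the fallback are given as 0/1 lists, and any finite-sample correction the paper may apply is omitted. The function names are hypothetical.

```python
import math


def empirical_risk(tau, scores, viol_base, viol_fall):
    """Empirical violation rate of the policy that switches when score > tau."""
    total = 0
    for s, vb, vf in zip(scores, viol_base, viol_fall):
        total += vf if s > tau else vb
    return total / len(scores)


def select_threshold(scores, viol_base, viol_fall, alpha):
    """Scan candidate thresholds from largest to smallest and return the first
    (tau, risk) whose empirical risk is at most alpha, or None if none qualifies."""
    candidates = [math.inf] + sorted(set(scores), reverse=True) + [-math.inf]
    for tau in candidates:
        risk = empirical_risk(tau, scores, viol_base, viol_fall)
        if risk <= alpha:
            return tau, risk
    return None


# Toy calibration set (hypothetical values): the fallback never violates here.
scores = [0.9, 0.1, 0.5, 0.7]
viol_base = [1, 0, 0, 1]
viol_fall = [0, 0, 0, 0]
result = select_threshold(scores, viol_base, viol_fall, alpha=0.25)
print(result)
```

The scan checks every candidate instead of using bisection because, with an imperfect score, the empirical risk need not be monotone in tau. Replacing a single calibration point changes each value of `empirical_risk` by at most 1/n, which is the perturbation the stability argument starts from.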

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives the population-level threshold structure directly from the chance-constrained objective by ranking contexts according to conditional risk reduction (a first-principles step with no self-reference). The finite-sample O(log n/n) excess-risk bound is obtained by applying external algorithmic-stability and stochastic-process tools to the threshold-selection procedure on calibration data; the bound is not fitted to the target risk and does not reduce to the algorithm's own outputs. Exact risk control under exchangeability in the special case follows from standard exchangeability arguments once an exact-safe fallback is given. All steps are conditioned on explicitly stated regularity and sampling assumptions, with no load-bearing self-citation chains, self-definitional loops, or renaming of fitted quantities as predictions. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard i.i.d. and regularity assumptions from statistical learning theory plus the existence of a pre-fitted fallback policy; no new free parameters or invented entities are introduced.

axioms (2)

Domain assumption: i.i.d. sampling and regularity conditions on loss and score functions. Invoked to obtain the O(log n/n) excess-risk bound via algorithmic stability and stochastic processes.
Domain assumption: existence of an oracle or exact-safe fallback policy and score. Required for the population-level threshold structure and for exact risk control under exchangeability.
