arxiv: 2605.12235 · v1 · submitted 2026-05-12 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Optimal Policy Learning under Budget and Coverage Constraints

Giovanni Cerulli

Pith reviewed 2026-05-13 04:30 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords policy learningbudget constraintscoverage constraintsknapsack structureintegrality gaplinear programminggreedy algorithmthreshold rule

0 comments

The pith

The optimal policy under budget and coverage constraints is characterized by an affine threshold rule with an O(1) integrality gap in its LP relaxation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to find the best policy when you have a limited budget and must cover a minimum number of cases. It proves that the problem has a simple knapsack-like form, so the best policy is an affine threshold rule based on the prices of the two constraints. Because the continuous relaxation has only a constant gap from the integer solution, it becomes exactly optimal in large samples. Two easy-to-run algorithms are given, one of which works well in general and the other when costs are similar or the coverage rule is not tight. Simulations back up the theory.

Core claim

We show that the problem admits a knapsack-type structure and that the optimal policy can be characterized by an affine threshold rule involving both budget and coverage shadow prices. We establish that the linear programming relaxation of the combinatorial solution has an O(1) integrality gap, implying asymptotic equivalence with the optimal discrete allocation. Building on this result, we analyze two implementable approaches: a Greedy-Lagrangian (GLC) and a rank-and-cut (RC) algorithm. We show that the GLC closely approximates the optimal solution and achieves near-optimal performance in finite samples. By contrast, RC is approximately optimal whenever the coverage constraint is slack or 0

What carries the argument

Knapsack-type structure that permits characterization of the optimal policy by an affine threshold rule involving budget and coverage shadow prices.

Load-bearing premise

The optimization problem has a special structure that makes the best policy follow a simple rule based on the costs of violating the budget and coverage limits.

What would settle it

A family of instances where the gap between the LP relaxation value and the true optimal discrete policy grows without bound as the instance size increases.

Figures

Figures reproduced from arXiv: 2605.12235 by Giovanni Cerulli.

**Figure 2.** Figure 2: Per-capita regret of GLC and integrality gap of LP as a function of sample size. The solid [PITH_FULL_IMAGE:figures/full_fig_p036_2.png] view at source ↗

read the original abstract

We study optimal policy learning under combined budget and minimum coverage constraints. We show that the problem admits a knapsack-type structure and that the optimal policy can be characterized by an affine threshold rule involving both budget and coverage shadow prices. We establish that the linear programming relaxation of the combinatorial solution has an O(1) integrality gap, implying asymptotic equivalence with the optimal discrete allocation. Building on this result, we analyze two implementable approaches: a Greedy-Lagrangian (GLC) and a rank-and-cut (RC) algorithm. We show that the GLC closely approximates the optimal solution and achieves near-optimal performance in finite samples. By contrast, RC is approximately optimal whenever the coverage constraint is slack or costs are homogeneous, while misallocation arises only when cost heterogeneity interacts with a binding coverage constraint. Monte Carlo evidence supports these findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper cleanly handles dual constraints in policy learning with an affine threshold and an O(1) integrality gap for the LP relaxation.

read the letter

The punchline for this paper is that it shows how to solve for the optimal policy when you have both a budget limit and a minimum coverage requirement by using an affine threshold rule based on the shadow prices of the two constraints. It also proves that the linear programming relaxation has only an O(1) integrality gap, so the relaxed solution is asymptotically as good as the discrete one. This is new in the way it combines the two constraints in one characterization and then compares the practical algorithms. The Greedy-Lagrangian method turns out to be robust, while the rank-and-cut method runs into trouble only when costs are heterogeneous and the coverage constraint is active. That kind of conditional analysis is helpful for practitioners. The paper does well with the theoretical guarantee. The integrality gap being constant rather than growing with problem size is a strong result, and it comes from applying standard rounding techniques to the bounded reward and cost terms. The Monte Carlo experiments are used to show that the near-optimality holds in finite samples as claimed. One soft spot is that the conditions for the knapsack-type structure are mentioned but not fully unpacked in the provided abstract. It would be good to see more on how the policy class or the data process affects whether the affine rule applies exactly. That said, the stress-test note indicates that no major inconsistency shows up in the stated results. This paper is aimed at researchers in statistical machine learning who deal with constrained policy learning, especially in settings like targeted treatment allocation or budget-limited interventions. A reader who needs both theory and implementable methods for these problems would find it worthwhile. The work shows clear thinking and honest engagement with the literature on the topic. It deserves a serious referee because the claims are specific enough to be checked and the setting has practical bite. I would recommend sending it out for peer review.

Referee Report

0 major / 3 minor

Summary. The paper studies optimal policy learning under combined budget and minimum coverage constraints. It shows that the problem admits a knapsack-type structure and that the optimal policy can be characterized by an affine threshold rule involving both budget and coverage shadow prices. The authors establish that the linear programming relaxation of the combinatorial solution has an O(1) integrality gap, implying asymptotic equivalence with the optimal discrete allocation. They analyze two implementable approaches: a Greedy-Lagrangian (GLC) algorithm that closely approximates the optimal solution and achieves near-optimal performance in finite samples, and a rank-and-cut (RC) algorithm that is approximately optimal when the coverage constraint is slack or costs are homogeneous. Monte Carlo evidence is provided to support these findings.

Significance. If the O(1) integrality gap and affine threshold characterization hold, the work provides a theoretically grounded approach to constrained policy optimization that is relevant for applications involving resource allocation under multiple constraints. The distinction between the performance of GLC and RC under different regimes offers practical guidance, and the asymptotic equivalence result strengthens the case for using LP-based approximations in large-scale settings.

minor comments (3)

The abstract states the O(1) integrality gap but does not specify whether it is additive or multiplicative; this should be clarified with a precise statement and reference to the relevant theorem in the main text.
The Monte Carlo experiments are mentioned as supporting the findings, but additional details on the data-generating process, policy class, and specific metrics used to evaluate GLC vs. RC would improve reproducibility and clarity.
Notation for the shadow prices and the affine threshold rule should be introduced consistently in the main body before the algorithmic sections to aid readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our manuscript. The summary accurately captures the main contributions regarding the affine threshold characterization, O(1) integrality gap, and the comparative analysis of the GLC and RC algorithms. We are pleased that the referee finds the work relevant for constrained policy optimization and recommends minor revision.

Circularity Check

0 steps flagged

No significant circularity; derivation follows standard IP/LP theory from problem formulation

full rationale

The paper's core claims—the knapsack-type structure, affine threshold characterization via shadow prices, and O(1) integrality gap for the LP relaxation—are derived directly from the given budget-plus-coverage constrained optimization problem using classical results on knapsack problems and bounded integrality gaps for integer programs with linear relaxations. No parameters are fitted to data and then renamed as predictions; no self-citations are invoked as load-bearing uniqueness theorems; the threshold rule and gap bound follow from the primal-dual structure and standard rounding arguments without self-referential definitions or ansatzes smuggled via prior work. Monte Carlo evidence is presented separately as validation, not as part of the derivation chain. The result is self-contained against external benchmarks in combinatorial optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the problem having a knapsack-type structure and the validity of the LP relaxation analysis; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption The problem admits a knapsack-type structure
Invoked to enable the affine threshold characterization with shadow prices.

pith-pipeline@v0.9.0 · 5428 in / 1146 out tokens · 33316 ms · 2026-05-13T04:30:00.117807+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Tyrrell Rockafellar , title =

R. Tyrrell Rockafellar , title =

work page
[2]

SIAM Review , volume =

Juan Pablo Vielma , title =. SIAM Review , volume =. 2015 , doi =

work page 2015
[3]

Dantzig , title =

George B. Dantzig , title =

work page
[4]

Nemhauser and Laurence A

George L. Nemhauser and Laurence A. Wolsey , title =

work page
[5]

Wolsey , title =

Laurence A. Wolsey , title =

work page
[6]

Alexander Schrijver , title =

work page
[7]

, title =

Manski, Charles F. , title =. Econometrica , volume =

work page
[8]

Econometrica , volume =

Kitagawa, Toru and Tetenov, Aleksey , title =. Econometrica , volume =

work page
[9]

Econometrica , volume =

Athey, Susan and Wager, Stefan , title =. Econometrica , volume =. 2021 , doi =

work page 2021
[10]

Journal of Econometrics , volume =

Bhattacharya, Debopam and Dupas, Pascaline , title =. Journal of Econometrics , volume =. 2012 , doi =

work page 2012
[11]

, title =

Hirano, Keisuke and Porter, Jack R. , title =. Econometrica , volume =

work page
[12]

Minimax Regret Treatment Choice with Finite Samples , journal =

Stoye, J. Minimax Regret Treatment Choice with Finite Samples , journal =

work page
[13]

Minimax Regret Treatment Choice with Covariates , journal =

Stoye, J. Minimax Regret Treatment Choice with Covariates , journal =

work page
[14]

John and Kosorok, Michael R

Zhao, Yingqi and Zeng, Donglin and Rush, A. John and Kosorok, Michael R. , title =. Journal of the American Statistical Association , volume =

work page
[15]

Doubly Robust Policy Evaluation and Learning , booktitle =

Dud. Doubly Robust Policy Evaluation and Learning , booktitle =

work page
[16]

Advances in Neural Information Processing Systems , year =

Kallus, Nathan , title =. Advances in Neural Information Processing Systems , year =

work page
[17]

, title =

Tsybakov, Alexandre B. , title =. Annals of Statistics , volume =

work page
[18]

, title =

Audibert, Jean-Yves and Tsybakov, Alexandre B. , title =. Annals of Statistics , volume =

work page
[19]

European Journal of Operational Research , volume =

Bengio, Yoshua and Lodi, Andrea and Prouvost, Antoine , title =. European Journal of Operational Research , volume =. 2021 , doi =

work page 2021
[20]

arXiv preprint arXiv:2203.02878 , year =

Zhang, Yu and Dietterich, Thomas and Dilkina, Bistra , title =. arXiv preprint arXiv:2203.02878 , year =

work page arXiv
[21]

arXiv preprint , year =

Garn, Wolfgang and Amirghasemi, Mahdi , title =. arXiv preprint , year =

work page
[22]

The Econometrics Journal , volume =

Carneiro, Pedro and Lee, Sokbae and Wilhelm, Daniel , title =. The Econometrics Journal , volume =. 2020 , doi =

work page 2020
[23]

The Oxford Handbook of Bayesian Econometrics , publisher =

Chamberlain, Gary , title =. The Oxford Handbook of Bayesian Econometrics , publisher =. 2011 , pages =

work page 2011
[24]

and Diebold, Francis X

Christoffersen, Peter F. and Diebold, Francis X. , title =. Economic Theory , volume =

work page
[25]

, title =

Dehejia, Rajeev H. , title =. Journal of Econometrics , volume =. 2005 , doi =

work page 2005
[26]

arXiv preprint arXiv:1905.10116 , year =

Demirer, Mert and Syrgkanis, Vasilis and Lewis, Greg and Chernozhukov, Victor , title =. arXiv preprint arXiv:1905.10116 , year =

work page arXiv 1905
[27]

, title =

van der Vaart, Aad W. , title =

work page
[28]

Education Policy in Developing Countries , publisher =

Dhailiwal, Iqbal and Duflo, Esther and Glennerster, Rachel and Tulloch, Caitlin , title =. Education Policy in Developing Countries , publisher =

work page
[29]

New England Journal of Medicine , volume =

Finkelstein, Amy and Gentzkow, Matthew and Hull, Peter and Williams, Heidi , title =. New England Journal of Medicine , volume =. 2017 , doi =

work page 2017
[30]

, title =

Finkelstein, Amy and Notowidigdo, Matthew J. , title =. Quarterly Journal of Economics , volume =. 2019 , doi =

work page 2019
[31]

Quarterly Journal of Economics , volume =

Hendren, Nathaniel and Sprung-Keyser, Ben , title =. Quarterly Journal of Economics , volume =

work page
[32]

arXiv preprint arXiv:2205.08586 , year =

Kitagawa, Toru and Lee, Sokbae and Qiu, Chen , title =. arXiv preprint arXiv:2205.08586 , year =

work page arXiv
[33]

arXiv preprint arXiv:2401.17909 , year =

Kock, Anders Bredahl and Preinerstorfer, David , title =. arXiv preprint arXiv:2401.17909 , year =

work page arXiv
[34]

Econometrica , volume =

Mbakop, Eric and Tabord-Meehan, Marten , title =. Econometrica , volume =

work page
[35]

arXiv preprint arXiv:2103.11066 , year =

Sun, Huanan and Munro, Ewan and Kalashnov, Gleb and Du, Shuo and Wager, Stefan , title =. arXiv preprint arXiv:2103.11066 , year =

work page arXiv
[36]

arXiv preprint arXiv:2103.15298 , year =

Sun, Liyang , title =. arXiv preprint arXiv:2103.15298 , year =

work page arXiv
[37]

and Wellner, Jon A

van der Vaart, Aad W. and Wellner, Jon A. , title =

work page
[38]

Journal of the American Statistical Association , volume =

Viviano, Davide and Bradic, Jelena , title =. Journal of the American Statistical Association , volume =. 2024 , doi =

work page 2024
[39]

arXiv preprint arXiv:2111.04926 , year =

Yata, Kohei , title =. arXiv preprint arXiv:2111.04926 , year =

work page arXiv
[40]

Journal of Econometrics , volume =

Sun, Liyang , title =. Journal of Econometrics , volume =. 2026 , doi =

work page 2026