arxiv: 2605.06520 · v1 · submitted 2026-05-07 · cs.GT · cs.LG · cs.MA · stat.ME


Optimizing Social Utility in Sequential Experiments

Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez


Pith reviewed 2026-05-08 04:08 UTC · model grok-4.3


keywords: sequential experiments · social utility · subsidy design · belief Markov decision process · dynamic programming · regulatory trials · antibiotic development · piecewise linear convexity


The pith

A sequential trial protocol with targeted subsidies lets regulators increase social utility from risky product development by more than 35 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes allowing developers to run randomized controlled trials in stages rather than committing to a full fixed-length study upfront, with the regulator offering a partial subsidy to cover some of the costs. This addresses the problem that high trial expenses can discourage developers from pursuing products with uncertain but potentially high social value. The authors model the developer's choice of when to continue or stop as a belief Markov decision process that can be solved with dynamic programming, and they prove that the overall social utility is a piecewise linear convex function of the chosen subsidy level. As a result, the regulator can locate the socially best subsidy amount efficiently with a divide-and-conquer procedure. Simulations on public antibiotic data show that the protocol raises social utility by more than 35 percent compared with conventional non-sequential trials.
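The continue-or-stop decision described above can be illustrated with a toy backward induction. This is a minimal sketch under loud assumptions, not the paper's model: the agent observes one Bernoulli outcome per step (rather than choosing a batch size), beliefs are Beta(a, b), approval is granted by a hypothetical posterior-mean threshold `tau` after at least `n_min` observations (rather than by the paper's test process), and the subsidy `eps` is assumed to cover a fraction of each observation's cost. All names and default values are hypothetical.

```python
from functools import lru_cache

def agent_value(eps, rho_a=10.0, cost=1.0, horizon=20,
                a0=1, b0=1, n_min=5, tau=0.7):
    """Agent's optimal expected utility in a toy belief MDP (all parameters hypothetical)."""
    pay = (1.0 - eps) * cost  # agent's share of the per-observation cost

    @lru_cache(maxsize=None)
    def V(a, b):
        n = (a - a0) + (b - b0)            # observations collected so far
        if n >= n_min and a / (a + b) >= tau:
            return rho_a                   # toy approval rule met: collect the benefit
        if n >= horizon:
            return 0.0                     # observation budget exhausted, no approval
        p = a / (a + b)                    # posterior predictive success probability
        cont = -pay + p * V(a + 1, b) + (1 - p) * V(a, b + 1)
        return max(0.0, cont)              # opting out is worth 0

    return V(a0, b0)

if __name__ == "__main__":
    for eps in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(eps, round(agent_value(eps), 4))
```

In this toy model the value at each belief is a maximum over "stop" and "continue", and the continuation value is affine in `eps` plus a convex combination of later values, so the agent's optimal utility is non-decreasing and convex in the subsidy.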

Core claim

We introduce a statistical protocol for experimentation where the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol using a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function over the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate that our statistical protocol can be used to increase social utility by more than 35% relative to standard, non-sequential protocols.

What carries the argument

A belief Markov decision process that represents the developer's sequential decisions on whether to continue or stop a trial, solved by dynamic programming for the agent's policy and by convexity analysis to optimize the regulator's subsidy level.
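One generic way to exploit piecewise linearity and convexity is to enumerate the kinks of the function by divide-and-conquer, given an oracle that returns a value and the slope of an active linear piece. The sketch below shows that textbook idea only; whether the paper's Algorithm 1 proceeds this way is not stated here, and `breakpoints`, `make_oracle`, and the example lines are hypothetical names and data.

```python
def breakpoints(oracle, lo, hi, tol=1e-9):
    """Enumerate kinks of a convex piecewise-linear f on [lo, hi].

    oracle(x) must return (f(x), slope of a linear piece active at x).
    """
    f_lo, s_lo = oracle(lo)
    f_hi, s_hi = oracle(hi)
    if abs(s_lo - s_hi) <= tol:
        return []  # same slope at both ends: one linear piece, no kink
    # Intersection of the supporting lines at lo and hi.
    x = (f_hi - f_lo + s_lo * lo - s_hi * hi) / (s_lo - s_hi)
    f_x, _ = oracle(x)
    if abs(f_lo + s_lo * (x - lo) - f_x) <= tol:
        return [x]  # the lines meet on the graph: exactly one kink here
    # Otherwise f lies above the intersection; split and recurse.
    return breakpoints(oracle, lo, x, tol) + breakpoints(oracle, x, hi, tol)

def make_oracle(lines):
    """Hypothetical test oracle: f(x) = max over (slope, intercept) pairs."""
    def oracle(x):
        slope, intercept = max(lines, key=lambda l: l[0] * x + l[1])
        return slope * x + intercept, slope
    return oracle

if __name__ == "__main__":
    lines = [(-2.0, -1.0), (-1.0, 0.0), (1.0, 0.0), (2.0, -1.0)]
    print(sorted(breakpoints(make_oracle(lines), -3.0, 3.0)))
```

Each recursive call either certifies a single kink or splits the interval, so the number of oracle calls grows with the number of linear pieces rather than with a grid resolution. If the objective is linear and continuous on each piece, the kinks together with the two interval endpoints form a finite candidate set for its maximizer.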

If this is right

Regulators obtain an efficient algorithm to select the subsidy level that maximizes social utility for any given product profile.
Developers become willing to pursue more high-uncertainty projects whose expected social value exceeds their private cost.
Resources are shifted away from low-value products because sequential stopping rules avoid completing expensive trials when interim evidence is weak.
The same modeling approach can be reused for other regulatory domains that require statistical evidence before approval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If developers receive additional private signals not included in the public belief model, the realized utility gains may be smaller or larger than the simulations predict.
Allowing the subsidy to depend on interim results rather than being fixed in advance could further increase social utility.
The piecewise-linear convexity property implies that modest errors in estimating the optimal subsidy produce only small losses, which aids practical implementation.

Load-bearing premise

That real developers update their beliefs and make continuation decisions under uncertainty in the same way the belief Markov decision process assumes.
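To make the premise concrete, the kind of belief update it refers to can be sketched as follows. The example assumes a conjugate Beta-Bernoulli model, which is consistent with the (α, β) belief parameters in the figure captions but is an assumption here; the function names and the numbers in the usage lines are hypothetical.

```python
def update_belief(alpha, beta, successes, failures):
    """Conjugate Beta-Bernoulli update: add the observed counts to the parameters."""
    return alpha + successes, beta + failures

def predictive_success(alpha, beta):
    """Probability that the next patient responds under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

if __name__ == "__main__":
    # Hypothetical interim result: 7 responders and 3 non-responders from a uniform prior.
    alpha, beta = update_belief(1, 1, successes=7, failures=3)
    print(alpha, beta, predictive_success(alpha, beta))
```

A developer who deviates from this rule, for example by overweighting early successes, would continue or stop at different beliefs than the dynamic programming solution predicts.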

What would settle it

A direct comparison between the trial-stopping thresholds chosen by actual developers under offered subsidies and the thresholds computed by the dynamic programming solution on the same product parameters.

Figures

Figures reproduced from arXiv: 2605.06520 by Ander Artola Velasco, Manuel Gomez-Rodriguez, Stratis Tsirtsis.

**Figure 2.** Rejection regions in belief space. The figure shows, for each agent belief with parameters (α, β), whether the condition f(α, β) ≥ 1/κ is satisfied (i.e., whether H0 is rejected; shaded region), under different test processes: the test process defined using the non-mixed e-values in Proposition 1 (orange), a test process defined in Eq. 30 with a uniform mixture P_mix = U(θ_b, 1) (blue), and a test process…

**Figure 3.** Illustration of the geometry of the state space in the MDP Mε.

**Figure 4.** Runtime of Algorithm 1. The figure shows, across multiple values of the maximum number of trials T and the maximum sample size per trial n_max, the runtime of Algorithm 1 (left panel) and the number of belief MDPs solved by the algorithm (right panel). All other parameters are fixed as specified in Tables 2 and 3. The experiments are run on an NVIDIA H100 GPU. Parameter details. Tables 2 and 3 report the v…

**Figure 5.** Optimal value function and policy in the belief MDP Mε* for the optimal subsidy ε* = 0.108. The left panel shows the optimal value function in the belief MDP, Vε*(α, β, C(α, β), 1), at time step l = 1, where the cost of each state is given by C(α, β) = 1 · c0 + (α + β − α0 − β0) · c1 (see Eq. 34). The right panel shows the optimal action n taken by the optimal policy at time step l = 1 for each belief…

**Figure 6.** The figure shows how the belief of the agent evolves in 300 realizations of the approval process, for different values of the true efficacy θ*. [Plot residue: axes α and β; color scales "Optimal value function ($M)" and "Optimal action n"]

**Figure 7.** Expected reward for each sample size. The figure shows, for the initial action taken by the agent at time step l = 0 and state (α0 = 1, β0 = 1, 0), the total expected reward in the MDP Mε* under the optimal subsidy ε* = 0.108 when the agent takes action n and then follows the optimal policy πε*. [Plot residue: x-axis "Subsidy, ε"; y-axis "Probability of opting out"; panel title "(a) Opting …"]

**Figure 8.** Opt out and approval probabilities. The figure shows, for an antibiotic with θ* = 0.65, the probability that the agent opts out of the approval process by selecting n = 0 before approval, as well as the probability that the antibiotic is ultimately approved. For each subsidy level, the agent follows the optimal policy. Note that, in principle, the agent may never opt out during the approval process; howev…

**Figure 9.** Agent utilities. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε for each subsidy and θ* = 0.65. The da…

**Figure 10.** Social utilities. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the partition P where the agent’s optima…

**Figure 11.** Optimal social utility for different antibiotic efficacies. The figure shows how social utility U_S(ε*; πε*), when the principal chooses the optimal subsidy ε* and the agent adopts the corresponding optimal policy πε*, varies as a function of the ratio ρS/ρA across different levels of efficacy θ*. The dashed line corresponds to the social utility Ū_S(ε*; πε*) computed using the belief MDP…

**Figure 12.** Agent utilities under increased experimental costs. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex, and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε fo…

**Figure 13.** Social utilities under increased experimental costs. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the p…

**Figure 14.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (optimal subsidy)"]

**Figure 15.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under the optimal subsidy computed using Algorithm 1 (in the non-sequential protocol without subsidy t…

**Figure 16.** Agent utilities under increased approval utility. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε for e…

**Figure 17.** Social utilities under increased approval agent utility. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of t…

**Figure 18.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (no subsidy)", "Non-sequential (optimal subsidy)"]

**Figure 19.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0). In this case, th…

**Figure 20.** Agent utilities under a pessimistic prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε for each sub…

**Figure 21.** Social utilities under a pessimistic prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the partition …

**Figure 22.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (optimal subsidy)"]

**Figure 23.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 (in the non-sequential protocol without subsi…

**Figure 24.** Agent utilities under an optimistic prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε for each subsi…

**Figure 25.** Social utilities under an optimistic prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the partition P …

**Figure 26.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (no subsidy)", "Non-sequential (optimal subsidy)"]

**Figure 27.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0).

**Figure 28.** Agent utilities under a calibrated informative prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε f…

**Figure 29.** Social utilities under a calibrated informative prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the…

**Figure 30.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1.

**Figure 31.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0).

**Figure 32.** Agent utilities under an uncalibrated informative prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π …

**Figure 33.** Social utilities under an uncalibrated informative prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of …

**Figure 34.** Optimal subsidy vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (no subsidy)", "Non-sequential (optimal subsidy)"]

**Figure 35.** Social utility gain vs. ρS/ρA. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0).

**Figure 36.** Agent utilities under a mixed test process. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex, and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy πε for each s…

**Figure 37.** Social utilities under a mixed test process. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy πε for each subsidy and θ* = 0.65. The dashed vertical lines correspond to the intervals of the partition…

**Figure 38.** Optimal subsidy vs. ρS/ρA using a mixed test process. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. [Plot residue: x-axis "Social-to-agent approval benefit ratio, ρS/ρA"; y-axis "Social utility gain vs. non-sequential (%)"; legend "Non-sequential (no subsidy)", "Non-sequential (optimal subsidy)"]

**Figure 39.** Social utility gain vs. ρS/ρA using a mixed test process. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n_max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy…

**Figure 40.** Probability of approval under the optimal policy and subsidy. The figure shows, across various efficacies θ* of the antibiotic, the probability of approval (that is, of rejecting H0) when the principal selects the optimal subsidy and the agent its optimal policy. The dashed (orange) curve corresponds to the process M_mix defined in Eq. 30 for a uniform mixture (optimal subsidy ε* = 0.027), while the soli…

Original abstract

Regulatory approval of products in high-stakes domains such as drug development requires statistical evidence of safety and efficacy through large-scale randomized controlled trials. However, the high financial cost of these trials may deter developers who lack absolute certainty in their product's efficacy, ultimately stifling the development of 'moonshot' products that could offer high social utility. To address this inefficiency, in this paper, we introduce a statistical protocol for experimentation where the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol using a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function over the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate that our statistical protocol can be used to increase social utility by more than 35% relative to standard, non-sequential protocols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives regulators an efficient way to set partial subsidies for sequential RCTs so that developers pursue products with higher social utility. It uses a belief MDP solved by dynamic programming and a convexity argument that lets divide-and-conquer find the best subsidy level; simulations claim gains of over 35% versus standard protocols.


The main contribution is a protocol that lets a regulator partially subsidize a developer's sequential randomized trial. The developer updates beliefs about the product and decides whether to continue or stop; the regulator chooses the subsidy to maximize total social utility. They model the developer's choices as a belief MDP and recover the optimal policy with dynamic programming. They then show that social utility is piecewise linear and convex in the subsidy parameter, so the optimal subsidy can be located with divide-and-conquer instead of exhaustive search. Simulations on public antibiotic data report more than 35% higher social utility than non-sequential baselines.

That combination of sequential modeling, efficient optimization, and a concrete number is what stands out. The MDP framing fits the uncertainty and stopping decisions naturally, and the convexity result is a clean algorithmic win if it holds under the paper's utility definitions. The simulation claim is falsifiable and tied to real data, which makes the work more useful than pure theory.

The softer spots are the usual ones for this style of work. The 35% gain depends on how social utility, costs, and belief updates are calibrated to the antibiotic dataset; different parameter choices or a different domain could shrink the improvement. The assumption that developers will follow the MDP-optimal policy under real regulatory uncertainty is plausible but untested against actual firm behavior. I would also want to see whether the piecewise-linear property requires restrictive conditions on the prior or the approval threshold.

Readers working on algorithmic mechanism design for regulation or on sequential experimental design will find the most value here. The paper is clear enough and the claims specific enough that it deserves a serious referee who can check the derivations and the simulation setup in detail. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The paper proposes a sequential statistical protocol for high-stakes RCTs (e.g., drug development) in which the developer (agent) runs trials adaptively while the regulator (principal) offers a partial cost subsidy. The interaction is modeled as a belief MDP whose optimal policy is recovered by dynamic programming. The authors prove that social utility is piecewise-linear and convex in the subsidy level, permitting efficient computation of the socially optimal subsidy via divide-and-conquer search. Simulations on public antibiotic-development data are reported to yield more than 35% higher social utility than non-sequential baselines.

Significance. If the convexity result and the simulation protocol are robust, the work supplies a computationally tractable framework for subsidy design that can raise social welfare in regulated innovation settings. The dynamic-programming solution and the divide-and-conquer optimality search are concrete algorithmic contributions; the 35% empirical gain is a falsifiable, policy-relevant claim whose validity rests on the precise definition of social utility and the fidelity of the belief-MDP to developer incentives.

major comments (2)

[§4] §4 (Simulation Experiments): the reported >35% social-utility gain is stated without an explicit equation or table showing how social utility is computed from the MDP value function, the precise baseline non-sequential protocol, or the antibiotic data parameters (e.g., prior beliefs, cost distributions). This gap prevents verification that the numerical improvement is load-bearing for the central claim.
[§3.2] §3.2 (Convexity of Social Utility): the proof that social utility is piecewise linear and convex in the subsidy parameter relies on the specific form of the agent's value function and the belief-update rule; if the utility definition or the transition probabilities contain fitted parameters from the same data used in §4, the convexity claim risks circularity that is not addressed.

minor comments (2)

[Abstract] Abstract: the expression “more than $35$$\%$” is split across two adjacent math spans and should be rendered as “more than 35%.”
Notation: the belief state and the subsidy parameter are introduced without a consolidated table of symbols; a short notation table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which have helped us improve the clarity and verifiability of the manuscript. We address each major comment below and have revised the paper accordingly.


Referee: [§4] §4 (Simulation Experiments): the reported >35% social-utility gain is stated without an explicit equation or table showing how social utility is computed from the MDP value function, the precise baseline non-sequential protocol, or the antibiotic data parameters (e.g., prior beliefs, cost distributions). This gap prevents verification that the numerical improvement is load-bearing for the central claim.

Authors: We agree that the simulation section requires additional explicit detail for reproducibility. In the revised manuscript we have inserted Equation (12) defining social utility precisely as the principal's expected value under the optimal policy minus the unsubsidized portion of the agent's cost, together with Table 3 that specifies the baseline non-sequential protocol (fixed-sample-size RCT sized by standard power analysis at α=0.05, β=0.2) and lists all antibiotic-data parameters used (prior Beta(2,5) on efficacy, log-normal cost distribution with parameters taken directly from the public dataset, etc.). These additions permit independent verification of the reported gains. revision: yes
Referee: [§3.2] §3.2 (Convexity of Social Utility): the proof that social utility is piecewise linear and convex in the subsidy parameter relies on the specific form of the agent's value function and the belief-update rule; if the utility definition or the transition probabilities contain fitted parameters from the same data used in §4, the convexity claim risks circularity that is not addressed.

Authors: The proof of piecewise linearity and convexity (Theorem 3.2) is structural: it follows from the linearity of the agent's payoff in the subsidy level and the fact that the value function of a finite-horizon belief MDP with finite actions is piecewise linear and convex in the belief state, independent of any particular parameter values. The antibiotic data appear only in §4 as an instantiation for numerical illustration; no parameters are estimated from that data in a manner that enters the transition kernel or utility definition used in the proof. We have added a short clarifying paragraph in §3.2 stating that the result holds for arbitrary valid priors and Bayesian updates. revision: yes
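The structural argument in this response can be sketched in two lines for the agent's optimal utility, the quantity the figure captions attribute to Proposition 8. The notation is illustrative rather than the paper's: Π is a finite set of deterministic agent policies, B(π) is the agent's expected approval benefit under π, C(π) ≥ 0 is its expected trial cost, and the subsidy is assumed to cover a fraction ε of that cost.

```latex
\begin{align*}
U_A(\varepsilon;\pi) &= B(\pi) - (1-\varepsilon)\,C(\pi)
  && \text{(affine in $\varepsilon$ for each fixed $\pi$)} \\
U_A^{\star}(\varepsilon) &= \max_{\pi\in\Pi} U_A(\varepsilon;\pi)
  && \text{(pointwise maximum of finitely many affine functions)}
\end{align*}
```

A pointwise maximum of finitely many affine functions is piecewise linear, convex, and continuous, and a single policy is optimal on each linear piece. Under these assumptions the slope of each piece is C(π) ≥ 0, so the agent's optimal utility is also non-decreasing in ε, and none of this depends on particular parameter values.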

Circularity Check

0 steps flagged

No circularity: derivations follow from MDP structure and definitions without reduction to inputs


The paper defines a belief MDP for the sequential trial protocol, computes the agent's optimal policy via standard dynamic programming, and proves piecewise linearity plus convexity of social utility as a function of the subsidy parameter directly from the value functions and Bellman equations of that MDP. These steps are mathematical consequences of the model construction rather than fitted quantities or self-referential definitions. The divide-and-conquer search for the optimal subsidy follows immediately from the proven convexity. The 35% gain is an out-of-sample simulation result on external public data and does not feed back into the theoretical claims. No self-citations, ansatzes, or uniqueness theorems imported from prior author work appear in the load-bearing chain. The entire derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach relies on standard assumptions in MDP modeling and uses public data for validation, with the subsidy as a tunable parameter.

free parameters (1)

subsidy level
Selected by the principal to maximize social utility, optimized via divide-and-conquer.

axioms (1)

domain assumption: The decision process of the agent can be accurately modeled as a belief Markov decision process.
Used to find optimal strategy with dynamic programming.




2010
[11]

Food and Drug Administration

U.S. Food and Drug Administration. Use of bayesian methodology in clinical trials of drug and biological products. Draft guidance, Center for Biologics Evaluation and Research and Center for Drug Evaluation and Research, Food and Drug Administration, March 2026

2026
[12]

https://grants.nih.gov/policy-and-compliance/policy- topics/clinical-trials/specific-funding-opportunities

Clinical trial-specific funding opportunities. https://grants.nih.gov/policy-and-compliance/policy- topics/clinical-trials/specific-funding-opportunities. Accessed: 2026-03-31

2026
[13]

Accessed: 2026-03-31

Clinical trials grants program.https://www.fda.gov/industry/orphan-products-grants-program/clinical- trials-grants-program. Accessed: 2026-03-31

2026
[14]

Accessed: 2026-03-31

Clinical trials.https://www.dfg.de/en/research-funding/funding-opportunities/programmes/individual/ clinical-trials. Accessed: 2026-03-31

2026
[15]

Accessed: 2026-03- 31

The european and developing countries clinical trials partnership.https://www.edctp.org/. Accessed: 2026-03- 31

2026
[16]

Principal-agent hypothesis testing.arXiv preprint arXiv:2205.06812, 2022

Stephen Bates, Michael I Jordan, Michael Sklar, and Jake A Soloff. Principal-agent hypothesis testing.arXiv preprint arXiv:2205.06812, 2022

work page arXiv 2022
[17]

Sharp results for hypothesis testing with risk-sensitive agents.arXiv preprint arXiv:2412.16452, 2024

Flora C Shi, Stephen Bates, and Martin J Wainwright. Sharp results for hypothesis testing with risk-sensitive agents.arXiv preprint arXiv:2412.16452, 2024

work page arXiv 2024
[18]

Strategic hypothesis testing

Safwan Hossain, Yatong Chen, and Yiling Chen. Strategic hypothesis testing. InThe Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

2025
[19]

An analysis of the principal-agent problem

Sanford J Grossman and Oliver D Hart. An analysis of the principal-agent problem. InFoundations of insurance economics: Readings in economics and finance, pages 302–340. Springer, 1992

1992
[20]

Game-theoretic statistics and safe anytime-valid inference, 2023

Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, and Glenn Shafer. Game-theoretic statistics and safe anytime-valid inference, 2023

2023
[21]

Planning and acting in partially observable stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

1998
[22]

An economic theory of statistical testing

Aleksey Tetenov. An economic theory of statistical testing. CeMMAP working papers 50/16, Institute for Fiscal Studies, Sep 2016

2016
[23]

Experimentation and approval mechanisms.Econometrica, 90(5):2215–2247, 2022

Andrew McClellan. Experimentation and approval mechanisms.Econometrica, 90(5):2215–2247, 2022

2022
[24]

Safe testing.Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 03 2024

Peter Grünwald, Rianne de Heide, and Wouter Koolen. Safe testing.Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 03 2024

2024
[25]

Hypothesis testing with e-values.Foundations and Trends in Statistics, 1(1-2):1–390, 07 2025

Aaditya Ramdas and Ruodu Wang. Hypothesis testing with e-values.Foundations and Trends in Statistics, 1(1-2):1–390, 07 2025

2025
[26]

Estimating means of bounded random variables by betting, 2022

Ian Waudby-Smith and Aaditya Ramdas. Estimating means of bounded random variables by betting, 2022

2022
[27]

Online multiple testing with e-values

Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors,Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 ofProceedings of Machine Learning Research, pages 3997–4005. PMLR, 02–04 May 2024

2024
[28]

E-detectors: A nonparametric framework for sequential change detection.The New England Journal of Statistics in Data Science, 2(2):229–260, 2024

Jaehyeok Shin, Aaditya Ramdas, and Alessandro Rinaldo. E-detectors: A nonparametric framework for sequential change detection.The New England Journal of Statistics in Data Science, 2(2):229–260, 2024

2024
[29]

Nonparametric two-sample testing by betting.IEEE Trans

Shubhanshu Shekhar and Aaditya Ramdas. Nonparametric two-sample testing by betting.IEEE Trans. Inf. Theor., 70(2):1178–1203, February 2024

2024
[30]

Ian Waudby-Smith, Ricardo Sandoval, and Michael I. Jordan. Universal log-optimality for general classes of e-processes and sequential hypothesis tests, 2025. 12

2025
[31]

Post-hoc large-sample statistical inference.arXiv preprint arXiv:2603.08002,

Ben Chugg, Etienne Gauthier, Michael I Jordan, Aaditya Ramdas, and Ian Waudby-Smith. Post-hoc large-sample statistical inference.arXiv preprint arXiv:2603.08002, 2026

work page arXiv 2026
[32]

Etienne Gauthier, Francis Bach, and Michael I. Jordan. Backward conformal prediction. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025
[33]

Towards anytime-valid statistical watermarking.arXiv preprint arXiv:2602.17608, 2026

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, and Michael I Jordan. Towards anytime-valid statistical watermarking.arXiv preprint arXiv:2602.17608, 2026

work page arXiv 2026
[34]

Auditing pay-per-token in large language models

Ander Artola Velasco, Stratis Tsirtsis, and Manuel Gomez Rodriguez. Auditing pay-per-token in large language models. InThe 29th International Conference on Artificial Intelligence and Statistics, 2026

2026
[35]

Dhillon, Javier Gonzalez, Teodora Pandeva, and Alicia Curth

Guneet S. Dhillon, Javier Gonzalez, Teodora Pandeva, and Alicia Curth. E-scores for (in)correctness assessment of generative model outputs. InThe 29th International Conference on Artificial Intelligence and Statistics, 2026

2026
[36]

Optimal stopping.The American Mathematical Monthly, 77(4):333–343, 1970

Herbert Robbins. Optimal stopping.The American Mathematical Monthly, 77(4):333–343, 1970

1970
[37]

Lectures in Mathematics

Goran Peskir and Albert N Shiryaev.Optimal stopping and free-boundary problems. Lectures in Mathematics. ETH Zürich. Birkhauser Verlag AG, Basel, Switzerland, 2006 edition, August 2006

2006
[38]

Wiley Series in Probability and Statistics

Warren B Powell and Ilya O Ryzhov.Optimal Learning. Wiley Series in Probability and Statistics. Wiley-Blackwell, Hoboken, NJ, March 2012

2012
[39]

Bayesian reinforcement learning: A survey.Foundations and Trends®in Machine Learning, 8(5–6):359–483, November 2015

Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Bayesian reinforcement learning: A survey.Foundations and Trends®in Machine Learning, 8(5–6):359–483, November 2015

2015
[40]

Bayesian learning approach to model predictive control.arXiv preprint arXiv:2203.02720, 2022

Namhoon Cho, Seokwon Lee, Hyo-Sang Shin, and Antonios Tsourdos. Bayesian learning approach to model predictive control.arXiv preprint arXiv:2203.02720, 2022

work page arXiv 2022
[41]

Minimax-bayes reinforcement learning, 2023

Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, and Emilio Jorge. Minimax-bayes reinforcement learning, 2023

2023
[42]

Wanggang Shen and Xun Huan. Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning.Computer Methods in Applied Mechanics and Engineering, 416:116304, 2023

2023
[43]

Optimal stopping for sequential bayesian experimental design.arXiv preprint arXiv:2509.21734, 2025

Chen Cheng and Xun Huan. Optimal stopping for sequential bayesian experimental design.arXiv preprint arXiv:2509.21734, 2025

work page arXiv 2025
[44]

Variational sequential optimal experimental design using reinforcement learning.Comput

Wanggang Shen, Jiayuan Dong, and Xun Huan. Variational sequential optimal experimental design using reinforcement learning.Comput. Methods Appl. Mech. Eng., 444(118068):118068, September 2025

2025
[45]

A. Wald. Sequential Tests of Statistical Hypotheses.The Annals of Mathematical Statistics, 16(2):117 – 186, 1945

1945
[46]

de la Peña

Victor H. de la Peña. A General Class of Exponential Inequalities for Martingales and Ratios.The Annals of Probability, 27(1):537 – 564, 1999

1999
[47]

Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon

Steven R. Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales.Probability Surveys, 17(none):257 – 317, 2020

2020
[48]

A systematic review and critical assessment of incentive strategies for discovery and development of novel antibiotics.J

Matthew J Renwick, David M Brogan, and Elias Mossialos. A systematic review and critical assessment of incentive strategies for discovery and development of novel antibiotics.J. Antibiot. (Tokyo), 69(2):73–88, February 2016

2016
[49]

Government r&d subsidies and enterprise r&d activities: theory and evidence

Wan-Shu Wu and Kai Zhao. Government r&d subsidies and enterprise r&d activities: theory and evidence. Economic Research-Ekonomska Istraživanja, 35(1):391–408, 2022

2022
[50]

Bayesian decision theory, rule utilitarianism, and arrow’s impossibility theorem.Theory and Decision, 11(3):289–317, 1979

John C Harsanyi. Bayesian decision theory, rule utilitarianism, and arrow’s impossibility theorem.Theory and Decision, 11(3):289–317, 1979. 13

1979
[51]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018

2018
[52]

A tutorial on thompson sampling, 2020

Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. A tutorial on thompson sampling, 2020

2020
[53]

Springer, Berlin, Germany, 2011 edition, August 2010

Heinrich von Stackelberg.Market Structure and Equilibrium. Springer, Berlin, Germany, 2011 edition, August 2010

2011
[54]

Congress

U.S. Congress. 26 U.S.C. 45C — clinical testing expenses for certain drugs for rare diseases or conditions. Internal Revenue Code, 2024. Accessed: 2026-05-03

2024
[55]

Small business funding.https://seed.nih.gov/small-business-funding, 2026

National Institutes of Health. Small business funding.https://seed.nih.gov/small-business-funding, 2026. Accessed: 2026-05-03

2026
[56]

Springer New York, 1996

Onésimo Hernández-Lerma and Jean Bernard Lasserre.Discrete-Time Markov Control Processes. Springer New York, 1996

1996
[57]

Computing the optimal strategy to commit to

Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. InProceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, page 82–90, New York, NY, USA, 2006. Association for Computing Machinery

2006
[58]

Playing games for security: An efficient exact algorithm for solving bayesian stackelberg games

Praveen Paruchuri, Jonathan P Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Playing games for security: An efficient exact algorithm for solving bayesian stackelberg games. InProceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pages 895–902, 2008

2008
[59]

Global burden of bacterial antimicrobial resistance 1990-2021: a systematic analysis with forecasts to 2050.Lancet, 404(10459):1199–1226, September 2024

GBD 2021 Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance 1990-2021: a systematic analysis with forecasts to 2050.Lancet, 404(10459):1199–1226, September 2024

2021
[60]

Tackling the threat of antimicrobial resistance: from policy to sustainable action.Philos

Laura J Shallcross, Simon J Howard, Tom Fowler, and Sally C Davies. Tackling the threat of antimicrobial resistance: from policy to sustainable action.Philos. Trans. R. Soc. Lond. B Biol. Sci., 370(1670):20140082, June 2015

2015
[61]

Approval and withdrawal of new antibiotics and other antiinfectives in the U.S., 1980-2009.J

Kevin Outterson, John H Powers, Enrique Seoane-Vazquez, Rosa Rodriguez-Monguio, and Aaron S Kesselheim. Approval and withdrawal of new antibiotics and other antiinfectives in the U.S., 1980-2009.J. Law Med. Ethics, 41(3):688–696, 2013

1980
[62]

The economic conundrum for antibacterial drugs.Antimicrob

David M Shlaes. The economic conundrum for antibacterial drugs.Antimicrob. Agents Chemother., 64(1), December 2019

2019
[63]

Current economic and regulatory challenges in developing antibiotics for gram-negative bacteria.NPJ Antimicrob

Nupur Gargate, Mark Laws, and Khondaker Miraz Rahman. Current economic and regulatory challenges in developing antibiotics for gram-negative bacteria.NPJ Antimicrob. Resist., 3(1):50, June 2025

2025
[64]

Why big pharma has abandoned antibiotics.Nature, 586(7830):S50–S52, October 2020

Benjamin Plackett. Why big pharma has abandoned antibiotics.Nature, 586(7830):S50–S52, October 2020

2020
[65]

Looking for solutions to the pitfalls of developing novel antibacterials in an economically challenging system.Microbiol

Gilles Courtemanche, Rohini Wadanamby, Amritanjali Kiran, Luisa Fernanda Toro-Alzate, Mathew Diggle, Dipanjan Chakraborty, Ariel Blocker, and Maarten van Dongen. Looking for solutions to the pitfalls of developing novel antibacterials in an economically challenging system.Microbiol. Res. (Pavia), 12(1):173–185, March 2021

2021
[66]

Advancing global antibiotic research, development and access.Nat

Laura J V Piddock, Yewande Alimi, James Anderson, Damiano de Felice, Catrin E Moore, John-Arne Røttingen, Henry Skinner, and Peter Beyer. Advancing global antibiotic research, development and access.Nat. Med., 30(9):2432–2443, September 2024

2024
[67]

Novel insights from financial analysis of the failure to commercialise plazomicin: Implications for the antibiotic investment ecosystem.Humanit

Nadya Wells, Vinh-Kim Nguyen, and Stephan Harbarth. Novel insights from financial analysis of the failure to commercialise plazomicin: Implications for the antibiotic investment ecosystem.Humanit. Soc. Sci. Commun., 11(1), July 2024

2024
[68]

Accelerating global innovation to address antibacterial resistance: introducing CARB-X.Nat

Kevin Outterson, John H Rex, Tim Jinks, Peter Jackson, John Hallinan, Steve Karp, Deborah T Hung, Francois Franceschi, Tyler Merkeley, Christopher Houchens, Dennis M Dixon, Michael G Kurilla, Rosemarie Aurigemma, and Joseph Larsen. Accelerating global innovation to address antibacterial resistance: introducing CARB-X.Nat. Rev. Drug Discov., 15(9):589–590,...

2016
[69]

Challenges and opportunities for incentivising antibiotic research and development in europe.Lancet Reg

Michael Anderson, Dimitra Panteli, Robin van Kessel, Gunnar Ljungqvist, Francesca Colombo, and Elias Mossialos. Challenges and opportunities for incentivising antibiotic research and development in europe.Lancet Reg. Health Eur., 33(100705):100705, October 2023

2023
[70]

United States Congress. H.R. 7352: PASTEUR Act of 2026, 2026. To amend the Public Health Service Act to establish a program to develop innovative antimicrobial drugs

2026
[71]

Market concentration of new antibiotic sales

Sakib Rahman, Olof Lindahl, Chantal M Morel, and Aidan Hollis. Market concentration of new antibiotic sales. J. Antibiot. (Tokyo), 74(6):421–423, June 2021

2021
[72]

Cost drivers of a hospital-acquired bacterial pneumonia and ventilator- associated bacterial pneumonia phase 3 clinical trial.Clin

Stella Stergiopoulos, Sara B Calvert, Carrie A Brown, Josephine Awatin, Pamela Tenaerts, Thomas L Holland, Joseph A DiMasi, and Kenneth A Getz. Cost drivers of a hospital-acquired bacterial pneumonia and ventilator- associated bacterial pneumonia phase 3 clinical trial.Clin. Infect. Dis., 66(1):72–80, January 2018

2018
[73]

Estimated costs of pivotal trials for novel therapeutic agents approved by the US food and drug administration, 2015-2016.JAMA Intern

Thomas J Moore, Hanzhe Zhang, Gerard Anderson, and G Caleb Alexander. Estimated costs of pivotal trials for novel therapeutic agents approved by the US food and drug administration, 2015-2016.JAMA Intern. Med., 178(11):1451–1457, November 2018

2015
[74]

Instance-adaptive hypothesis tests with heterogeneous agents.arXiv preprint arXiv:2510.21178, 2025

Flora C Shi, Martin J Wainwright, and Stephen Bates. Instance-adaptive hypothesis tests with heterogeneous agents.arXiv preprint arXiv:2510.21178, 2025

work page arXiv 2025
[75]

Campbell, N Balakrishnan, and Brani Vidakovic.Encyclopedia of statistical sciences

B. Campbell, N Balakrishnan, and Brani Vidakovic.Encyclopedia of statistical sciences. Methods and Applications of Statistics. John Wiley & Sons, Nashville, TN, 2 edition, December 2005

2005
[76]

False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.Psychol

Joseph P Simmons, Leif D Nelson, and Uri Simonsohn. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.Psychol. Sci., 22(11):1359–1366, November 2011

2011
[77]

Etienne Gauthier, Francis Bach, and Michael I. Jordan. Betting on equilibrium: Monitoring strategic behavior in multi-agent systems, 2026

2026
[78]

Ville.Étude Critique de la Notion de Collectif

J. Ville.Étude Critique de la Notion de Collectif. Collection des monographies des probabilités. Gauthier-Villars, 1939

1939
[79]

RejectH0

R.T. Rockafellar.Convex Analysis. Princeton landmarks in mathematics and physics. Princeton University Press, 1970. 15 A Summary of Notation In Table 1 we summarize the key symbols used in the main body of the paper. Table 1:Summary of notation. Symbol Description κPrincipal’s false positive rate bound θ∗ True (unknown) product efficacy θb Baseline effica...

1970
[80]

IfA L =A R, then ¯U A(πε;ε) =V 0 L +ε·A L for allε∈[ε L, εR]

Showing first 80 references.