pith. machine review for the scientific record. sign in

arxiv: 2605.06520 · v1 · submitted 2026-05-07 · 💻 cs.GT · cs.LG· cs.MA· stat.ME

Recognition: unknown

Optimizing Social Utility in Sequential Experiments

Authors on Pith no claims yet

Pith reviewed 2026-05-08 04:08 UTC · model grok-4.3

classification 💻 cs.GT cs.LGcs.MAstat.ME
keywords sequential experimentssocial utilitysubsidy designbelief Markov decision processdynamic programmingregulatory trialsantibiotic developmentpiecewise linear convexity
0
0 comments X

The pith

A sequential trial protocol with targeted subsidies lets regulators increase social utility from risky product development by more than 35 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes allowing developers to run randomized controlled trials in stages rather than committing to a full fixed-length study upfront, with the regulator offering a partial subsidy to cover some of the costs. This addresses the problem that high trial expenses can discourage developers from pursuing products with uncertain but potentially high social value. The authors model the developer's choice of when to continue or stop as a belief Markov decision process that can be solved with dynamic programming, and they prove that the overall social utility is a piecewise linear convex function of the chosen subsidy level. As a result, the regulator can locate the socially best subsidy amount efficiently with a divide-and-conquer procedure. Simulations on public antibiotic data show that the protocol raises social utility by more than 35 percent compared with conventional non-sequential trials.

Core claim

We introduce a statistical protocol for experimentation where the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol using a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function over the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate a

What carries the argument

A belief Markov decision process that represents the developer's sequential decisions on whether to continue or stop a trial, solved by dynamic programming for the agent's policy and by convexity analysis to optimize the regulator's subsidy level.

If this is right

  • Regulators obtain an efficient algorithm to select the subsidy level that maximizes social utility for any given product profile.
  • Developers become willing to pursue more high-uncertainty projects whose expected social value exceeds their private cost.
  • Resources are shifted away from low-value products because sequential stopping rules avoid completing expensive trials when interim evidence is weak.
  • The same modeling approach can be reused for other regulatory domains that require statistical evidence before approval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If developers receive additional private signals not included in the public belief model, the realized utility gains may be smaller or larger than the simulations predict.
  • Allowing the subsidy to depend on interim results rather than being fixed in advance could further increase social utility.
  • The piecewise-linear convexity property implies that modest errors in estimating the optimal subsidy produce only small losses, which aids practical implementation.

Load-bearing premise

That real developers update their beliefs and make continuation decisions under uncertainty in the same way the belief Markov decision process assumes.

What would settle it

A direct comparison between the trial-stopping thresholds chosen by actual developers under offered subsidies and the thresholds computed by the dynamic programming solution on the same product parameters.

Figures

Figures reproduced from arXiv: 2605.06520 by Ander Artola Velasco, Manuel Gomez-Rodriguez, Stratis Tsirtsis.

Figure 1
Figure 1. Figure 1: Subsidizing antibiotic development. The figure shows the results of the approval process for an antibiotic with true (unknown) efficacy θ ∗ = 0.65. Panel (a) shows the result of running Algorithm 1 to compute the optimal subsidy for the principal ε ∗ = 0.108 when the social benefit of approval is ρ S = $2000 M. The dashed vertical lines correspond to the intervals of the partition P where the agent’s optim… view at source ↗
Figure 2
Figure 2. Figure 2: Rejection regions in belief space. The figure shows, for each agent belief with parameters (α, β), whether the condition f(α, β) ≥ 1/κ is satisfied (i.e., whether H0 is rejected; shaded region), under different test processes: the test process defined using the non-mixed e-values in Proposition 1 (orange), a test process defined in Eq. 30 with a uniform mixture P mix = U(θ b , 1) (blue), and a test process… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the geometry of the state space in the MDP Mε . 23 view at source ↗
Figure 4
Figure 4. Figure 4: Runtime of Algorithm 1. The figure shows, across multiple values of the maximum number of trials T and the maximum sample size per trial n max, the runtime of Algorithm 1 (left panel) and the number of belief MDPs solved by the algorithm (right panel). All other parameters are fixed as specified in Tables 2 and 3. The experiments are run on an NVIDIA H100 GPU. Parameter details. Tables 2 and 3 report the v… view at source ↗
Figure 5
Figure 5. Figure 5: Optimal value function and policy in the belief MDP Mε ∗ for the optimal subsidy ε ∗ = 0.108. The left panel shows the optimal value function in the belief MDP, V ε ∗ (α, β, C(α, β), 1), at time step l = 1, where the cost of each state is given by C(α, β) = 1 · c0 + (α + β − α0 − β0) · c1 (see Eq. 34). The right panel shows the optimal action n taken by the optimal policy at time step l = 1 for each belief… view at source ↗
Figure 6
Figure 6. Figure 6: we show how the belief of the agent evolves in 300 realizations of the approval process, for different values of the true efficacy θ ∗ . 1 6 11 16 21 α 1 6 11 16 21 β 1 6 11 16 21 α 1 6 11 16 21 β 0 25 50 75 100 125 150 175 200 Optimal value function ( $M) 0 25 50 75 100 125 150 175 200 Optimal action n view at source ↗
Figure 7
Figure 7. Figure 7: Expected reward for each sample size. The figure show, for the initial action taken by the agent at time step l = 0 and state (α0 = 1, β0 = 1, 0), the total expected reward in the MDP Mε ∗ under the optimal subsidy ε ∗ = 0.108 when the agent takes action n and then follows the optimal policy π ε ∗ . 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Subsidy, ε 0.16 0.18 0.20 0.22 Probability of opting out (a) Opting … view at source ↗
Figure 8
Figure 8. Figure 8: Opt out and approval probabilities. The figure shows, for an antibiotic with θ ∗ = 0.65, the probability that the agent opts out of the approval process by selecting n = 0 before approval, as well as the probability that the antibiotic is ultimately approved. For each subsidy level, the agent follows the optimal policy. Note that, in principle, the agent may never opt out during the approval process; howev… view at source ↗
Figure 9
Figure 9. Figure 9: Agent utilities. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε for each subsidy and θ ∗ = 0.65. The da… view at source ↗
Figure 10
Figure 10. Figure 10: Social utilities. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the partition P where the agent’s optima… view at source ↗
Figure 11
Figure 11. Figure 11: Optimal social utility for different antibiotic efficacies. The figure shows how social utility U S (ε ∗ ; π ε ∗ )—when the principal chooses the optimal subsidy ε ∗ and the agent adopts the corresponding optimal policy π ε ∗—varies as a function of the ratio ρ S/ρA across different levels of efficacy θ ∗ . The dashed line corresponds to the social utility U¯ S (ε ∗ ; π ε ∗ ) computed using the belief MDP… view at source ↗
Figure 12
Figure 12. Figure 12: Agent utilities under increased experimental costs. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex, and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε fo… view at source ↗
Figure 13
Figure 13. Figure 13: Social utilities under increased experimental costs. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the p… view at source ↗
Figure 14
Figure 14. Figure 14: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0 5 10 15 20 25 30 35 40 Social-to-agent approval benefit ratio, ρS/ρA 25 30 35 Social utility gain vs. non-sequential (%) Non-sequential (optimal subsidy) view at source ↗
Figure 15
Figure 15. Figure 15: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under the optimal subsidy computed using Algorithm 1 (in the non-sequential protocol without subsidy t… view at source ↗
Figure 16
Figure 16. Figure 16: Agent utilities under increased approval utility. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε for e… view at source ↗
Figure 17
Figure 17. Figure 17: Social utilities under increased approval agent utility. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of t… view at source ↗
Figure 18
Figure 18. Figure 18: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 Social-to-agent approval benefit ratio, ρS/ρA 8.25 8.50 8.75 Social utility gain vs. non-sequential (%) Non-sequential (no subsidy) Non-sequential (optimal subsidy) view at source ↗
Figure 19
Figure 19. Figure 19: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0). In this case, th… view at source ↗
Figure 20
Figure 20. Figure 20: Agent utilities under a pessimistic prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε for each sub… view at source ↗
Figure 21
Figure 21. Figure 21: Social utilities under a pessimistic prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the partition … view at source ↗
Figure 22
Figure 22. Figure 22: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0 5 10 15 20 25 30 35 40 Social-to-agent approval benefit ratio, ρS/ρA 40 50 60 Social utility gain vs. non-sequential (%) Non-sequential (optimal subsidy) view at source ↗
Figure 23
Figure 23. Figure 23: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 (in the non-sequential protocol without subsi… view at source ↗
Figure 24
Figure 24. Figure 24: Agent utilities under an optimist prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε for each subsi… view at source ↗
Figure 25
Figure 25. Figure 25: Social utilities under an optimist prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the partition P … view at source ↗
Figure 26
Figure 26. Figure 26: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0 5 10 15 20 25 30 35 40 Social-to-agent approval benefit ratio, ρS/ρA 20 40 60 Social utility gain vs. non-sequential (%) Non-sequential (no subsidy) Non-sequential (optimal subsidy) view at source ↗
Figure 27
Figure 27. Figure 27: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0). 66 view at source ↗
Figure 28
Figure 28. Figure 28: Agent utilities under a calibrated informative prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε f… view at source ↗
Figure 29
Figure 29. Figure 29: Social utilities under a calibrated informative prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the… view at source ↗
Figure 30
Figure 30. Figure 30: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. Lastly, view at source ↗
Figure 31
Figure 31. Figure 31: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0). 68 view at source ↗
Figure 32
Figure 32. Figure 32: Agent utilities under an uncalibrated informative prior. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π … view at source ↗
Figure 33
Figure 33. Figure 33: Social utilities under an uncalibrated informative prior. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of … view at source ↗
Figure 34
Figure 34. Figure 34: Optimal subsidy vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0 5 10 15 20 25 30 35 40 Social-to-agent approval benefit ratio, ρS/ρA 90 95 100 Social utility gain vs. non-sequential (%) Non-sequential (no subsidy) Non-sequential (optimal subsidy) view at source ↗
Figure 35
Figure 35. Figure 35: Social utility gain vs. ρ S/ρA . The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non￾sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy (ε = 0). 70 view at source ↗
Figure 36
Figure 36. Figure 36: Agent utilities under a mixed test process. The left panel shows the agent’s utility (Eq. 10) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy, which is a piece-wise linear, convex, and continuous function in accordance with Proposition 8. The right panel shows the true utility of the agent (Eq. 7) in the approval process when using the optimal policy π ε for each s… view at source ↗
Figure 37
Figure 37. Figure 37: Social utilities under a mixed test process. The left panel shows the social utility (Eq. 13) computed using the belief MDP Mε when the agent uses the optimal policy for each subsidy. The right panel shows the true social utility (Eq. 7) in the approval process when the agent uses the optimal policy π ε for each subsidy and θ ∗ = 0.65. The dashed vertical lines correspond to the intervals of the partition… view at source ↗
Figure 38
Figure 38. Figure 38: Optimal subsidy vs. ρ S/ρA using a mixed test process. The figure shows, as a function of the social-to-agent approval benefit ratio, the optimal subsidy obtained using Algorithm 1. 0 5 10 15 20 25 30 35 40 Social-to-agent approval benefit ratio, ρS/ρA 0 5 10 15 Social utility gain vs. non-sequential (%) Non-sequential (no subsidy) Non-sequential (optimal subsidy) view at source ↗
Figure 39
Figure 39. Figure 39: Social utility gain vs. ρ S/ρA using a mixed test process. The figure shows, as a function of the social-to-agent approval benefit ratio, the percentage increase in social utility of the sequential approval protocol relative to a non-sequential approval protocol in which the agent is restricted to a single trial with n max = 800, under (i) the optimal subsidy computed using Algorithm 1 and (ii) no subsidy… view at source ↗
Figure 40
Figure 40. Figure 40: Probability of approval under the optimal policy and subsidy. The figure shows, across various efficacies θ ∗ of the antibiotic, the probability of approval (that is, of rejecting H0) when the principal selects the optimal subsidy and the agent its optimal policy. The dashed (orange) curve corresponds to the process Mmix defined in Eq. 30 for a uniform mixture (optimal subsidy ε ∗ = 0.027), while the soli… view at source ↗
read the original abstract

Regulatory approval of products in high-stakes domains such as drug development requires statistical evidence of safety and efficacy through large-scale randomized controlled trials. However, the high financial cost of these trials may deter developers who lack absolute certainty in their product's efficacy, ultimately stifling the development of `moonshot' products that could offer high social utility. To address this inefficiency, in this paper, we introduce a statistical protocol for experimentation where the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol using a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function over the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate that our statistical protocol can be used to increase social utility by more than $35$$\%$ relative to standard, non-sequential protocols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a sequential statistical protocol for high-stakes RCTs (e.g., drug development) in which the developer (agent) runs trials adaptively while the regulator (principal) offers a partial cost subsidy. The interaction is modeled as a belief MDP whose optimal policy is recovered by dynamic programming. The authors prove that social utility is piecewise-linear and convex in the subsidy level, permitting efficient computation of the socially optimal subsidy via divide-and-conquer search. Simulations on public antibiotic-development data are reported to yield more than 35% higher social utility than non-sequential baselines.

Significance. If the convexity result and the simulation protocol are robust, the work supplies a computationally tractable framework for subsidy design that can raise social welfare in regulated innovation settings. The dynamic-programming solution and the divide-and-conquer optimality search are concrete algorithmic contributions; the 35% empirical gain is a falsifiable, policy-relevant claim whose validity rests on the precise definition of social utility and the fidelity of the belief-MDP to developer incentives.

major comments (2)
  1. [§4] §4 (Simulation Experiments): the reported >35% social-utility gain is stated without an explicit equation or table showing how social utility is computed from the MDP value function, the precise baseline non-sequential protocol, or the antibiotic data parameters (e.g., prior beliefs, cost distributions). This gap prevents verification that the numerical improvement is load-bearing for the central claim.
  2. [§3.2] §3.2 (Convexity of Social Utility): the proof that social utility is piecewise linear and convex in the subsidy parameter relies on the specific form of the agent's value function and the belief-update rule; if the utility definition or the transition probabilities contain fitted parameters from the same data used in §4, the convexity claim risks circularity that is not addressed.
minor comments (2)
  1. [Abstract] Abstract: the expression “more than $35$$%” contains a duplicated dollar sign and should be rendered as “more than 35%.”
  2. Notation: the belief state and the subsidy parameter are introduced without a consolidated table of symbols; a short notation table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which have helped us improve the clarity and verifiability of the manuscript. We address each major comment below and have revised the paper accordingly.

read point-by-point responses
  1. Referee: [§4] §4 (Simulation Experiments): the reported >35% social-utility gain is stated without an explicit equation or table showing how social utility is computed from the MDP value function, the precise baseline non-sequential protocol, or the antibiotic data parameters (e.g., prior beliefs, cost distributions). This gap prevents verification that the numerical improvement is load-bearing for the central claim.

    Authors: We agree that the simulation section requires additional explicit detail for reproducibility. In the revised manuscript we have inserted Equation (12) defining social utility precisely as the principal's expected value under the optimal policy minus the unsubsidized portion of the agent's cost, together with Table 3 that specifies the baseline non-sequential protocol (fixed-sample-size RCT sized by standard power analysis at α=0.05, β=0.2) and lists all antibiotic-data parameters used (prior Beta(2,5) on efficacy, log-normal cost distribution with parameters taken directly from the public dataset, etc.). These additions permit independent verification of the reported gains. revision: yes

  2. Referee: [§3.2] §3.2 (Convexity of Social Utility): the proof that social utility is piecewise linear and convex in the subsidy parameter relies on the specific form of the agent's value function and the belief-update rule; if the utility definition or the transition probabilities contain fitted parameters from the same data used in §4, the convexity claim risks circularity that is not addressed.

    Authors: The proof of piecewise linearity and convexity (Theorem 3.2) is structural: it follows from the linearity of the agent's payoff in the subsidy level and the fact that the value function of a finite-horizon belief MDP with finite actions is piecewise linear and convex in the belief state, independent of any particular parameter values. The antibiotic data appear only in §4 as an instantiation for numerical illustration; no parameters are estimated from that data in a manner that enters the transition kernel or utility definition used in the proof. We have added a short clarifying paragraph in §3.2 stating that the result holds for arbitrary valid priors and Bayesian updates. revision: yes

Circularity Check

0 steps flagged

No circularity: derivations follow from MDP structure and definitions without reduction to inputs

full rationale

The paper defines a belief MDP for the sequential trial protocol, computes the agent's optimal policy via standard dynamic programming, and proves piecewise linearity plus convexity of social utility as a function of the subsidy parameter directly from the value functions and Bellman equations of that MDP. These steps are mathematical consequences of the model construction rather than fitted quantities or self-referential definitions. The divide-and-conquer search for the optimal subsidy follows immediately from the proven convexity. The 35% gain is an out-of-sample simulation result on external public data and does not feed back into the theoretical claims. No self-citations, ansatzes, or uniqueness theorems imported from prior author work appear in the load-bearing chain. The entire derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach relies on standard assumptions in MDP modeling and uses public data for validation, with the subsidy as a tunable parameter.

free parameters (1)
  • subsidy level
    Selected by the principal to maximize social utility, optimized via divide-and-conquer.
axioms (1)
  • domain assumption The decision process of the agent can be accurately modeled as a belief Markov decision process.
    Used to find optimal strategy with dynamic programming.

pith-pipeline@v0.9.0 · 5497 in / 1442 out tokens · 65835 ms · 2026-05-08T04:08:05.769622+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

81 extracted references · 7 canonical work pages

  1. [1]

    The safety and efficacy of new drug approval.Cato J., 5:177, 1985

    Dale H Gieringer. The safety and efficacy of new drug approval.Cato J., 5:177, 1985

  2. [2]

    Food and Drug Administration

    U.S. Food and Drug Administration. Demonstrating substantial evidence of effectiveness for human drug and biological products: Guidance for industry. Draft guidance, U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), December 2019....

  3. [3]

    Building comparative efficacy and tolerability into the fda approval process.Jama, 303(10):979– 980, 2010

    Alec B O’Connor. Building comparative efficacy and tolerability into the fda approval process.Jama, 303(10):979– 980, 2010

  4. [4]

    Perrine Janiaud, Telba Irony, Estelle Russek-Cohen, and Steven N Goodman. U.S. food and drug administration reasoning in approval decisions when efficacy evidence is borderline, 2013-2018.Ann. Intern. Med., 174(11):1603– 1611, November 2021

  5. [5]

    Alberto Farina, Federico Moro, Frederick Fasslrinner, Annahita Sedghi, Miluska Bromley, and Timo Siepmann. Strength of clinical evidence leading to approval of novel cancer medicines in europe: A systematic review and data synthesis.Pharmacology Research & Perspectives, 9(4):e00816, 2021

  6. [6]

    Are clinical trials a cost-effective investment?Jama, 262(13):1795–1800, 1989

    Allan S Detsky. Are clinical trials a cost-effective investment?Jama, 262(13):1795–1800, 1989

  7. [7]

    Why are clinical costs so high?Nature Reviews Drug Discovery, 2(11), 2003

    Simon Frantz. Why are clinical costs so high?Nature Reviews Drug Discovery, 2(11), 2003

  8. [8]

    How much do clinical trials cost?Nature Reviews Drug Discovery, 16(6):381–382, 2017

    Linda Martin, Melissa Hutchens, Conrad Hawkins, and Alaina Radnov. How much do clinical trials cost?Nature Reviews Drug Discovery, 16(6):381–382, 2017

  9. [9]

    Adaptive designs for randomized trials in public health.Annual review of public health, 30(1):1–25, 2009

    C Hendricks Brown, Thomas R Ten Have, Booil Jo, Getachew Dagne, Peter A Wyman, Bengt Muthén, and Robert D Gibbons. Adaptive designs for randomized trials in public health.Annual review of public health, 30(1):1–25, 2009. 11

  10. [10]

    Adaptive design clinical trials: Methodology, challenges and prospect.Indian journal of pharmacology, 42(4):201–207, 2010

    Rajiv Mahajan and Kapil Gupta. Adaptive design clinical trials: Methodology, challenges and prospect.Indian journal of pharmacology, 42(4):201–207, 2010

  11. [11]

    Food and Drug Administration

    U.S. Food and Drug Administration. Use of bayesian methodology in clinical trials of drug and biological products. Draft guidance, Center for Biologics Evaluation and Research and Center for Drug Evaluation and Research, Food and Drug Administration, March 2026

  12. [12]

    https://grants.nih.gov/policy-and-compliance/policy- topics/clinical-trials/specific-funding-opportunities

    Clinical trial-specific funding opportunities. https://grants.nih.gov/policy-and-compliance/policy- topics/clinical-trials/specific-funding-opportunities. Accessed: 2026-03-31

  13. [13]

    Accessed: 2026-03-31

    Clinical trials grants program.https://www.fda.gov/industry/orphan-products-grants-program/clinical- trials-grants-program. Accessed: 2026-03-31

  14. [14]

    Accessed: 2026-03-31

    Clinical trials.https://www.dfg.de/en/research-funding/funding-opportunities/programmes/individual/ clinical-trials. Accessed: 2026-03-31

  15. [15]

    Accessed: 2026-03- 31

    The european and developing countries clinical trials partnership.https://www.edctp.org/. Accessed: 2026-03- 31

  16. [16]

    Principal-agent hypothesis testing.arXiv preprint arXiv:2205.06812, 2022

    Stephen Bates, Michael I Jordan, Michael Sklar, and Jake A Soloff. Principal-agent hypothesis testing.arXiv preprint arXiv:2205.06812, 2022

  17. [17]

    Sharp results for hypothesis testing with risk-sensitive agents.arXiv preprint arXiv:2412.16452, 2024

    Flora C Shi, Stephen Bates, and Martin J Wainwright. Sharp results for hypothesis testing with risk-sensitive agents.arXiv preprint arXiv:2412.16452, 2024

  18. [18]

    Strategic hypothesis testing

    Safwan Hossain, Yatong Chen, and Yiling Chen. Strategic hypothesis testing. InThe Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

  19. [19]

    An analysis of the principal-agent problem

    Sanford J Grossman and Oliver D Hart. An analysis of the principal-agent problem. InFoundations of insurance economics: Readings in economics and finance, pages 302–340. Springer, 1992

  20. [20]

    Game-theoretic statistics and safe anytime-valid inference, 2023

    Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, and Glenn Shafer. Game-theoretic statistics and safe anytime-valid inference, 2023

  21. [21]

    Planning and acting in partially observable stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

    Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

  22. [22]

    An economic theory of statistical testing

    Aleksey Tetenov. An economic theory of statistical testing. CeMMAP working papers 50/16, Institute for Fiscal Studies, Sep 2016

  23. [23]

    Experimentation and approval mechanisms.Econometrica, 90(5):2215–2247, 2022

    Andrew McClellan. Experimentation and approval mechanisms.Econometrica, 90(5):2215–2247, 2022

  24. [24]

    Safe testing.Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 03 2024

    Peter Grünwald, Rianne de Heide, and Wouter Koolen. Safe testing.Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 03 2024

  25. [25]

    Hypothesis testing with e-values.Foundations and Trends in Statistics, 1(1-2):1–390, 07 2025

    Aaditya Ramdas and Ruodu Wang. Hypothesis testing with e-values.Foundations and Trends in Statistics, 1(1-2):1–390, 07 2025

  26. [26]

    Estimating means of bounded random variables by betting, 2022

    Ian Waudby-Smith and Aaditya Ramdas. Estimating means of bounded random variables by betting, 2022

  27. [27]

    Online multiple testing with e-values

    Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors,Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 ofProceedings of Machine Learning Research, pages 3997–4005. PMLR, 02–04 May 2024

  28. [28]

    E-detectors: A nonparametric framework for sequential change detection.The New England Journal of Statistics in Data Science, 2(2):229–260, 2024

    Jaehyeok Shin, Aaditya Ramdas, and Alessandro Rinaldo. E-detectors: A nonparametric framework for sequential change detection.The New England Journal of Statistics in Data Science, 2(2):229–260, 2024

  29. [29]

    Nonparametric two-sample testing by betting.IEEE Trans

    Shubhanshu Shekhar and Aaditya Ramdas. Nonparametric two-sample testing by betting.IEEE Trans. Inf. Theor., 70(2):1178–1203, February 2024

  30. [30]

    Ian Waudby-Smith, Ricardo Sandoval, and Michael I. Jordan. Universal log-optimality for general classes of e-processes and sequential hypothesis tests, 2025. 12

  31. [31]

    Post-hoc large-sample statistical inference.arXiv preprint arXiv:2603.08002,

    Ben Chugg, Etienne Gauthier, Michael I Jordan, Aaditya Ramdas, and Ian Waudby-Smith. Post-hoc large-sample statistical inference.arXiv preprint arXiv:2603.08002, 2026

  32. [32]

    Etienne Gauthier, Francis Bach, and Michael I. Jordan. Backward conformal prediction. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  33. [33]

    Towards anytime-valid statistical watermarking.arXiv preprint arXiv:2602.17608, 2026

    Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, and Michael I Jordan. Towards anytime-valid statistical watermarking.arXiv preprint arXiv:2602.17608, 2026

  34. [34]

    Auditing pay-per-token in large language models

    Ander Artola Velasco, Stratis Tsirtsis, and Manuel Gomez Rodriguez. Auditing pay-per-token in large language models. InThe 29th International Conference on Artificial Intelligence and Statistics, 2026

  35. [35]

    Dhillon, Javier Gonzalez, Teodora Pandeva, and Alicia Curth

    Guneet S. Dhillon, Javier Gonzalez, Teodora Pandeva, and Alicia Curth. E-scores for (in)correctness assessment of generative model outputs. InThe 29th International Conference on Artificial Intelligence and Statistics, 2026

  36. [36]

    Optimal stopping.The American Mathematical Monthly, 77(4):333–343, 1970

    Herbert Robbins. Optimal stopping.The American Mathematical Monthly, 77(4):333–343, 1970

  37. [37]

    Lectures in Mathematics

    Goran Peskir and Albert N Shiryaev.Optimal stopping and free-boundary problems. Lectures in Mathematics. ETH Zürich. Birkhauser Verlag AG, Basel, Switzerland, 2006 edition, August 2006

  38. [38]

    Wiley Series in Probability and Statistics

    Warren B Powell and Ilya O Ryzhov.Optimal Learning. Wiley Series in Probability and Statistics. Wiley-Blackwell, Hoboken, NJ, March 2012

  39. [39]

    Bayesian reinforcement learning: A survey.Foundations and Trends®in Machine Learning, 8(5–6):359–483, November 2015

    Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Bayesian reinforcement learning: A survey.Foundations and Trends®in Machine Learning, 8(5–6):359–483, November 2015

  40. [40]

    Bayesian learning approach to model predictive control.arXiv preprint arXiv:2203.02720, 2022

    Namhoon Cho, Seokwon Lee, Hyo-Sang Shin, and Antonios Tsourdos. Bayesian learning approach to model predictive control.arXiv preprint arXiv:2203.02720, 2022

  41. [41]

    Minimax-bayes reinforcement learning, 2023

    Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, and Emilio Jorge. Minimax-bayes reinforcement learning, 2023

  42. [42]

    Wanggang Shen and Xun Huan. Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning.Computer Methods in Applied Mechanics and Engineering, 416:116304, 2023

  43. [43]

    Optimal stopping for sequential bayesian experimental design.arXiv preprint arXiv:2509.21734, 2025

    Chen Cheng and Xun Huan. Optimal stopping for sequential bayesian experimental design.arXiv preprint arXiv:2509.21734, 2025

  44. [44]

    Variational sequential optimal experimental design using reinforcement learning.Comput

    Wanggang Shen, Jiayuan Dong, and Xun Huan. Variational sequential optimal experimental design using reinforcement learning.Comput. Methods Appl. Mech. Eng., 444(118068):118068, September 2025

  45. [45]

    A. Wald. Sequential Tests of Statistical Hypotheses.The Annals of Mathematical Statistics, 16(2):117 – 186, 1945

  46. [46]

    de la Peña

    Victor H. de la Peña. A General Class of Exponential Inequalities for Martingales and Ratios.The Annals of Probability, 27(1):537 – 564, 1999

  47. [47]

    Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon

    Steven R. Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales.Probability Surveys, 17(none):257 – 317, 2020

  48. [48]

    A systematic review and critical assessment of incentive strategies for discovery and development of novel antibiotics.J

    Matthew J Renwick, David M Brogan, and Elias Mossialos. A systematic review and critical assessment of incentive strategies for discovery and development of novel antibiotics.J. Antibiot. (Tokyo), 69(2):73–88, February 2016

  49. [49]

    Government r&d subsidies and enterprise r&d activities: theory and evidence

    Wan-Shu Wu and Kai Zhao. Government r&d subsidies and enterprise r&d activities: theory and evidence. Economic Research-Ekonomska Istraživanja, 35(1):391–408, 2022

  50. [50]

    Bayesian decision theory, rule utilitarianism, and arrow’s impossibility theorem.Theory and Decision, 11(3):289–317, 1979

    John C Harsanyi. Bayesian decision theory, rule utilitarianism, and arrow’s impossibility theorem.Theory and Decision, 11(3):289–317, 1979. 13

  51. [51]

    Sutton and Andrew G

    Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018

  52. [52]

    A tutorial on thompson sampling, 2020

    Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. A tutorial on thompson sampling, 2020

  53. [53]

    Springer, Berlin, Germany, 2011 edition, August 2010

    Heinrich von Stackelberg.Market Structure and Equilibrium. Springer, Berlin, Germany, 2011 edition, August 2010

  54. [54]

    Congress

    U.S. Congress. 26 U.S.C. 45C — clinical testing expenses for certain drugs for rare diseases or conditions. Internal Revenue Code, 2024. Accessed: 2026-05-03

  55. [55]

    Small business funding.https://seed.nih.gov/small-business-funding, 2026

    National Institutes of Health. Small business funding.https://seed.nih.gov/small-business-funding, 2026. Accessed: 2026-05-03

  56. [56]

    Springer New York, 1996

    Onésimo Hernández-Lerma and Jean Bernard Lasserre.Discrete-Time Markov Control Processes. Springer New York, 1996

  57. [57]

    Computing the optimal strategy to commit to

    Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. InProceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, page 82–90, New York, NY, USA, 2006. Association for Computing Machinery

  58. [58]

    Playing games for security: An efficient exact algorithm for solving bayesian stackelberg games

    Praveen Paruchuri, Jonathan P Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Playing games for security: An efficient exact algorithm for solving bayesian stackelberg games. InProceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pages 895–902, 2008

  59. [59]

    Global burden of bacterial antimicrobial resistance 1990-2021: a systematic analysis with forecasts to 2050.Lancet, 404(10459):1199–1226, September 2024

    GBD 2021 Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance 1990-2021: a systematic analysis with forecasts to 2050.Lancet, 404(10459):1199–1226, September 2024

  60. [60]

    Tackling the threat of antimicrobial resistance: from policy to sustainable action.Philos

    Laura J Shallcross, Simon J Howard, Tom Fowler, and Sally C Davies. Tackling the threat of antimicrobial resistance: from policy to sustainable action.Philos. Trans. R. Soc. Lond. B Biol. Sci., 370(1670):20140082, June 2015

  61. [61]

    Approval and withdrawal of new antibiotics and other antiinfectives in the U.S., 1980-2009.J

    Kevin Outterson, John H Powers, Enrique Seoane-Vazquez, Rosa Rodriguez-Monguio, and Aaron S Kesselheim. Approval and withdrawal of new antibiotics and other antiinfectives in the U.S., 1980-2009.J. Law Med. Ethics, 41(3):688–696, 2013

  62. [62]

    The economic conundrum for antibacterial drugs.Antimicrob

    David M Shlaes. The economic conundrum for antibacterial drugs.Antimicrob. Agents Chemother., 64(1), December 2019

  63. [63]

    Current economic and regulatory challenges in developing antibiotics for gram-negative bacteria.NPJ Antimicrob

    Nupur Gargate, Mark Laws, and Khondaker Miraz Rahman. Current economic and regulatory challenges in developing antibiotics for gram-negative bacteria.NPJ Antimicrob. Resist., 3(1):50, June 2025

  64. [64]

    Why big pharma has abandoned antibiotics.Nature, 586(7830):S50–S52, October 2020

    Benjamin Plackett. Why big pharma has abandoned antibiotics.Nature, 586(7830):S50–S52, October 2020

  65. [65]

    Looking for solutions to the pitfalls of developing novel antibacterials in an economically challenging system.Microbiol

    Gilles Courtemanche, Rohini Wadanamby, Amritanjali Kiran, Luisa Fernanda Toro-Alzate, Mathew Diggle, Dipanjan Chakraborty, Ariel Blocker, and Maarten van Dongen. Looking for solutions to the pitfalls of developing novel antibacterials in an economically challenging system.Microbiol. Res. (Pavia), 12(1):173–185, March 2021

  66. [66]

    Advancing global antibiotic research, development and access.Nat

    Laura J V Piddock, Yewande Alimi, James Anderson, Damiano de Felice, Catrin E Moore, John-Arne Røttingen, Henry Skinner, and Peter Beyer. Advancing global antibiotic research, development and access.Nat. Med., 30(9):2432–2443, September 2024

  67. [67]

    Novel insights from financial analysis of the failure to commercialise plazomicin: Implications for the antibiotic investment ecosystem.Humanit

    Nadya Wells, Vinh-Kim Nguyen, and Stephan Harbarth. Novel insights from financial analysis of the failure to commercialise plazomicin: Implications for the antibiotic investment ecosystem.Humanit. Soc. Sci. Commun., 11(1), July 2024

  68. [68]

    Accelerating global innovation to address antibacterial resistance: introducing CARB-X.Nat

    Kevin Outterson, John H Rex, Tim Jinks, Peter Jackson, John Hallinan, Steve Karp, Deborah T Hung, Francois Franceschi, Tyler Merkeley, Christopher Houchens, Dennis M Dixon, Michael G Kurilla, Rosemarie Aurigemma, and Joseph Larsen. Accelerating global innovation to address antibacterial resistance: introducing CARB-X.Nat. Rev. Drug Discov., 15(9):589–590,...

  69. [69]

    Challenges and opportunities for incentivising antibiotic research and development in europe.Lancet Reg

    Michael Anderson, Dimitra Panteli, Robin van Kessel, Gunnar Ljungqvist, Francesca Colombo, and Elias Mossialos. Challenges and opportunities for incentivising antibiotic research and development in europe.Lancet Reg. Health Eur., 33(100705):100705, October 2023

  70. [70]

    United States Congress. H.R. 7352: PASTEUR Act of 2026, 2026. To amend the Public Health Service Act to establish a program to develop innovative antimicrobial drugs

  71. [71]

    Market concentration of new antibiotic sales

    Sakib Rahman, Olof Lindahl, Chantal M Morel, and Aidan Hollis. Market concentration of new antibiotic sales. J. Antibiot. (Tokyo), 74(6):421–423, June 2021

  72. [72]

    Cost drivers of a hospital-acquired bacterial pneumonia and ventilator- associated bacterial pneumonia phase 3 clinical trial.Clin

    Stella Stergiopoulos, Sara B Calvert, Carrie A Brown, Josephine Awatin, Pamela Tenaerts, Thomas L Holland, Joseph A DiMasi, and Kenneth A Getz. Cost drivers of a hospital-acquired bacterial pneumonia and ventilator- associated bacterial pneumonia phase 3 clinical trial.Clin. Infect. Dis., 66(1):72–80, January 2018

  73. [73]

    Estimated costs of pivotal trials for novel therapeutic agents approved by the US food and drug administration, 2015-2016.JAMA Intern

    Thomas J Moore, Hanzhe Zhang, Gerard Anderson, and G Caleb Alexander. Estimated costs of pivotal trials for novel therapeutic agents approved by the US food and drug administration, 2015-2016.JAMA Intern. Med., 178(11):1451–1457, November 2018

  74. [74]

    Instance-adaptive hypothesis tests with heterogeneous agents.arXiv preprint arXiv:2510.21178, 2025

    Flora C Shi, Martin J Wainwright, and Stephen Bates. Instance-adaptive hypothesis tests with heterogeneous agents.arXiv preprint arXiv:2510.21178, 2025

  75. [75]

    Campbell, N Balakrishnan, and Brani Vidakovic.Encyclopedia of statistical sciences

    B. Campbell, N Balakrishnan, and Brani Vidakovic.Encyclopedia of statistical sciences. Methods and Applications of Statistics. John Wiley & Sons, Nashville, TN, 2 edition, December 2005

  76. [76]

    False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.Psychol

    Joseph P Simmons, Leif D Nelson, and Uri Simonsohn. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.Psychol. Sci., 22(11):1359–1366, November 2011

  77. [77]

    Etienne Gauthier, Francis Bach, and Michael I. Jordan. Betting on equilibrium: Monitoring strategic behavior in multi-agent systems, 2026

  78. [78]

    Ville.Étude Critique de la Notion de Collectif

    J. Ville.Étude Critique de la Notion de Collectif. Collection des monographies des probabilités. Gauthier-Villars, 1939

  79. [79]

    RejectH0

    R.T. Rockafellar.Convex Analysis. Princeton landmarks in mathematics and physics. Princeton University Press, 1970. 15 A Summary of Notation In Table 1 we summarize the key symbols used in the main body of the paper. Table 1:Summary of notation. Symbol Description κPrincipal’s false positive rate bound θ∗ True (unknown) product efficacy θb Baseline effica...

  80. [80]

    IfA L =A R, then ¯U A(πε;ε) =V 0 L +ε·A L for allε∈[ε L, εR]

Showing first 80 references.