arxiv: 2605.12111 · v1 · submitted 2026-05-12 · 💻 cs.AI · cs.DS

Recognition: no theorem link

Adaptive Multi-Round Allocation with Stochastic Arrivals

Yuqi Pan , Davin Choo , Haichuan Wang , Milind Tambe , Alastair van Heerden , Cheryl Johnson

Authors on Pith no claims yet

Pith reviewed 2026-05-13 06:08 UTC · model grok-4.3

classification 💻 cs.AI cs.DS

keywords resource allocationdynamic programmingstochastic arrivalsnetwork recruitmentsurrogate value functionpolynomial complexityprobability generating functions

0 comments

The pith

A population-level surrogate value function depending only on remaining budget and frontier size makes multi-round stochastic allocation solvable via exact polynomial-time dynamic programming.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how to allocate a fixed budget of resources across multiple rounds to individuals who generate new referral opportunities stochastically, with each additional resource to the same person showing diminishing returns. The single-round version has a simple greedy solution based on marginal survival probabilities, but the multi-round version produces an intractable Bellman recursion because the frontier of potential recipients evolves in a high-dimensional stochastic way. The authors introduce a surrogate value function that collapses the state to just the remaining budget and the current frontier size. This surrogate supports an exact dynamic program computed with truncated probability generating functions, giving a planning algorithm whose running time is polynomial in the total budget. The method also comes with a proven error bound under model misspecification and is tested on recruitment-style scenarios.

Core claim

We introduce a population-level surrogate value function that depends only on the remaining budget and frontier size. This surrogate enables an exact dynamic program via truncated probability generating functions, yielding a planning algorithm with polynomial complexity in the total budget. We further analyze robustness under model misspecification, proving a multi-round error bound that decomposes into a tight single-round frontier error and a population-level transition error.

What carries the argument

The population-level surrogate value function depending only on remaining budget and frontier size, which replaces the full high-dimensional stochastic frontier state inside the dynamic program and permits exact computation through truncated probability generating functions.

If this is right

The single-round allocation problem admits an exact greedy solution based on marginal survival probabilities.
The multi-round planning problem can be solved exactly in time polynomial in the total budget.
The approximation error over multiple rounds decomposes into a single-round frontier error term and a population-level transition error term.
The resulting algorithm can be evaluated directly on recruitment scenarios drawn from real-world data distributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate reduction might apply to other sequential allocation settings that combine stochastic arrivals with diminishing returns per recipient.
One could measure the surrogate's fidelity by running both the surrogate DP and a small-scale exact DP on identical instances and plotting the gap as budget grows.
The polynomial dependence on budget suggests the method could scale to budgets large enough for practical network-recruitment campaigns where exact dynamic programming was previously impossible.

Load-bearing premise

The population-level surrogate that depends only on remaining budget and frontier size is a close enough approximation to the true value function even though the frontier evolves stochastically in high dimension.

What would settle it

Compute the exact Bellman values on small-budget instances where the full frontier state can be enumerated, then check whether the surrogate-based dynamic program produces values within the derived multi-round error bound.

Figures

Figures reproduced from arXiv: 2605.12111 by Alastair van Heerden, Cheryl Johnson, Davin Choo, Haichuan Wang, Milind Tambe, Yuqi Pan.

**Figure 1.** Figure 1: The mapping from covariates to an arrival distribution can be learnt from historical data and domain knowledge. Our setting is related to classical problems such as stochastic knapsack, budgeted bandits, and prophet inequalities; see Section 2. However, it differs in a fundamental way: allocation decisions not only consume budget and yield immediate reward, but also alter the distribution of future decisi… view at source ↗

**Figure 2.** Figure 2: illustrates the schematic timeline of events. Observe frontier arrival distributions D1, . . . , Dn Decide allocation k = (k1, . . . , kn) ∈ N n Observe realizations X1:n ∼ D1:n Observe next frontier D′ 1 , . . . , D′N Time [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Experimental plots for simulated referrals from ICPSR HIV network; see Section D for other diseases. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated dis… view at source ↗

**Figure 4.** Figure 4: Experimental plots for realized referrals from ICPSR HIV network; see Section D for other diseases. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated disc… view at source ↗

**Figure 5.** Figure 5: Experimental plots for simulated referrals from ICPSR Chlamydia network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard… view at source ↗

**Figure 6.** Figure 6: Experimental plots for realized referrals from ICPSR Chlamydia network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard … view at source ↗

**Figure 7.** Figure 7: Experimental plots for simulated referrals from ICPSR Gonorrhea network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard… view at source ↗

**Figure 8.** Figure 8: Experimental plots for realized referrals from ICPSR Gonorrhea network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard … view at source ↗

**Figure 9.** Figure 9: Experimental plots for simulated referrals from ICPSR Hepatitis network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard… view at source ↗

**Figure 10.** Figure 10: Experimental plots for realized referrals from ICPSR Hepatitis network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard… view at source ↗

**Figure 11.** Figure 11: Experimental plots for simulated referrals from ICPSR Syphilis network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard… view at source ↗

**Figure 12.** Figure 12: Experimental plots for realized referrals from ICPSR Syphilis network. We compare our proposed policy π our (solid black line) against the constant- and greedy-allocation baselines described in Section 8 for discount factors γ ∈ {0.5, 0.7, 0.9} and initial frontier size n ∈ {5, 10, 15}. Each configuration is evaluated over 30 independent runs and we report mean accumulated discounted reward with standard … view at source ↗

read the original abstract

We study a sequential resource allocation problem motivated by adaptive network recruitment, in which a limited budget of identical resources must be allocated over multiple rounds to individuals with stochastic referral capacity. Successful referrals endogenously generate future decision opportunities while allocating additional resources to an individual exhibits diminishing returns. We first show that the single-round allocation problem admits an exact greedy solution based on marginal survival probabilities. In the multi-round setting, the resulting Bellman recursion is intractable due to the stochastic, high-dimensional evolution of the frontier. To address this, we introduce a population-level surrogate value function that depends only on the remaining budget and frontier size. This surrogate enables an exact dynamic program via truncated probability generating functions, yielding a planning algorithm with polynomial complexity in the total budget. We further analyze robustness under model misspecification, proving a multi-round error bound that decomposes into a tight single-round frontier error and a population-level transition error. Finally, we evaluate our method on real-world inspired recruitment scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable polynomial-time planner for multi-round stochastic network allocation via a budget-and-size surrogate plus truncated PGFs, but the approximation's tightness under heterogeneous referrals is the part that needs proof-level checking.

read the letter

The main thing to know is that the authors reduce an otherwise intractable multi-round allocation problem to a dynamic program whose state is just remaining budget and frontier cardinality. They first solve the single-round version exactly with a greedy rule on marginal survival probabilities. For the multi-round case they replace the full high-dimensional frontier with a population-level surrogate value function, then compute its transitions exactly using truncated probability generating functions. This yields a planning algorithm whose time is polynomial in the total budget. They also give a multi-round error bound that splits into single-round frontier error plus population transition error, plus some robustness analysis and experiments on recruitment-style instances. The combination is new relative to the single-round greedy literature they cite, and the technical steps stay within standard DP and generating-function tools without extra fitted parameters or circular definitions. The approach is clean and the complexity claim is concrete. The soft spot is whether the size-only surrogate actually keeps the error controlled when referral capacities differ across individuals. The stress-test note is on point here: heterogeneous stochastic arrivals can make frontier size a weak statistic, and nothing in the abstract rules out cases where the aggregation error grows over rounds. The paper asserts the bound decomposes nicely, but that rests on the surrogate preserving enough of the Markov property; the full proofs and any supporting experiments will have to show whether the bound remains useful in realistic heterogeneity regimes. This is for people working on stochastic resource allocation in networks, especially adaptive recruitment or influence problems. A reader who needs a practical planner with some guarantees will find the method and the error decomposition useful. It is worth sending to peer review so that referees can verify the surrogate construction and the claimed bound in detail.

Referee Report

2 major / 3 minor

Summary. The manuscript studies a sequential resource allocation problem for adaptive network recruitment, where a fixed budget of identical resources is allocated over multiple rounds to individuals whose successful referrals stochastically generate new allocation opportunities, subject to diminishing returns. It shows that the single-round subproblem admits an exact greedy solution based on marginal survival probabilities. For the multi-round case, the Bellman recursion is intractable due to the high-dimensional stochastic frontier evolution; the authors introduce a population-level surrogate value function depending only on remaining budget B and frontier cardinality n. This surrogate is claimed to enable an exact dynamic program via truncated probability generating functions, producing a planning algorithm with polynomial complexity in the total budget. The paper further derives a multi-round error bound under model misspecification that decomposes into single-round frontier error plus population-level transition error, and reports numerical evaluation on real-world-inspired recruitment scenarios.

Significance. If the surrogate approximation is shown to be sufficiently accurate and the error bounds hold uniformly, the work would provide a useful algorithmic contribution to scalable planning in stochastic networked allocation problems. The polynomial dependence on budget (rather than frontier size or horizon) and the explicit error decomposition are technically attractive features that could extend to related domains such as viral marketing or epidemic control. The use of truncated PGFs for exact computation under the reduced state is a positive technical choice.

major comments (2)

[Abstract and §4] Abstract and §4: The central claim that the population-level surrogate V(B, n) 'enables an exact dynamic program' and yields polynomial complexity rests on the unproven assertion that aggregation by frontier size alone preserves the Markov property and bounds transition error under heterogeneous referral distributions. The skeptic note correctly identifies that nothing in the given description rules out regimes where variance from distinct referral capacities makes the size statistic insufficient, which directly undermines the claimed multi-round error bound.
[§5.1] §5.1, error-bound decomposition: The multi-round error bound is stated to separate cleanly into single-round frontier error plus population-level transition error, but the argument does not address possible correlations induced by the shared remaining budget across rounds or by the truncation of the PGF; without an explicit uniform bound on these terms, the decomposition does not yet guarantee that the surrogate policy remains near-optimal.

minor comments (3)

[§6] The experimental section provides only high-level descriptions of the recruitment scenarios; specific parameter values, number of Monte Carlo replications, and statistical significance tests for the reported performance gaps are missing.
Notation for the truncated PGF (e.g., the truncation threshold and the exact form of the generating function) is introduced without a self-contained definition in the main text, forcing the reader to infer details from the appendix.
[§6] No comparison is reported against natural baselines such as myopic single-round greedy allocation or a fixed-horizon approximation that ignores future referrals entirely.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, clarifying the technical status of the surrogate model and error analysis.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4: The central claim that the population-level surrogate V(B, n) 'enables an exact dynamic program' and yields polynomial complexity rests on the unproven assertion that aggregation by frontier size alone preserves the Markov property and bounds transition error under heterogeneous referral distributions. The skeptic note correctly identifies that nothing in the given description rules out regimes where variance from distinct referral capacities makes the size statistic insufficient, which directly undermines the claimed multi-round error bound.

Authors: The surrogate V(B, n) is defined as the optimal value function for an auxiliary population-level MDP whose state is exactly (remaining budget B, frontier cardinality n) and whose transitions are obtained by averaging the referral distribution over the current frontier. By construction, this auxiliary process is Markovian in (B, n). The dynamic program is therefore exact for the surrogate; the polynomial-time algorithm follows from the fact that the next-state distribution can be computed exactly via a truncated probability generating function whose size is linear in B. The multi-round error bound (Theorem 5.1) then measures the sub-optimality of the surrogate policy relative to the true heterogeneous process, with the population-level transition error term explicitly controlling the effect of heterogeneity. We agree that the manuscript would benefit from a clearer separation between the exact surrogate MDP and the approximation error; we will revise §4 to state the Markov property of the surrogate formally and to emphasize that the error bound, rather than exact optimality, is the guarantee provided under heterogeneity. revision: partial
Referee: [§5.1] §5.1, error-bound decomposition: The multi-round error bound is stated to separate cleanly into single-round frontier error plus population-level transition error, but the argument does not address possible correlations induced by the shared remaining budget across rounds or by the truncation of the PGF; without an explicit uniform bound on these terms, the decomposition does not yet guarantee that the surrogate policy remains near-optimal.

Authors: The proof in Appendix C proceeds by induction on the horizon, conditioning each one-step error on the realized remaining budget B_t at that round; because B_t is part of the surrogate state, the shared-budget dependence is already internalized. The resulting bound is taken uniformly over all feasible B by using the trivial Lipschitz constant equal to the maximum single-round reward. For PGF truncation we fix a threshold T such that the total-variation distance between the true and truncated transition kernels is at most ε; this ε is absorbed into the population-level transition error with an explicit multiplicative factor of 2 (from the triangle inequality in the value-function difference). We will add a short lemma in §5.1 that records this uniform bound over B and the explicit contribution of truncation error, thereby making the decomposition fully rigorous. revision: yes

Circularity Check

0 steps flagged

No circularity in surrogate DP construction or error bounds

full rationale

The derivation introduces an explicit population-level surrogate value function indexed only by remaining budget and frontier cardinality as a modeling approximation to the high-dimensional state; this surrogate then admits an exact (surrogate) DP via truncated PGFs, with polynomial complexity following directly from the reduced state space. The multi-round error bound is obtained by standard decomposition into single-round frontier error plus population-level transition error, without any reduction of the claimed result to a fitted parameter, self-referential definition, or load-bearing self-citation. The central claims remain self-contained against external benchmarks and do not collapse to their inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; full manuscript details on assumptions and parameters are unavailable. The surrogate itself is an ad-hoc modeling choice introduced to restore tractability.

axioms (2)

domain assumption Stochastic referral arrivals with diminishing marginal returns on repeated allocations to the same individual
Stated as the motivating model in the problem definition.
ad hoc to paper Population-level summary (budget + frontier size) suffices to approximate the true value function
Introduced explicitly to make the Bellman recursion tractable.

pith-pipeline@v0.9.0 · 5478 in / 1431 out tokens · 95433 ms · 2026-05-13T06:08:24.002004+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

Hilbe, Joseph M. , year=

work page
[2]

and Hilbe, Joseph M

Hardin, James W. and Hilbe, Joseph M. , year=

work page
[3]

and Goemans, Michel X

Dean, Brian C. and Goemans, Michel X. and Vondr. Mathematics of Operations Research , year=

work page
[4]

Gupta, Anupam and Krishnaswamy, Ravishankar and Molinaro, Marco and Ravi, Ramamoorthi , booktitle=

work page
[5]

Badanidiyuru, Ashwinkumar and Kleinberg, Robert and Slivkins, Aleksandrs , journal=

work page
[6]

Samuel-Cahn, Ester , journal=

work page
[7]

Krengel, Ulrich and Sucheston, Louis , journal=

work page
[8]

Kleinberg, Robert and Weinberg, Seth Matthew , booktitle=

work page
[9]

Hajiaghayi, Mohammad Taghi and Kleinberg, Robert and Sandholm, Tuomas , booktitle=

work page
[10]

Conference on Economics and Computation (EC) , year=

D. Conference on Economics and Computation (EC) , year=

work page
[11]

Altman, Eitan , year=

work page
[12]

Puterman, Martin L. , year=

work page
[13]

, booktitle=

Bertsekas, Dimitri P. , booktitle=. 2025 , publisher=

work page 2025
[14]

Bertsekas, Dimitri P. , year=

work page
[15]

Powell, Warren B. , year=

work page
[16]

2011 , publisher=

Morris, Martina and Rothenberg, Richard , journal=. 2011 , publisher=

work page 2011
[17]

, journal=

Heckathorn, Douglas D. , journal=

work page
[18]

, journal=

Goel, Sharad and Salganik, Matthew J. , journal=

work page
[19]

and Lowe, Will , journal=

Munzert, Simon and Selb, Peter and Gohdes, Anita and Stoetzer, Lukas F. and Lowe, Will , journal=

work page
[20]

Newman, M. E. J. , journal=

work page
[21]

Science , year=

Barab. Science , year=

work page
[22]

Journal of Machine Learning Research (JMLR) , year=

Munos, R. Journal of Machine Learning Research (JMLR) , year=

work page
[23]

Meuleau, Nicolas and Hauskrecht, Milos and Kim, Kee-Eung and Peshkin, Leonid and Kaelbling, Leslie Pack and Dean, Thomas and Boutilier, Craig , booktitle=

work page
[24]

Boutilier, Craig and Lu, Tyler , booktitle=

work page
[25]

Dolgov, Dmitri and Durfee, Edmund , booktitle=

work page
[26]

Carrara, Nicolas and Leurent, Edouard and Laroche, Romain and Urvoy, Tanguy and Maillard, Odalric-Ambrym and Pietquin, Olivier , booktitle=

work page
[27]

Etessami, Kousha and Stewart, Alistair and Yannakakis, Mihalis , journal=

work page
[28]

and Papastavrou, Jason D

Kleywegt, Anton J. and Papastavrou, Jason D. , journal=

work page
[29]

and Schapire, Robert and Slivkins, Aleksandrs , journal=

Immorlica, Nicole and Sankararaman, Karthik A. and Schapire, Robert and Slivkins, Aleksandrs , journal=

work page
[30]

Kesselheim, Thomas and Singla, Sahil , booktitle=

work page
[31]

Kumar, Raunak and Kleinberg, Robert , booktitle=

work page
[32]

Clauset, Aaron and Shalizi, Cosma Rohilla and Newman, M. E. J. , journal=

work page
[33]

Caldarelli, Guido , year=

work page
[34]

and Amaral, Lu

Liljeros, Fredrik and Edling, Christofer R. and Amaral, Lu. Nature , year=

work page
[35]

and Sonnenberg, Pam and Hayes, Richard J

McCreesh, Nicky and Copas, Andrew and Seeley, Janet and Johnston, Lisa G. and Sonnenberg, Pam and Hayes, Richard J. and Frost, Simon D. W. and White, Richard G. , journal=. 2013 , publisher=

work page 2013
[36]

2013 , publisher=

Wylie, John L and Jolly, Ann M , journal=. 2013 , publisher=

work page 2013
[37]

and Fatch, Robin and Ogolla, David and Otieno, Beatrice and Amboka, Sayo and Kadede, Kevin and Cohen, Craig R

Truong, Hong-Ha M. and Fatch, Robin and Ogolla, David and Otieno, Beatrice and Amboka, Sayo and Kadede, Kevin and Cohen, Craig R. and Bukusi, Elizabeth A. and Guz. Annals of epidemiology , volume=. 2023 , publisher=

work page 2023
[38]

2025 , publisher=

Wang, Wei and Sun, Guangsheng and Li, Chen and Qiu, Cunxi and Fan, Junyi and Jin, Yuhan , journal=. 2025 , publisher=

work page 2025