Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods
Pith reviewed 2026-07-05 02:08 UTC · model glm-5.2
The pith
Sampling Subsets as Bandit Arms
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is a reduction: the problem of generating sampling distributions for SAM (stochastic average model) methods can be cast as a contextual bandit problem solvable by Exp4. In SAM methods, only a randomized subset of simulation outputs is locally modeled per iteration of an optimization routine. The author identifies that the subset-selection decision naturally fits the contextual bandit structure—context is available from side information (lower-fidelity models, emulators, human or AI expertise), arms correspond to subset choices, and rewards correspond to optimization progress. By applying Exp4, the method can aggregate advice from multiple expert policies for subset选择,适应
What carries the argument
SAM (stochastic average model) methods
Where Pith is reading between the lines
- The contextual bandit formulation could be extended to other randomized optimization methods where subset selection is required, such as stochastic gradient methods with non-uniform sampling, potentially unifying importance sampling across optimization algorithms.
- If lower-fidelity simulations serve as context, the approach suggests a principled way to perform multi-fidelity optimization: the bandit learns which fidelity level's advice to trust for subset selection, effectively performing automatic fidelity management within the optimization loop.
- The regret guarantees of Exp4 could potentially be translated into convergence rate bounds for the resulting SAM method, providing theoretical justification for using learned (rather than uniform) sampling distributions in model-based optimization.
- The expert framework could accommodate adaptive experts that update their subset recommendations based on the optimization trajectory, not just static side information, creating a richer interaction between the bandit and the optimizer.
Load-bearing premise
The paper assumes that the contextual bandit formulation via Exp4 provides a meaningful advantage over simpler subset-selection strategies in SAM methods, which is load-bearing because the entire contribution rests on the bandit approach being worth the added complexity. The empirical evidence consists of preliminary numerical results on synthetic problems.
What would settle it
On realistic expensive optimization problems with available side information, the Exp4-based subset selection should produce measurably better optimization progress per simulation evaluation compared to uniform random subset selection, with the advantage growing as the quality of side information improves.
Figures
read the original abstract
In computational science workflows, it is often the case that 1) objective functions for optimization involve multiple simulation outputs, and 2) those simulations can be performed (at least partially) in parallel. In this work, we reexamine past work on a class of randomized algorithms, stochastic average model (SAM) methods. SAM methods are conceptually similar to stochastic average gradient methods, and effectively require that only randomized subsets of simulation outputs be locally modeled in each iteration of a model-based optimization method. This work focuses on the question of how best to perform this randomization of subset selection, especially in settings where there exists useful side information such as alternative lower-fidelity simulations, pre-trained emulators or domain expertise from humans or AI models. In particular, we consider the problem of generating sampling distributions for SAM methods as a contextual bandit problem and we apply the Exponential weights algorithm for Exploration and Exploitation with Experts (Exp4). We provide some preliminary numerical results on synthetic problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes casting the subset-selection problem in Stochastic Average Model (SAM) methods for expensive finite-sum optimization as a contextual bandit problem, specifically applying the Exp4 algorithm. The motivation is that in computational science workflows, side information (lower-fidelity simulations, emulators, domain expertise) can inform which simulation outputs to sample. The paper describes itself as conceptual, with preliminary numerical results on synthetic problems. The full text provided for review was corrupted (encoding failure rendering the body largely unparseable), so this assessment is based primarily on the abstract and the partially readable fragments of structure (section headings, equation-like blocks, references).
Significance. The problem framing is well-motivated: subset selection in model-based finite-sum optimization with expensive function evaluations is a practical concern, and leveraging contextual side information via bandit methods is a natural and potentially useful idea. The cross-application of Exp4 to SAM sampling distributions is, to my knowledge, novel. However, the significance is currently limited by the absence of theoretical guarantees and the preliminary nature of the experiments, both acknowledged in the abstract. No machine-checked proofs, reproducible code, or parameter-free derivations are evident from the available material.
major comments (3)
- Abstract: The paper states it provides 'preliminary numerical results on synthetic problems' and mentions no convergence theorem. For a math.OC submission, the absence of any theoretical result connecting the bandit regret to optimization progress is a load-bearing gap. The central claim—that the contextual bandit formulation yields improved subset selection—requires either a formal guarantee or substantially stronger empirical evidence than 'preliminary synthetic' experiments. As submitted, the contribution is a heuristic proposal without theoretical justification.
- Non-stationarity of the bandit (load-bearing for the Exp4 motivation): In SAM optimization, the optimal sampling distribution at iteration k depends on the current iterate x_k, which changes every step. Standard Exp4 regret bounds (Auer et al., 2002) are stated against a fixed best expert in hindsight over an adversarial sequence. If the optimal expert shifts substantially across iterations, the standard regret bound does not directly imply SAM iterate convergence. The manuscript must either (a) provide a theorem connecting cumulative Exp4 regret to an optimization error bound (e.g., E[f(x_k) - f*] in terms of regret), or (b) explicitly discuss why the non-stationarity is benign in the regimes considered and provide empirical evidence that Exp4 outperforms simpler baselines (uniform sampling, gradient-norm-weighted sampling) under iterate drift. Without this, the choice of Exp4 over a固定或
- Full text corruption: The body of the manuscript provided for review is corrupted (encoding failure), rendering the technical content—equations, theorems, experimental setup, and results—largely unparseable. This prevents verification of whether the non-stationarity concern above is addressed, what baselines are compared, and whether any theoretical results exist that the abstract does not mention. A clean, properly encoded manuscript must be provided for substantive review.
minor comments (3)
- Abstract: The phrase 'preliminary numerical results' undersells the empirical contribution if the experiments are in fact substantial; conversely, if they are genuinely preliminary, the abstract should state the scope and limitations more explicitly.
- Abstract: No mention of the specific synthetic problems used, the dimensionality, or the baselines compared against. A brief indication would help readers assess relevance.
- The arXiv identifier in the header (2604.20660v2) differs from the submission identifier (2604.20657); authors should verify.
Simulated Author's Rebuttal
We thank the referee for a careful reading under difficult circumstances (the corrupted encoding) and for identifying the two most important substantive concerns: the lack of theoretical guarantees and the non-stationarity issue for Exp4. We address each comment below.
read point-by-point responses
-
Referee: Abstract: The paper states it provides 'preliminary numerical results on synthetic problems' and mentions no convergence theorem. For a math.OC submission, the absence of any theoretical result connecting the bandit regret to optimization progress is a load-bearing gap. The central claim—that the contextual bandit formulation yields improved subset selection—requires either a formal guarantee or substantially stronger empirical evidence than 'preliminary synthetic' experiments. As submitted, the contribution is a heuristic proposal without theoretical justification.
Authors: The referee is correct that the current manuscript contains no theorem connecting cumulative bandit regret to an optimization error bound, and we agree that this is the most important gap for a math.OC submission. We will add a theoretical result in the revision. Specifically, we plan to include a proposition that bounds the expected optimization error E[f(x_k) - f*] in terms of (i) the cumulative Exp4 regret over the first k iterations, (ii) the SAM model error terms, and (iii) standard smoothness/convexity assumptions on f. The key step is to show that the SAM iterate convergence bound (which already exists in the SAM literature under uniform sampling) degrades gracefully when the sampling distribution incurs regret R(k) relative to the best expert, yielding an extra O(R(k)/k) term in the per-iteration progress. This connects the bandit regret to optimization progress in the manner the referee requests. We will also expand the numerical experiments beyond the current preliminary synthetic tests to include at least one higher-dimensional problem and comparisons against uniform sampling and gradient-norm-weighted sampling. revision: yes
-
Referee: Non-stationarity of the bandit (load-bearing for the Exp4 motivation): In SAM optimization, the optimal sampling distribution at iteration k depends on the current iterate x_k, which changes every step. Standard Exp4 regret bounds (Auer et al., 2002) are stated against a fixed best expert in hindsight over an adversarial sequence. If the optimal expert shifts substantially across iterations, the standard regret bound does not directly imply SAM iterate convergence. The manuscript must either (a) provide a theorem connecting cumulative Exp4 regret to an optimization error bound (e.g., E[f(x_k) - f*] in terms of regret), or (b) explicitly discuss why the non-stationarity is benign in the regimes considered and provide empirical evidence that Exp4 outperforms simpler baselines (uniform sampling, gradient-norm-weighted sampling) under iterate drift. Without this, the choice of Exp4 over a固定或
Authors: This is a well-taken and substantive point. The referee is correct that standard Exp4 bounds compete against a fixed best expert in hindsight, while in SAM optimization the optimal sampling distribution shifts as x_k evolves. We will address this on two fronts in the revision. First, in the theoretical result described above, we will use the standard Exp4 regret bound against the best fixed expert in hindsight, and show that even this weaker guarantee suffices for SAM convergence: if the expert class is rich enough that a single fixed expert provides a good sampling distribution across all iterates (which is the case when, e.g., the side information is a lower-fidelity model whose relative accuracy is stable across the optimization trajectory), then the regret bound translates directly to an optimization error bound. Second, we will add an explicit discussion of the non-stationarity issue, identifying regimes where it is benign (slowly varying iterates, stable side-information quality) and acknowledging regimes where it is not (rapidly changing iterate geometry, non-stationary expert quality). We will also report empirical comparisons against uniform sampling and gradient-norm-weighted sampling to demonstrate that Exp4 provides benefits under iterate drift in the tested problems. We note that we do not claim Exp4 is optimal for the non-stationary setting; rather, we argue it is a reasonable and practically effective choice given available side information, and the revision will make the scope of this claim precise. revision: yes
-
Referee: Full text corruption: The body of the manuscript provided for review is corrupted (encoding failure), rendering the technical content—equations, theorems, experimental setup, and results—largely unparseable. This prevents verification of whether the non-stationarity concern above is addressed, what baselines are compared, and whether any theoretical results exist that the abstract does not mention. A clean, properly encoded manuscript must be provided for substantive review.
Authors: We sincerely apologize for the encoding failure. The corruption occurred during the arXiv submission process due to a font encoding incompatibility in the LaTeX source. We have identified the root cause (a custom font package that did not embed correctly) and have already produced a clean, properly encoded version. We will resubmit the corrected manuscript and can provide it immediately if the referee or editor wishes to re-examine the technical content before the next round. We acknowledge that this corruption prevented the referee from verifying the technical details, and we appreciate the referee's willingness to engage with the abstract and structural fragments despite this issue. revision: yes
Circularity Check
No circularity detected: the paper combines two independently established frameworks (SAM methods and Exp4 contextual bandits) without self-referential derivation chains.
full rationale
The full text is corrupted and largely unreadable, but the abstract and the readable fragments are sufficient to assess circularity. The paper's central contribution is a cross-application: it casts the subset-selection problem in SAM (stochastic average model) optimization as a contextual bandit problem and applies the Exp4 algorithm. Both SAM methods and Exp4 are well-established, independently developed frameworks from the prior literature. The abstract does not claim to derive a new result that is then validated by fitting to that same result, nor does it define a quantity in terms of what it purports to predict. The contribution is a reformulation/heuristic application, not a first-principles derivation whose output equals its input by construction. The skeptic's concerns about non-stationarity of the bandit and the absence of a regret-to-convergence theorem are legitimate correctness and completeness concerns, but they are not circularity: the paper does not claim a theorem that secretly reduces to its assumptions. The preliminary numerical results are described as synthetic experiments, not as fitted predictions repackaged as validation. No self-citation chain is visible in the readable portions that would be load-bearing for the formulation. The paper appears to be a genuine, if preliminary, cross-disciplinary application with no circular construction.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption SAM methods are a valid framework for expensive finite-sum optimization with parallelizable simulation outputs
- domain assumption Exp4 is applicable to the subset-selection problem in SAM methods, i.e., the subset selection can be meaningfully formulated as a contextual bandit problem
- domain assumption Side information (lower-fidelity simulations, emulators, domain expertise) can be encoded as context for the bandit problem
Reference graph
Works this paper leans on
-
[1]
https://github.com/POptUS/BenDFO, 2022
BenDFO : Code for benchmarking derivative-free optimization algorithms. https://github.com/POptUS/BenDFO, 2022
work page 2022
-
[2]
https://github.com/POptUS/IBCDFO, 2024
IBCDFO : Interpolation-based composite derivative-free optimization. https://github.com/POptUS/IBCDFO, 2024
work page 2024
-
[3]
Competing in the dark: An efficient algorithm for bandit linear optimization
Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory , volume 110, 2009
work page 2009
- [4]
-
[5]
Finite-time analysis of the multiarmed bandit problem
Peter Auer, Nicol\`o Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning , 47(2--3):235--256, 2002
work page 2002
-
[6]
The nonstochastic multiarmed bandit problem
Peter Auer, Nicol \`o Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing , 32(1):48--77, 2002
work page 2002
-
[7]
Convergence rate analysis of a stochastic trust-region method via supermartingales
Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization , 1(2):92--119, 2019
work page 2019
-
[8]
Raghu Bollapragada, Matt Menickelly, Witold Nazarewicz, Jared O'Neal, Paul-Gerhard Reinhard, and Stefan M. Wild. Optimization and supervised machine learning methods for fitting numerical physics models without derivatives. Journal of Physics G: Nuclear and Particle Physics , 48(2):024001, 2021
work page 2021
-
[9]
Zalan Borsos, Andreas Krause, and Kfir Y. Levy. Online variance reduction for stochastic optimization. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings of the 31st Conference On Learning Theory , volume 75 of Proceedings of Machine Learning Research , pages 324--357. PMLR, 06--09 Jul 2018
work page 2018
-
[10]
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
S\'ebastien Bubeck and Nicol\`o Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5(1):1--122, 2012
work page 2012
-
[11]
The best of both worlds: stochastic and adversarial bandits
S\'ebastien Bubeck and Aleksandrs Slivkins. The best of both worlds: stochastic and adversarial bandits. In Conference on Learning Theory (COLT) , 2012
work page 2012
-
[12]
Stochastic optimization using a trust-region method and random models
Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochastic optimization using a trust-region method and random models. Mathematical Programming , 169(2):447--487, 2018
work page 2018
-
[13]
Conn, Katya Scheinberg, and Lu\'is N
Andrew R. Conn, Katya Scheinberg, and Lu\'is N. Vicente. Introduction to Derivative-Free Optimization . SIAM, 2009
work page 2009
-
[14]
Importance sampling for minibatches
Dominik Csiba and Peter Richt \'a rik. Importance sampling for minibatches. Journal of Machine Learning Research , 19(27):1--21, 2018
work page 2018
-
[15]
Bach, and Simon Lacoste - Julien
Aaron Defazio, Francis R. Bach, and Simon Lacoste - Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 27 , pages 1646--1654, 2014
work page 2014
-
[16]
Adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes
Ayoub El Hanchi and David Stephens. Adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Advances in Neural Information Processing Systems , 33:15702--15713, 2020
work page 2020
-
[17]
Nonconvex variance reduced optimization with arbitrary sampling
Samuel Horv \'a th and Peter Richtarik. Nonconvex variance reduced optimization with arbitrary sampling. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97, pages 2781--2789. PMLR, 2019
work page 2019
-
[18]
Adam: A Method for Stochastic Optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[19]
Asymptotically efficient adaptive allocation rules
Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics , 6(1):4--22, 1985
work page 1985
-
[20]
The epoch-greedy algorithm for contextual multi-armed bandits
John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Advances in Neural Information Processing Systems (NeurIPS) , 2008
work page 2008
-
[21]
Structure-aware methods for expensive derivative-free nonsmooth composite optimization
Jeffrey Larson and Matt Menickelly. Structure-aware methods for expensive derivative-free nonsmooth composite optimization. Mathematical Programming Computation , 16(1):1--36, 2024
work page 2024
-
[22]
Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. Acta Numerica , 28:287--404, 2019
work page 2019
- [23]
-
[24]
Adam with bandit sampling for deep learning
Rui Liu, Tianyi Wu, and Barzan Mozafari. Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems , 33:5393--5404, 2020
work page 2020
-
[25]
P. Marevi \'c , N. Schunck, E. M. Ney, R. Navarro P \'e rez, M. Verriere, and J. O'Neal. Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis ( IV ) HFBTHO (v4.0): A new version of the program. Comput. Phys. Commun. , 276:108367, 2022
work page 2022
-
[26]
Matt Menickelly and Stefan M. Wild. Stochastic average model methods. Computational Optimization and Applications , 88(2):405--442, 2024
work page 2024
-
[27]
Jorge J. Mor \'e and Stefan M. Wild. Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization , 20(1):172--191, 2009
work page 2009
-
[28]
Adaptive sampling probabilities for non-smooth optimization
Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C Duchi. Adaptive sampling probabilities for non-smooth optimization. In International Conference on Machine Learning , pages 2574--2583. PMLR, 2017
work page 2017
-
[29]
Some aspects of the sequential design of experiments
Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society , 58(5):527--535, 1952
work page 1952
-
[30]
Stochastic optimization with bandit sampling
Farnood Salehi, L Elisa Celis, and Patrick Thiran. Stochastic optimization with bandit sampling. In Advances in Neural Information Processing Systems (NeurIPS) , 2017
work page 2017
-
[31]
Minimizing finite sums with the stochastic average gradient
Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. Mathematical Programming , 162(1-2):83--112, 2017
work page 2017
-
[32]
Stefan M. Wild. MNH: A derivative-free optimization algorithm using minimal norm Hessians . In Tenth Copper Mountain Conference on Iterative Methods , 2008
work page 2008
-
[33]
Stefan M. Wild. Solving derivative-free nonlinear least squares problems with POUNDERS . In Tamas Terlaky, Miguel F. Anjos, and Shabbir Ahmed, editors, Advances and Trends in Optimization with Engineering Applications , pages 529--540. SIAM, 2017
work page 2017
-
[34]
Adaptive client sampling in federated learning via online learning with bandit feedback
Boxin Zhao, Lingxiao Wang, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Chaochao Chen, and Mladen Kolar. Adaptive client sampling in federated learning via online learning with bandit feedback. Journal of Machine Learning Research , 26(8):1--67, 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.