pith. sign in

arxiv: 2604.20657 · v2 · pith:KNGCKTCKnew · submitted 2026-04-22 · 🧮 math.OC

Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods

Pith reviewed 2026-07-05 02:08 UTC · model glm-5.2

classification 🧮 math.OC
keywords methodsoptimizationworkaveragebanditcontextualoutputsproblem
0
0 comments X

The pith

Sampling Subsets as Bandit Arms

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the question of how to select subsets of simulation outputs in SAM methods can be reformulated as a contextual bandit problem. In expensive finite-sum optimization, where each function evaluation is costly and may involve parallel simulations, the choice of which subset to model locally at each iteration matters. The author proposes using the Exp4 algorithm—an exponential-weights method for contextual bandits with expert advice—to generate sampling distributions that leverage side information such as lower-fidelity simulations, pre-trained emulators, or domain expertise. The core mechanism is to treat each candidate subset-selection strategy as an expert advising the bandit algorithm, which then learns over iterations which expert's advice produces better optimization progress. This reframing connects subset selection in model-based optimization to a well-studied online learning framework, potentially allowing practitioners to encode and automatically weigh multiple sources of prior knowledge when deciding which simulations to run.

Core claim

The central discovery is a reduction: the problem of generating sampling distributions for SAM (stochastic average model) methods can be cast as a contextual bandit problem solvable by Exp4. In SAM methods, only a randomized subset of simulation outputs is locally modeled per iteration of an optimization routine. The author identifies that the subset-selection decision naturally fits the contextual bandit structure—context is available from side information (lower-fidelity models, emulators, human or AI expertise), arms correspond to subset choices, and rewards correspond to optimization progress. By applying Exp4, the method can aggregate advice from multiple expert policies for subset选择,适应

What carries the argument

SAM (stochastic average model) methods

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The contextual bandit formulation could be extended to other randomized optimization methods where subset selection is required, such as stochastic gradient methods with non-uniform sampling, potentially unifying importance sampling across optimization algorithms.
  • If lower-fidelity simulations serve as context, the approach suggests a principled way to perform multi-fidelity optimization: the bandit learns which fidelity level's advice to trust for subset selection, effectively performing automatic fidelity management within the optimization loop.
  • The regret guarantees of Exp4 could potentially be translated into convergence rate bounds for the resulting SAM method, providing theoretical justification for using learned (rather than uniform) sampling distributions in model-based optimization.
  • The expert framework could accommodate adaptive experts that update their subset recommendations based on the optimization trajectory, not just static side information, creating a richer interaction between the bandit and the optimizer.

Load-bearing premise

The paper assumes that the contextual bandit formulation via Exp4 provides a meaningful advantage over simpler subset-selection strategies in SAM methods, which is load-bearing because the entire contribution rests on the bandit approach being worth the added complexity. The empirical evidence consists of preliminary numerical results on synthetic problems.

What would settle it

On realistic expensive optimization problems with available side information, the Exp4-based subset selection should produce measurably better optimization progress per simulation evaluation compared to uniform random subset selection, with the advantage growing as the quality of side information improves.

Figures

Figures reproduced from arXiv: 2604.20657 by Matt Menickelly.

Figure 1
Figure 1. Figure 1: SAM-POUNDERS [26] employing quadratic interpolation models applied to a synthetic quadratic problem described in the text. The first variant uses the Lipschitz estimation scheme described in Section 4.1 to generate π k , while the latter updates models uniformly at random. Left: A comparison of a typical realization of the two variants comparing iteration counter to true function value f(x k ) (unknown to … view at source ↗
Figure 7
Figure 7. Figure 7: Reasoning text provided by Gemini Pro 3.1 on a random itera [PITH_FULL_IMAGE:figures/full_fig_p024_7.png] view at source ↗
read the original abstract

In computational science workflows, it is often the case that 1) objective functions for optimization involve multiple simulation outputs, and 2) those simulations can be performed (at least partially) in parallel. In this work, we reexamine past work on a class of randomized algorithms, stochastic average model (SAM) methods. SAM methods are conceptually similar to stochastic average gradient methods, and effectively require that only randomized subsets of simulation outputs be locally modeled in each iteration of a model-based optimization method. This work focuses on the question of how best to perform this randomization of subset selection, especially in settings where there exists useful side information such as alternative lower-fidelity simulations, pre-trained emulators or domain expertise from humans or AI models. In particular, we consider the problem of generating sampling distributions for SAM methods as a contextual bandit problem and we apply the Exponential weights algorithm for Exploration and Exploitation with Experts (Exp4). We provide some preliminary numerical results on synthetic problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript proposes casting the subset-selection problem in Stochastic Average Model (SAM) methods for expensive finite-sum optimization as a contextual bandit problem, specifically applying the Exp4 algorithm. The motivation is that in computational science workflows, side information (lower-fidelity simulations, emulators, domain expertise) can inform which simulation outputs to sample. The paper describes itself as conceptual, with preliminary numerical results on synthetic problems. The full text provided for review was corrupted (encoding failure rendering the body largely unparseable), so this assessment is based primarily on the abstract and the partially readable fragments of structure (section headings, equation-like blocks, references).

Significance. The problem framing is well-motivated: subset selection in model-based finite-sum optimization with expensive function evaluations is a practical concern, and leveraging contextual side information via bandit methods is a natural and potentially useful idea. The cross-application of Exp4 to SAM sampling distributions is, to my knowledge, novel. However, the significance is currently limited by the absence of theoretical guarantees and the preliminary nature of the experiments, both acknowledged in the abstract. No machine-checked proofs, reproducible code, or parameter-free derivations are evident from the available material.

major comments (3)
  1. Abstract: The paper states it provides 'preliminary numerical results on synthetic problems' and mentions no convergence theorem. For a math.OC submission, the absence of any theoretical result connecting the bandit regret to optimization progress is a load-bearing gap. The central claim—that the contextual bandit formulation yields improved subset selection—requires either a formal guarantee or substantially stronger empirical evidence than 'preliminary synthetic' experiments. As submitted, the contribution is a heuristic proposal without theoretical justification.
  2. Non-stationarity of the bandit (load-bearing for the Exp4 motivation): In SAM optimization, the optimal sampling distribution at iteration k depends on the current iterate x_k, which changes every step. Standard Exp4 regret bounds (Auer et al., 2002) are stated against a fixed best expert in hindsight over an adversarial sequence. If the optimal expert shifts substantially across iterations, the standard regret bound does not directly imply SAM iterate convergence. The manuscript must either (a) provide a theorem connecting cumulative Exp4 regret to an optimization error bound (e.g., E[f(x_k) - f*] in terms of regret), or (b) explicitly discuss why the non-stationarity is benign in the regimes considered and provide empirical evidence that Exp4 outperforms simpler baselines (uniform sampling, gradient-norm-weighted sampling) under iterate drift. Without this, the choice of Exp4 over a固定或
  3. Full text corruption: The body of the manuscript provided for review is corrupted (encoding failure), rendering the technical content—equations, theorems, experimental setup, and results—largely unparseable. This prevents verification of whether the non-stationarity concern above is addressed, what baselines are compared, and whether any theoretical results exist that the abstract does not mention. A clean, properly encoded manuscript must be provided for substantive review.
minor comments (3)
  1. Abstract: The phrase 'preliminary numerical results' undersells the empirical contribution if the experiments are in fact substantial; conversely, if they are genuinely preliminary, the abstract should state the scope and limitations more explicitly.
  2. Abstract: No mention of the specific synthetic problems used, the dimensionality, or the baselines compared against. A brief indication would help readers assess relevance.
  3. The arXiv identifier in the header (2604.20660v2) differs from the submission identifier (2604.20657); authors should verify.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for a careful reading under difficult circumstances (the corrupted encoding) and for identifying the two most important substantive concerns: the lack of theoretical guarantees and the non-stationarity issue for Exp4. We address each comment below.

read point-by-point responses
  1. Referee: Abstract: The paper states it provides 'preliminary numerical results on synthetic problems' and mentions no convergence theorem. For a math.OC submission, the absence of any theoretical result connecting the bandit regret to optimization progress is a load-bearing gap. The central claim—that the contextual bandit formulation yields improved subset selection—requires either a formal guarantee or substantially stronger empirical evidence than 'preliminary synthetic' experiments. As submitted, the contribution is a heuristic proposal without theoretical justification.

    Authors: The referee is correct that the current manuscript contains no theorem connecting cumulative bandit regret to an optimization error bound, and we agree that this is the most important gap for a math.OC submission. We will add a theoretical result in the revision. Specifically, we plan to include a proposition that bounds the expected optimization error E[f(x_k) - f*] in terms of (i) the cumulative Exp4 regret over the first k iterations, (ii) the SAM model error terms, and (iii) standard smoothness/convexity assumptions on f. The key step is to show that the SAM iterate convergence bound (which already exists in the SAM literature under uniform sampling) degrades gracefully when the sampling distribution incurs regret R(k) relative to the best expert, yielding an extra O(R(k)/k) term in the per-iteration progress. This connects the bandit regret to optimization progress in the manner the referee requests. We will also expand the numerical experiments beyond the current preliminary synthetic tests to include at least one higher-dimensional problem and comparisons against uniform sampling and gradient-norm-weighted sampling. revision: yes

  2. Referee: Non-stationarity of the bandit (load-bearing for the Exp4 motivation): In SAM optimization, the optimal sampling distribution at iteration k depends on the current iterate x_k, which changes every step. Standard Exp4 regret bounds (Auer et al., 2002) are stated against a fixed best expert in hindsight over an adversarial sequence. If the optimal expert shifts substantially across iterations, the standard regret bound does not directly imply SAM iterate convergence. The manuscript must either (a) provide a theorem connecting cumulative Exp4 regret to an optimization error bound (e.g., E[f(x_k) - f*] in terms of regret), or (b) explicitly discuss why the non-stationarity is benign in the regimes considered and provide empirical evidence that Exp4 outperforms simpler baselines (uniform sampling, gradient-norm-weighted sampling) under iterate drift. Without this, the choice of Exp4 over a固定或

    Authors: This is a well-taken and substantive point. The referee is correct that standard Exp4 bounds compete against a fixed best expert in hindsight, while in SAM optimization the optimal sampling distribution shifts as x_k evolves. We will address this on two fronts in the revision. First, in the theoretical result described above, we will use the standard Exp4 regret bound against the best fixed expert in hindsight, and show that even this weaker guarantee suffices for SAM convergence: if the expert class is rich enough that a single fixed expert provides a good sampling distribution across all iterates (which is the case when, e.g., the side information is a lower-fidelity model whose relative accuracy is stable across the optimization trajectory), then the regret bound translates directly to an optimization error bound. Second, we will add an explicit discussion of the non-stationarity issue, identifying regimes where it is benign (slowly varying iterates, stable side-information quality) and acknowledging regimes where it is not (rapidly changing iterate geometry, non-stationary expert quality). We will also report empirical comparisons against uniform sampling and gradient-norm-weighted sampling to demonstrate that Exp4 provides benefits under iterate drift in the tested problems. We note that we do not claim Exp4 is optimal for the non-stationary setting; rather, we argue it is a reasonable and practically effective choice given available side information, and the revision will make the scope of this claim precise. revision: yes

  3. Referee: Full text corruption: The body of the manuscript provided for review is corrupted (encoding failure), rendering the technical content—equations, theorems, experimental setup, and results—largely unparseable. This prevents verification of whether the non-stationarity concern above is addressed, what baselines are compared, and whether any theoretical results exist that the abstract does not mention. A clean, properly encoded manuscript must be provided for substantive review.

    Authors: We sincerely apologize for the encoding failure. The corruption occurred during the arXiv submission process due to a font encoding incompatibility in the LaTeX source. We have identified the root cause (a custom font package that did not embed correctly) and have already produced a clean, properly encoded version. We will resubmit the corrected manuscript and can provide it immediately if the referee or editor wishes to re-examine the technical content before the next round. We acknowledge that this corruption prevented the referee from verifying the technical details, and we appreciate the referee's willingness to engage with the abstract and structural fragments despite this issue. revision: yes

Circularity Check

0 steps flagged

No circularity detected: the paper combines two independently established frameworks (SAM methods and Exp4 contextual bandits) without self-referential derivation chains.

full rationale

The full text is corrupted and largely unreadable, but the abstract and the readable fragments are sufficient to assess circularity. The paper's central contribution is a cross-application: it casts the subset-selection problem in SAM (stochastic average model) optimization as a contextual bandit problem and applies the Exp4 algorithm. Both SAM methods and Exp4 are well-established, independently developed frameworks from the prior literature. The abstract does not claim to derive a new result that is then validated by fitting to that same result, nor does it define a quantity in terms of what it purports to predict. The contribution is a reformulation/heuristic application, not a first-principles derivation whose output equals its input by construction. The skeptic's concerns about non-stationarity of the bandit and the absence of a regret-to-convergence theorem are legitimate correctness and completeness concerns, but they are not circularity: the paper does not claim a theorem that secretly reduces to its assumptions. The preliminary numerical results are described as synthetic experiments, not as fitted predictions repackaged as validation. No self-citation chain is visible in the readable portions that would be load-bearing for the formulation. The paper appears to be a genuine, if preliminary, cross-disciplinary application with no circular construction.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

No new entities are introduced; the paper combines existing frameworks.

axioms (3)
  • domain assumption SAM methods are a valid framework for expensive finite-sum optimization with parallelizable simulation outputs
    Stated in the abstract as background; SAM methods are from prior literature.
  • domain assumption Exp4 is applicable to the subset-selection problem in SAM methods, i.e., the subset selection can be meaningfully formulated as a contextual bandit problem
    This is the core modeling assumption of the paper, stated in the abstract. Its validity determines whether the approach is sound.
  • domain assumption Side information (lower-fidelity simulations, emulators, domain expertise) can be encoded as context for the bandit problem
    The abstract lists these as available side information but does not detail how they are encoded as context vectors.

pith-pipeline@v1.1.0-glm · 6616 in / 1946 out tokens · 63087 ms · 2026-07-05T02:08:00.422642+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

  1. [1]

    https://github.com/POptUS/BenDFO, 2022

    BenDFO : Code for benchmarking derivative-free optimization algorithms. https://github.com/POptUS/BenDFO, 2022

  2. [2]

    https://github.com/POptUS/IBCDFO, 2024

    IBCDFO : Interpolation-based composite derivative-free optimization. https://github.com/POptUS/IBCDFO, 2024

  3. [3]

    Competing in the dark: An efficient algorithm for bandit linear optimization

    Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory , volume 110, 2009

  4. [4]

    Schapire

    Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. International Conference on Machine Learning (ICML) , 2014

  5. [5]

    Finite-time analysis of the multiarmed bandit problem

    Peter Auer, Nicol\`o Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning , 47(2--3):235--256, 2002

  6. [6]

    The nonstochastic multiarmed bandit problem

    Peter Auer, Nicol \`o Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing , 32(1):48--77, 2002

  7. [7]

    Convergence rate analysis of a stochastic trust-region method via supermartingales

    Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization , 1(2):92--119, 2019

  8. [8]

    Raghu Bollapragada, Matt Menickelly, Witold Nazarewicz, Jared O'Neal, Paul-Gerhard Reinhard, and Stefan M. Wild. Optimization and supervised machine learning methods for fitting numerical physics models without derivatives. Journal of Physics G: Nuclear and Particle Physics , 48(2):024001, 2021

  9. [9]

    Zalan Borsos, Andreas Krause, and Kfir Y. Levy. Online variance reduction for stochastic optimization. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings of the 31st Conference On Learning Theory , volume 75 of Proceedings of Machine Learning Research , pages 324--357. PMLR, 06--09 Jul 2018

  10. [10]

    Regret analysis of stochastic and nonstochastic multi-armed bandit problems

    S\'ebastien Bubeck and Nicol\`o Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5(1):1--122, 2012

  11. [11]

    The best of both worlds: stochastic and adversarial bandits

    S\'ebastien Bubeck and Aleksandrs Slivkins. The best of both worlds: stochastic and adversarial bandits. In Conference on Learning Theory (COLT) , 2012

  12. [12]

    Stochastic optimization using a trust-region method and random models

    Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochastic optimization using a trust-region method and random models. Mathematical Programming , 169(2):447--487, 2018

  13. [13]

    Conn, Katya Scheinberg, and Lu\'is N

    Andrew R. Conn, Katya Scheinberg, and Lu\'is N. Vicente. Introduction to Derivative-Free Optimization . SIAM, 2009

  14. [14]

    Importance sampling for minibatches

    Dominik Csiba and Peter Richt \'a rik. Importance sampling for minibatches. Journal of Machine Learning Research , 19(27):1--21, 2018

  15. [15]

    Bach, and Simon Lacoste - Julien

    Aaron Defazio, Francis R. Bach, and Simon Lacoste - Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 27 , pages 1646--1654, 2014

  16. [16]

    Adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes

    Ayoub El Hanchi and David Stephens. Adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Advances in Neural Information Processing Systems , 33:15702--15713, 2020

  17. [17]

    Nonconvex variance reduced optimization with arbitrary sampling

    Samuel Horv \'a th and Peter Richtarik. Nonconvex variance reduced optimization with arbitrary sampling. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97, pages 2781--2789. PMLR, 2019

  18. [18]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

  19. [19]

    Asymptotically efficient adaptive allocation rules

    Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics , 6(1):4--22, 1985

  20. [20]

    The epoch-greedy algorithm for contextual multi-armed bandits

    John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Advances in Neural Information Processing Systems (NeurIPS) , 2008

  21. [21]

    Structure-aware methods for expensive derivative-free nonsmooth composite optimization

    Jeffrey Larson and Matt Menickelly. Structure-aware methods for expensive derivative-free nonsmooth composite optimization. Mathematical Programming Computation , 16(1):1--36, 2024

  22. [22]

    Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. Acta Numerica , 28:287--404, 2019

  23. [23]

    Schapire

    Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In International World Wide Web Conference (WWW) , 2010

  24. [24]

    Adam with bandit sampling for deep learning

    Rui Liu, Tianyi Wu, and Barzan Mozafari. Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems , 33:5393--5404, 2020

  25. [25]

    Marevi \'c , N

    P. Marevi \'c , N. Schunck, E. M. Ney, R. Navarro P \'e rez, M. Verriere, and J. O'Neal. Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis ( IV ) HFBTHO (v4.0): A new version of the program. Comput. Phys. Commun. , 276:108367, 2022

  26. [26]

    Matt Menickelly and Stefan M. Wild. Stochastic average model methods. Computational Optimization and Applications , 88(2):405--442, 2024

  27. [27]

    Mor \'e and Stefan M

    Jorge J. Mor \'e and Stefan M. Wild. Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization , 20(1):172--191, 2009

  28. [28]

    Adaptive sampling probabilities for non-smooth optimization

    Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C Duchi. Adaptive sampling probabilities for non-smooth optimization. In International Conference on Machine Learning , pages 2574--2583. PMLR, 2017

  29. [29]

    Some aspects of the sequential design of experiments

    Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society , 58(5):527--535, 1952

  30. [30]

    Stochastic optimization with bandit sampling

    Farnood Salehi, L Elisa Celis, and Patrick Thiran. Stochastic optimization with bandit sampling. In Advances in Neural Information Processing Systems (NeurIPS) , 2017

  31. [31]

    Minimizing finite sums with the stochastic average gradient

    Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. Mathematical Programming , 162(1-2):83--112, 2017

  32. [32]

    Stefan M. Wild. MNH: A derivative-free optimization algorithm using minimal norm Hessians . In Tenth Copper Mountain Conference on Iterative Methods , 2008

  33. [33]

    Stefan M. Wild. Solving derivative-free nonlinear least squares problems with POUNDERS . In Tamas Terlaky, Miguel F. Anjos, and Shabbir Ahmed, editors, Advances and Trends in Optimization with Engineering Applications , pages 529--540. SIAM, 2017

  34. [34]

    Adaptive client sampling in federated learning via online learning with bandit feedback

    Boxin Zhao, Lingxiao Wang, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Chaochao Chen, and Mladen Kolar. Adaptive client sampling in federated learning via online learning with bandit feedback. Journal of Machine Learning Research , 26(8):1--67, 2025