pith. machine review for the scientific record.

arxiv: 2604.15525 · v1 · submitted 2026-04-16 · 🧮 math.OC · math.PR

Recognition: unknown

Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Mitigating Dimension-dependence and Incorporating Decision-dependence

Mingrui Wang, Prakash Chakraborty, Uday V. Shanbhag


Pith reviewed 2026-05-10 10:17 UTC · model grok-4.3

classification 🧮 math.OC math.PR
keywords zeroth-order optimization · Gaussian smoothing · stochastic optimization · dimension dependence · decision-dependent optimization · nonsmooth optimization · convergence guarantees

The pith

Exponentially-shifted Gaussian smoothing reduces dimension dependence in zeroth-order stochastic optimization from quadratic to linear.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a new exponentially-shifted Gaussian smoothing estimator for zeroth-order methods applied to nonsmooth stochastic optimization. This estimator achieves moment bounds with linear dependence on problem dimension, in contrast to the quadratic dependence of standard Gaussian smoothing. The approach extends to decision-dependent regimes, whether the underlying densities are available in closed form or only known through moment bounds. The resulting methods deliver convergence guarantees for strongly convex, convex, and nonconvex problems, with iteration complexities improved by a factor of the dimension while oracle complexities remain comparable. Additional high-probability and almost-sure convergence results are provided for the convex and nonconvex cases.

Core claim

The central claim is that the exponentially-shifted Gaussian smoothing estimator yields gradient estimates whose moments depend linearly on dimension, enabling zeroth-order methods whose iteration complexities improve by a factor of n over prior Gaussian-smoothing schemes. This property is preserved under the extensions to decision-dependent stochastic optimization, where the densities are either available in closed form or obey moment bounds.
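In symbols, a hedged paraphrase of the claim (notation ours; here g_eta denotes the zeroth-order gradient estimate at smoothing radius eta, and the paper's exact constants are not reproduced on this page):

```latex
\[
\underbrace{\mathbb{E}\big[\|g_\eta(x)\|^2\big] = O(n^2)}_{\text{standard Gaussian smoothing}}
\qquad\longrightarrow\qquad
\underbrace{\mathbb{E}\big[\|g_\eta^{\mathrm{esGs}}(x)\|^2\big] = O(n)}_{\text{claimed for esGs}}
\]
```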

What carries the argument

The exponentially-shifted Gaussian smoothing (esGs) estimator, which applies an exponential shift to standard Gaussian smoothing to obtain moment bounds with only linear dimension dependence.
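For orientation, a minimal sketch of the standard (unshifted) Gaussian smoothing estimator that esGs modifies; the exponential shift itself is not specified on this page, so only the baseline is shown, and the function name and test objective are illustrative choices, not the paper's.

```python
import numpy as np

def gaussian_smoothing_grad(f, x, eta, rng):
    """Standard two-point Gaussian smoothing gradient estimator.

    Unbiased for the gradient of the smoothed objective
    f_eta(x) = E[f(x + eta*u)] with u ~ N(0, I_n). For Lipschitz f its
    second moment grows like O(n^2); esGs is claimed to reduce this to O(n).
    """
    u = rng.standard_normal(x.shape)
    return (f(x + eta * u) - f(x)) / eta * u

# Illustrative use on a nonsmooth objective (not from the paper).
rng = np.random.default_rng(0)
f = lambda x: np.linalg.norm(x)   # Euclidean norm: 1-Lipschitz, nonsmooth at 0
x = np.ones(100)
g = np.mean([gaussian_smoothing_grad(f, x, 1e-3, rng) for _ in range(1000)], axis=0)
```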

Load-bearing premise

The objective must have bounded moments after smoothing, and any decision-dependent densities must satisfy corresponding moment bounds.

What would settle it

Numerical tests on high-dimensional instances. The claim would be refuted if the gradient estimator's second moments were found to scale quadratically with dimension, or if iteration counts failed to improve by a factor of n.
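A minimal sketch of such a test, assuming only the baseline estimator above (an esGs implementation would be substituted in its place): fit the log-log slope of the empirical second moment against dimension. The standard estimator should yield an exponent near 2; the paper's claim predicts an exponent near 1 for esGs.

```python
import numpy as np

def gs_grad(f, x, eta, rng):
    # Standard Gaussian smoothing estimator (the O(n^2) baseline).
    u = rng.standard_normal(x.shape)
    return (f(x + eta * u) - f(x)) / eta * u

def second_moment(f, n, eta=1e-3, trials=2000, seed=0):
    """Monte Carlo estimate of E[||g||^2] at x = 0 in dimension n."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    return np.mean([np.sum(gs_grad(f, x, eta, rng) ** 2) for _ in range(trials)])

dims = np.array([50, 100, 200, 400, 800])
f = lambda x: np.linalg.norm(x)   # 1-Lipschitz, nonsmooth test objective
moments = [second_moment(f, n) for n in dims]
# Log-log slope estimates the scaling exponent: ~2 expected for the
# standard estimator, ~1 predicted for esGs.
slope = np.polyfit(np.log(dims), np.log(moments), 1)[0]
print(f"estimated scaling exponent: {slope:.2f}")
```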

Figures

Figures reproduced from arXiv: 2604.15525 by Mingrui Wang, Prakash Chakraborty, Uday V. Shanbhag.

Figure 1. Oracle schematic for decision-dependent stochastic optimization with known structure of p(ξ | x).
Figure 2. Oracle schematic for decision-dependent stochastic optimization with unknown structure of p(ξ | x).
read the original abstract

In this paper, we consider two distinct challenges in the resolution of nonsmooth stochastic optimization. Of these, the first pertains to the pronounced dependence of dimension in Gaussian smoothing-enabled zeroth-order schemes, impeding applications to large-scale settings. Second, no unified analysis {exists} for smoothing-enabled stochastic zeroth-order methods, allowing for capturing standard and decision-dependent stochastic optimization. To contend with the first challenge, we introduce a new exponentially-shifted Gaussian smoothing {\bf esGs} estimator whose moment bounds enjoy a linear dependence on dimension (rather than a quadratic dependence as in standard Gaussian smoothing estimators). Second, we show that such an estimator can be extended in two distinct ways to address decision-dependent regimes where the underlying densities are either available in closed form or not. Notably, the resulting gradient estimators continue to display a linear dependence in dimension. We then develop an {\bf esGs}-enabled stochastic zeroth-order method applicable to nonsmooth strongly convex, convex, and {non-}convex regimes. Importantly, our guarantees provide improved dimension-dependence in iteration complexities (by a factor of problem dimension $n$) while maintaing similar oracle complexities. In addition, we also provide novel high probability guarantees and almost sure sublinear rate guarantees in convex settings, while a new subsequential almost sure convergence guarantee is provided in nonconvex regimes. Preliminary numerics support our theoretical findings show that the proposed schemes display improved computational times and more refined empirical accuracies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces an exponentially-shifted Gaussian smoothing (esGs) estimator for zeroth-order stochastic optimization of nonsmooth objectives. The central claim is that the esGs estimator achieves second-moment bounds with only linear dependence on dimension n (versus quadratic for standard Gaussian smoothing), which yields iteration complexities improved by a factor of n while preserving comparable oracle complexity. The estimator is extended in two ways to decision-dependent regimes (closed-form densities or moment-bounded densities), with the linear dimension property preserved. The resulting stochastic zeroth-order algorithm is analyzed for nonsmooth strongly convex, convex, and nonconvex problems, delivering high-probability bounds, almost-sure sublinear rates in the convex case, and subsequential almost-sure convergence in the nonconvex case.

Significance. If the linear-in-n second-moment bound for the esGs estimator is rigorously established under the paper's stated assumptions, the work would meaningfully reduce the dimension dependence that currently limits zeroth-order methods in large-scale settings. The unified treatment of standard and decision-dependent stochastic optimization, together with the additional high-probability and almost-sure convergence results, would constitute a solid theoretical contribution to the field.

major comments (1)
  1. [Section defining the esGs estimator and its moment bounds (likely §3)] The load-bearing step is the claim that the esGs estimator satisfies E[||G_esGs||^2] = O(n) rather than O(n^2). The manuscript must supply the explicit moment calculation (presumably in the section defining the estimator and its properties) that shows how the exponential shift cancels the quadratic terms arising from the product of the smoothing kernel and the random direction. If this derivation relies on unstated restrictions on the objective or the decision-dependent measure beyond the “standard conditions” and “bounded moments after smoothing” mentioned in the abstract, the factor-n improvement in iteration complexity does not follow.
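For context on where the quadratic term originates, a standard calculation not taken from this paper (see, e.g., Nesterov and Spokoiny [32]): with u drawn from N(0, I_n), f L_0-Lipschitz, and the unshifted estimator g_eta(x) = (f(x + eta*u) - f(x))/eta * u,

```latex
\[
\mathbb{E}\big[\|g_\eta(x)\|^2\big]
= \mathbb{E}\!\left[\frac{\big(f(x+\eta u)-f(x)\big)^2}{\eta^2}\,\|u\|^2\right]
\le L_0^2\,\mathbb{E}\big[\|u\|^4\big]
= L_0^2\,n(n+2) = O(n^2),
\]
```

so an exponential shift must, in effect, bring the E[||u||^4] = n(n+2) factor down to O(n); the referee's request is for the explicit step in which this cancellation occurs.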
minor comments (1)
  1. [Abstract] “maintaing” should be “maintaining”; the phrase “{non-}convex” contains an extraneous hyphen and braces that should be cleaned up for readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive overall assessment of the contribution. We address the single major comment below and will revise the manuscript to improve clarity on the requested derivation.

read point-by-point responses
  1. Referee: [Section defining the esGs estimator and its moment bounds (likely §3)] The load-bearing step is the claim that the esGs estimator satisfies E[||G_esGs||^2] = O(n) rather than O(n^2). The manuscript must supply the explicit moment calculation (presumably in the section defining the estimator and its properties) that shows how the exponential shift cancels the quadratic terms arising from the product of the smoothing kernel and the random direction. If this derivation relies on unstated restrictions on the objective or the decision-dependent measure beyond the “standard conditions” and “bounded moments after smoothing” mentioned in the abstract, the factor-n improvement in iteration complexity does not follow.

    Authors: We agree that the second-moment bound is the central technical step and that its derivation should be fully explicit. The manuscript already contains the moment calculation in Section 3 (immediately following the definition of the esGs estimator), where we expand E[||G_esGs||^2] using the product of the exponentially-shifted kernel and the random direction, then apply the shift parameter choice to cancel the quadratic-in-n terms that appear in the standard Gaussian case. The resulting bound is O(n) under precisely the conditions stated in the paper: Lipschitz continuity of the objective (or its smoothed version) and bounded second moments of the stochastic oracle after smoothing. No additional restrictions on the decision-dependent measure are used beyond those already listed in the abstract and Section 2. If the current presentation is not sufficiently detailed for the referee, we will expand the derivation with an extra lemma and all intermediate steps in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation relies on new estimator definition and derived bounds

full rationale

The paper introduces the esGs estimator as a novel construction and asserts that its second-moment bound scales linearly in dimension under standard nonsmoothness plus bounded-moment assumptions after smoothing. This bound is then used to obtain the claimed iteration-complexity improvement. No equation reduces to a prior fitted quantity renamed as a prediction, no uniqueness theorem is imported from the authors' own prior work, and no ansatz is smuggled via self-citation. The central claims therefore rest on the explicit construction and moment analysis of the new estimator rather than on any self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on standard assumptions for nonsmooth stochastic optimization plus the new estimator construction; no explicit free parameters are named in the abstract, and the only invented entity is the esGs estimator itself.

axioms (2)
  • domain assumption: The objective satisfies standard bounded-moment conditions after smoothing that enable the linear dimension bound.
    Required for the moment bounds and complexity claims to hold.
  • domain assumption: Decision-dependent densities satisfy moment conditions whether or not they are available in closed form.
    Needed for the extension to decision-dependent regimes.
invented entities (1)
  • exponentially-shifted Gaussian smoothing (esGs) estimator (no independent evidence)
    purpose: To achieve linear dimension dependence in gradient estimation for zeroth-order methods.
    New construction introduced to replace standard Gaussian smoothing.

pith-pipeline@v0.9.0 · 5579 in / 1312 out tokens · 25660 ms · 2026-05-10T10:17:21.725928+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 12 canonical work pages

  1. [1] Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Foundations of Computational Mathematics, 22(2):507–560, 2022.

  2. [2] Léon Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018. doi: 10.1137/16M1080173. URL https://doi.org/10.1137/16M1080173.

  3. [3] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford, 2013. ISBN 9780199535255. URL https://books.google.com/books?id=koNqWRluhP0C.

  4. [4] Yatong Chen, Wei Tang, Chien-Ju Ho, and Yang Liu. Performative prediction with bandit feedback: Learning through reparameterization. arXiv preprint arXiv:2305.01094, 2024.

  5. [5] F. H. Clarke, Yu. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Nonsmooth Analysis and Control Theory, volume 178 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1998. ISBN 0-387-98336-8.

  6. [6] Shisheng Cui, Uday V. Shanbhag, and Farzad Yousefian. Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs. Mathematical Programming, 198(2):1153–1225, 2023.

  7. [7] Joshua Cutler, Mateo Diaz, and Dmitriy Drusvyatskiy. Stochastic approximation with decision-dependent distributions: asymptotic normality and optimality. Journal of Machine Learning Research, 25(90):1–49, 2024.

  8. [8] Dmitriy Drusvyatskiy and Lin Xiao. Stochastic optimization with decision-dependent distributions. Mathematics of Operations Research, 48(2):954–998, 2023.

  9. [9] Yuri M. Ermoliev, Vladimir I. Norkin, and Roger J-B. Wets. The minimization of semicontinuous functions: mollifier subgradients. SIAM Journal on Control and Optimization, 33(1):149–167, 1995.

  10. [10] Noufel Frikha and Stéphane Menozzi. Concentration bounds for stochastic approximations. Electronic Communications in Probability, 17:1–15, 2012. doi: 10.1214/ECP.v17-1952.

  11. [11] Michael C. Fu. Chapter 19: Gradient estimation. In Shane G. Henderson and Barry L. Nelson, editors, Simulation, volume 13 of Handbooks in Operations Research and Management Science, pages 575–616. Elsevier, 2006.

  12. [12] Walter Gautschi. Some elementary inequalities relating to the gamma and incomplete gamma function. J. Math. Phys., 38(1):77–81, 1959.

  13. [13] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.

  14. [14] Peter W. Glynn. Likelihood ratio gradient estimation: an overview. In Proceedings of the 19th Conference on Winter Simulation, pages 366–375, 1987.

  15. [15] Peter W. Glynn. Likelihood ratio derivative estimators for stochastic systems. In Proceedings of the 21st Conference on Winter Simulation, pages 374–380, 1989.

  16. [16] A. A. Goldstein. Optimization of Lipschitz continuous functions. Mathematical Programming, 13(1):14–22, 1977. doi: 10.1007/BF01584320. URL https://doi.org/10.1007/BF01584320.

  17. [17] Zhiyu He, Saverio Bolognani, Florian Dörfler, and Michael Muehlebach. Decision-dependent stochastic optimization: The role of distribution dynamics. arXiv preprint arXiv:2503.07324, 2025.

  18. [18] Yuya Hikima and Akiko Takeda. Stochastic approach for price optimization problems with decision-dependent uncertainty. European Journal of Operational Research, 322:541–553, 2025.

  19. [19] Yuya Hikima and Akiko Takeda. Zeroth-order methods for nonconvex stochastic problems with decision-dependent distributions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 17195–17203, 2025.

  20. [20] Zachary Izzo, Lexing Ying, and James Zou. How to learn when data reacts to your model: performative gradient descent. In International Conference on Machine Learning. PMLR, 2021.

  21. [21] Afrooz Jalilzadeh, Uday Shanbhag, Jose Blanchet, and Peter W. Glynn. Smoothed variable sample-size accelerated proximal methods for nonsmooth stochastic convex programs. Stochastic Systems, 12(4):373–410, 2022. doi: 10.1287/stsy.2022.0095. URL https://doi.org/10.1287/stsy.2022.0095.

  22. [22] Cheng Jie and Michael C. Fu. Using importance sampling in estimating weak derivative. arXiv preprint arXiv:2209.13184, 2022.

  23. [23] Anatoli Juditsky and Arkadii S. Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces. arXiv preprint arXiv:0809.0813, 2008.

  24. [24] H. Lakshmanan and D. Farias. Decentralized resource allocation in dynamic networks of agents. SIAM Journal on Optimization, 19(2):911–940, 2008.

  25. [25] Qiang Li and Hoi-To Wai. Stochastic optimization schemes for performative prediction with nonconvex loss. Advances in Neural Information Processing Systems, 37:8673–8697, 2024.

  26. [26] Haitong Liu, Qiang Li, and Hoi-To Wai. Two-timescale derivative free optimization for performative prediction with Markovian data. arXiv preprint arXiv:2310.05792, 2023.

  27. [27] Luke Marrinan, Uday V. Shanbhag, and Farzad Yousefian. Zeroth-order gradient and quasi-Newton methods for nonsmooth nonconvex stochastic optimization. SIAM Journal on Optimization, 36(2):564–596, 2026.

  28. [28] Celestine Mendler-Dünner, Juan Perdomo, Tijana Zrnic, and Moritz Hardt. Stochastic optimization for performative prediction. Advances in Neural Information Processing Systems, 33:4929–4939, 2020.

  29. [30] John P. Miller, Juan C. Perdomo, and Tijana Zrnic. Outside the echo chamber: Optimizing the performative risk. In International Conference on Machine Learning, pages 7710–7720. PMLR, 2021.

  30. [31] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics 15, John Wiley, New York, 1983.

  31. [32] Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.

  32. [33] Juan Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. Performative prediction. In International Conference on Machine Learning, pages 7599–7609. PMLR, 2020.

  33. [34] Boris T. Polyak. Introduction to Optimization. Optimization Software, New York, 1987.

  34. [35] Yuyang Qiu, Uday Shanbhag, and Farzad Yousefian. Zeroth-order methods for nondifferentiable, nonconvex, and hierarchical federated optimization. Advances in Neural Information Processing Systems, 36, 2023.

  35. [36] Mitas Ray, Lillian J. Ratliff, Dmitriy Drusvyatskiy, and Maryam Fazel. Decision-dependent risk minimization in geometrically decaying dynamic environments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8081–8088, 2022.

  36. [37] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951. doi: 10.1214/aoms/1177729586. URL https://doi.org/10.1214/aoms/1177729586.

  37. [38] Herbert Robbins and David Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pages 233–257. Elsevier, 1971.

  38. [39] Uday V. Shanbhag and Farzad Yousefian. Zeroth-order randomized block methods for constrained minimization of expectation-valued Lipschitz continuous functions. In 2021 Seventh Indian Control Conference (ICC), pages 7–12. IEEE, 2021.

  39. [40] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory, volume 16. SIAM, 2014.

  40. [41] James C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3):332–341, 1992.

  41. [42] V. A. Steklov. Sur les expressions asymptotiques de certaines fonctions définies par les équations différentielles du second ordre et leurs applications au problème du développement d'une fonction arbitraire en séries procédant suivant les diverses fonctions. Comm. Charkov Math. Soc., 2(10):97–199, 1907.

  42. [43] Jean-Philippe Vial. Strong and weak convexity of sets and functions. Mathematics of Operations Research, 8(2):231–259, 1983. URL http://www.jstor.org/stable/3689591.

  43. [44] Mingrui Wang, Prakash Chakraborty, and Uday V. Shanbhag. Improving dimension dependence in complexity guarantees for zeroth-order methods via exponentially-shifted Gaussian smoothing. In 2024 Winter Simulation Conference (WSC), pages 3193–3204. IEEE, 2024.

  44. [45] Zifan Wang, Changxin Liu, Thomas Parisini, Michael M. Zavlanos, and Karl H. Johansson. Constrained optimization with decision-dependent distributions. arXiv preprint arXiv:2310.02384, 2023.

  45. [46] Christopher K. I. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.

  46. [47] F. Yousefian, A. Nedić, and U. V. Shanbhag. On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica, 48(1):56–67, 2012.