Stochastic Non-Smooth Non-Convex Optimization with Decision-Dependent Distributions
Pith reviewed 2026-05-08 07:58 UTC · model grok-4.3
The pith
For non-smooth non-convex problems with decision-dependent distributions, a stochastic zeroth-order method finds a (δ,ε)-Goldstein stationary point with O(d² δ^{-3} ε^{-3}) oracle calls, using only a single zeroth-order query per iteration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For non-smooth non-convex objectives with decision-dependent distributions, a stochastic zeroth-order method finds a (δ,ε)-Goldstein stationary point with SZO complexity O(d² δ^{-3} ε^{-3}) and does so with only one SZO feedback per iteration. The same framework yields complexity O(d² ε^{-6}) when the objective is smooth and O(d² ε^{-9/2}) when the objective is Hessian-Lipschitz, the latter improving the best-known ε dependence for decision-dependent zeroth-order methods by a factor of ε^{-1/2}.
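For reference, the standard Goldstein notion behind this claim (the paper's exact normalization may differ): writing $\partial f(y)$ for the Clarke subdifferential of a Lipschitz function $f$, the $\delta$-Goldstein subdifferential at $x$ is
$$\partial_\delta f(x) \;=\; \mathrm{conv}\Big(\textstyle\bigcup_{\|y-x\|\le\delta} \partial f(y)\Big),$$
and $x$ is a $(\delta,\epsilon)$-Goldstein stationary point if $\min_{g \in \partial_\delta f(x)} \|g\| \le \epsilon$.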
What carries the argument
A single-call stochastic zeroth-order estimator that turns one noisy function value per iteration into a gradient estimate for a smoothed surrogate objective, while keeping the additional bias introduced by the decision-dependent sampling law under control.
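A minimal sketch of the kind of one-point update this refers to; the oracle interfaces (value_oracle, sample_xi), the spherical smoothing, and the plain SGD step are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def one_point_szo_step(x, delta, eta, value_oracle, sample_xi, rng):
    """One update from a single stochastic zeroth-order (SZO) query.

    x            : current decision variable, shape (d,)
    delta        : radius of the randomized-smoothing perturbation
    eta          : step size
    value_oracle : noisy value oracle F(y, xi) -> float   (assumed interface)
    sample_xi    : decision-dependent sampler, xi ~ D(y)  (assumed interface)
    rng          : numpy random Generator
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                      # u ~ Unif(unit sphere)
    y = x + delta * u                           # the single perturbed query point
    xi = sample_xi(y)                           # sample drawn at the queried decision
    g = (d / delta) * value_oracle(y, xi) * u   # classic one-point gradient estimate
    return x - eta * g                          # plain SGD-style update
```

Because the sample is drawn at the queried point y, the estimate is, under standard regularity and in expectation over u and ξ, a gradient of a uniformly smoothed version of the decision-dependent objective f(x) = E_{ξ∼D(x)}[F(x,ξ)]; the paper's contribution is the complexity analysis of such single-query updates under distribution shift, which this sketch does not reproduce.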
If this is right
- Non-smooth non-convex problems with shifting distributions become tractable under single-query zeroth-order access.
- The single-feedback property removes the need for multiple independent samples at each step.
- Smooth problems admit an O(d² ε^{-6}) guarantee under the same decision-dependent model.
- Hessian-Lipschitz problems admit an O(d² ε^{-9/2}) guarantee that is strictly better than prior decision-dependent zeroth-order bounds.
- The dependence on dimension enters only through a quadratic factor in d (see the back-of-the-envelope comparison below).
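A back-of-the-envelope comparison of the three bounds at illustrative accuracy levels (constants and any logarithmic factors ignored; values are orders of magnitude only, not taken from the paper):

```python
d, delta, eps = 100, 0.1, 0.1             # illustrative values, not from the paper

nonsmooth = d**2 * delta**-3 * eps**-3    # O(d^2 δ^-3 ε^-3): 1e10 queries
smooth    = d**2 * eps**-6                # O(d^2 ε^-6):      1e10 queries
hess_lip  = d**2 * eps**-4.5              # O(d^2 ε^-9/2):    ~3.2e8 queries
print(nonsmooth, smooth, hess_lip)
```

At ε = 0.1 the claimed ε^{-1/2} improvement over prior decision-dependent bounds corresponds to roughly a 3× reduction in queries, while the extra smoothness (Hessian-Lipschitz versus merely smooth) buys about a 30× reduction.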
Where Pith is reading between the lines
- The single-query result suggests the approach could be plugged into expensive black-box simulators without doubling evaluation cost.
- The improvement in the Hessian-Lipschitz regime hints that further smoothness assumptions may continue to reduce the exponent on ε.
- Applications such as policy optimization in environments whose transition law depends on the chosen policy become directly addressable by these rates.
Load-bearing premise
The distribution from which samples are drawn varies continuously with the decision variable (Lipschitz in an appropriate probability metric) and has uniformly bounded moments.
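One common formalization of this premise in the decision-dependent literature (the paper's exact assumption may instead use total variation distance and a specific moment order):
$$W_1\big(\mathcal{D}(x), \mathcal{D}(x')\big) \;\le\; L_D\,\|x - x'\| \quad \text{for all } x, x', \qquad \sup_x \, \mathbb{E}_{\xi \sim \mathcal{D}(x)}\big[\|\xi\|^2\big] \;<\; \infty.$$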
What would settle it
A concrete counter-example in which the sampling distribution jumps discontinuously as the decision changes, and the observed iteration count needed to reach the target (δ,ε) accuracy exceeds the stated bound by more than a constant factor.
Original abstract
We study stochastic zeroth-order optimization with decision-dependent distributions, where the sampling law depends on the current decision and only noisy function values are available. For the non-smooth non-convex setting, we establish an explicit convergence guarantee for finding a $(\delta,\epsilon)$-Goldstein stationary point with stochastic zeroth-order oracle (SZO) complexity of $\mathcal{O}(d^2\delta^{-3}\epsilon^{-3})$. In addition, we show that the above complexity can be achieved with single SZO feedback per iteration. We further extend the analysis to smooth and Hessian-Lipschitz objectives, obtaining complexities $\mathcal{O}(d^2\epsilon^{-6})$ and $\mathcal{O}(d^2\epsilon^{-9/2})$, respectively. In the Hessian-Lipschitz case, this improves the best-known dependence on $\epsilon$ for decision-dependent zeroth-order methods by a factor of $\epsilon^{-1/2}$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies stochastic zeroth-order optimization (SZO) where the sampling distribution depends on the current decision variable and only noisy function values are available. For non-smooth non-convex objectives it establishes an explicit complexity of O(d² δ^{-3} ε^{-3}) to reach a (δ,ε)-Goldstein stationary point, achievable with a single SZO query per iteration. The analysis is extended to smooth objectives (O(d² ε^{-6})) and Hessian-Lipschitz objectives (O(d² ε^{-9/2})), the latter improving the best-known ε-dependence for decision-dependent zeroth-order methods by a factor of ε^{-1/2}.
Significance. If the stated rates hold under the paper's regularity conditions on the distribution map, the work supplies the first explicit SZO complexities for the non-smooth non-convex decision-dependent setting together with a single-query implementation and a concrete improvement in the Hessian-Lipschitz regime. These results are obtained by adapting standard one-point estimators while controlling the additional bias induced by distribution shift, which is a technically useful contribution for simulation-based and data-driven optimization problems.
Major comments (2)
- [Theorem 3.2 and surrounding bias analysis] The single-SZO-per-iteration claim is central to the practical contribution. The proof must demonstrate that the bias term arising from the decision-dependent distribution shift remains controlled without requiring additional queries to estimate the distribution change; this control appears to rely on the Lipschitz continuity of the distribution map in total variation (or Wasserstein) distance together with uniform moment bounds, but the precise invocation of these assumptions in the one-point estimator analysis should be highlighted (an elementary version of the relevant bound is sketched after these comments).
- [Section 5, Theorem 5.1] In the Hessian-Lipschitz extension the claimed O(d² ε^{-9/2}) rate improves prior work by ε^{-1/2}. The derivation of this improvement should be checked against the additional bias introduced by the decision-dependent sampling; if the improvement is obtained by a standard smoothing argument plus the distribution-Lipschitz assumption, the paper should state explicitly whether any extra logarithmic factors or dimension dependence appear.
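To make the first comment concrete, here is the elementary inequality such an analysis typically leans on, sketched under the assumptions the comment names ($|F| \le M$ uniformly, total-variation Lipschitz distribution map with constant $L_D$); this is an illustration, not the paper's lemma:
$$\Big|\,\mathbb{E}_{\xi \sim \mathcal{D}(x)} F(y,\xi) \;-\; \mathbb{E}_{\xi \sim \mathcal{D}(x')} F(y,\xi)\,\Big| \;\le\; 2M\, d_{\mathrm{TV}}\big(\mathcal{D}(x), \mathcal{D}(x')\big) \;\le\; 2M\, L_D\, \|x - x'\|.$$
A decision perturbation of size δ, as in a one-point query at x + δu, therefore shifts any bounded-value expectation by at most O(M L_D δ), which is the kind of term the single-query analysis must absorb without extra samples.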
Minor comments (2)
- [Assumptions section] The precise regularity conditions on the decision-dependent distributions (Lipschitz continuity in total variation/Wasserstein distance and moment bounds) are stated in the full text but should be collected in a single assumption block immediately before the main theorems for easy reference.
- [Introduction] Notation for the Goldstein stationarity measure and the one-point estimator should be introduced with a short reminder in the introduction, as readers may not recall the exact definition of a (δ,ε)-Goldstein point.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. We address the two major comments below and will incorporate clarifications to improve the exposition.
Point-by-point responses
- Referee: [Theorem 3.2 and surrounding bias analysis] The single-SZO-per-iteration claim is central to the practical contribution. The proof must demonstrate that the bias term arising from the decision-dependent distribution shift remains controlled without requiring additional queries to estimate the distribution change; this control appears to rely on the Lipschitz continuity of the distribution map in total variation (or Wasserstein) distance together with uniform moment bounds, but the precise invocation of these assumptions in the one-point estimator analysis should be highlighted.
  Authors: We agree that explicit highlighting improves clarity. In the proof of Theorem 3.2 the bias is controlled via the total-variation Lipschitz continuity of the distribution map (Assumption 2.3) together with the uniform moment bounds (Assumption 2.2); these allow the one-point estimator to be analyzed directly without extra queries. We will add a short remark immediately after the bias bound to reference these two assumptions at the relevant steps. Revision: yes.
- Referee: [Section 5, Theorem 5.1] In the Hessian-Lipschitz extension the claimed O(d² ε^{-9/2}) rate improves prior work by ε^{-1/2}. The derivation of this improvement should be checked against the additional bias introduced by the decision-dependent sampling; if the improvement is obtained by a standard smoothing argument plus the distribution-Lipschitz assumption, the paper should state explicitly whether any extra logarithmic factors or dimension dependence appear.
  Authors: The ε^{-9/2} dependence is obtained by the standard Hessian-Lipschitz smoothing argument together with our existing bias-control lemma; the distribution-Lipschitz assumption absorbs the additional bias into terms that neither worsen the ε exponent nor introduce extra logarithmic factors or dimension dependence beyond the O(d²) already present. We will insert a clarifying sentence in the paragraph following Theorem 5.1 stating this explicitly. Revision: yes.
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper derives explicit SZO complexity bounds for reaching a (δ,ε)-Goldstein stationary point in the non-smooth non-convex case (and improved rates for smooth/Hessian-Lipschitz extensions) under decision-dependent distributions. These bounds are obtained by adapting standard one-point zeroth-order gradient estimators while controlling the additional bias induced by the distribution shift, using explicitly stated regularity conditions such as Lipschitz continuity of the distribution map in total variation or Wasserstein distance together with uniform moment bounds. The single-SZO-feedback-per-iteration claim is a direct consequence of the same analysis rather than an input assumption. No step reduces a claimed result to a fitted parameter, self-definition, or load-bearing self-citation chain; the argument is independent of the target complexities and relies on conventional proof techniques for zeroth-order stochastic optimization.