arxiv: 2605.06549 · v1 · submitted 2026-05-07 · 🧮 math.OC

Recognition: unknown

Stochastic Non-Smooth Non-Convex Optimization with Decision-Dependent Distributions

Chengchang Liu, Haishan Ye, John C.S. Lui, Zongqi Wan

Pith reviewed 2026-05-08 07:58 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic zeroth-order optimizationdecision-dependent distributionsnon-smooth non-convex optimizationGoldstein stationary pointsconvergence complexitysingle oracle queryHessian-Lipschitz objectives

0 comments

The pith

For non-smooth non-convex problems with decision-dependent distributions, a stochastic zeroth-order method finds (δ,ε)-Goldstein stationary points using O(d² δ^{-3} ε^{-3}) single-query oracle calls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to optimize non-smooth non-convex functions when the only access is noisy function values and the distribution from which those values are drawn changes with the current decision. It gives an explicit rate to reach an approximate stationary point defined via the Goldstein notion, and proves the same rate works with only one oracle call per iteration. The analysis is then extended to smooth objectives and to objectives whose Hessians are Lipschitz, producing concrete complexity bounds in each case. A reader would care because many practical tasks, from simulation-based design to certain learning problems, naturally have both non-smoothness and decision-dependent noise, and zeroth-order methods are the only option when gradients cannot be computed.

Core claim

For non-smooth non-convex objectives with decision-dependent distributions, a stochastic zeroth-order method finds a (δ,ε)-Goldstein stationary point with SZO complexity O(d² δ^{-3} ε^{-3}) and does so with only one SZO feedback per iteration. The same framework yields complexity O(d² ε^{-6}) when the objective is smooth and O(d² ε^{-9/2}) when the objective is Hessian-Lipschitz, the latter improving the best-known ε dependence for decision-dependent zeroth-order methods by a factor of ε^{-1/2}.

What carries the argument

A single-call stochastic zeroth-order estimator that produces unbiased estimates of smoothed directional derivatives while controlling the bias introduced by the decision-dependent sampling law.

If this is right

Non-smooth non-convex problems with shifting distributions become tractable under single-query zeroth-order access.
The single-feedback property removes the need for multiple independent samples at each step.
Smooth problems admit an O(d² ε^{-6}) guarantee under the same decision-dependent model.
Hessian-Lipschitz problems admit an O(d² ε^{-9/2}) guarantee that is strictly better than prior decision-dependent zeroth-order bounds.
The rates are dimension-dependent only through a quadratic factor in d.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-query result suggests the approach could be plugged into expensive black-box simulators without doubling evaluation cost.
The improvement in the Hessian-Lipschitz regime hints that further smoothness assumptions may continue to reduce the exponent on ε.
Applications such as policy optimization in environments whose transition law depends on the chosen policy become directly addressable by these rates.

Load-bearing premise

The distribution from which samples are drawn changes continuously or with bounded moments when the decision variable changes.

What would settle it

A concrete counter-example in which the sampling distribution jumps discontinuously with the decision and the observed iteration count to reach the target (δ,ε) exceeds the stated bound by more than a constant factor.

Figures

Figures reproduced from arXiv: 2605.06549 by Chengchang Liu, Haishan Ye, John C.S. Lui, Zongqi Wan.

**Figure 1.** Figure 1: Strategic-classification train loss over oracle queries. view at source ↗

read the original abstract

We study stochastic zeroth-order optimization with decision-dependent distributions, where the sampling law depends on the current decision and only noisy function values are available. For the non-smooth non-convex setting, we establish an explicit convergence guarantee for finding a $(\delta,\epsilon)$-Goldstein stationary point with stochastic zeroth-order oracle (SZO) complexity of $\mathcal{O}(d^2\delta^{-3}\epsilon^{-3})$. In addition, we show that the above complexity can be achieved with single SZO feedback per iteration. We further extend the analysis to smooth and Hessian-Lipschitz objectives, obtaining complexities $\mathcal{O}(d^2\epsilon^{-6})$ and $\mathcal{O}(d^2\epsilon^{-9/2})$, respectively. In the Hessian-Lipschitz case, this improves the best-known dependence on $\epsilon$ for decision-dependent zeroth-order methods by a factor of $\epsilon^{-1/2}$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives new explicit SZO complexity bounds for non-smooth non-convex and Hessian-Lipschitz cases under decision-dependent sampling, with a concrete improvement in the latter.

read the letter

The main thing to know is that this work supplies explicit stochastic zeroth-order complexities for finding approximate stationary points when the sampling distribution itself depends on the current decision. For non-smooth non-convex problems they obtain O(d² δ^{-3} ε^{-3}) queries to reach a (δ,ε)-Goldstein point and show this is possible with a single SZO call per iteration. They also recover O(d² ε^{-6}) for smooth objectives and O(d² ε^{-9/2}) for Hessian-Lipschitz ones, the last of which improves the prior best dependence on ε by a factor of ε^{-1/2}.

Referee Report

2 major / 2 minor

Summary. The manuscript studies stochastic zeroth-order optimization (SZO) where the sampling distribution depends on the current decision variable and only noisy function values are available. For non-smooth non-convex objectives it establishes an explicit complexity of O(d² δ^{-3} ε^{-3}) to reach a (δ,ε)-Goldstein stationary point, achievable with a single SZO query per iteration. The analysis is extended to smooth objectives (O(d² ε^{-6})) and Hessian-Lipschitz objectives (O(d² ε^{-9/2})), the latter improving the best-known ε-dependence for decision-dependent zeroth-order methods by a factor of ε^{-1/2}.

Significance. If the stated rates hold under the paper's regularity conditions on the distribution map, the work supplies the first explicit SZO complexities for the non-smooth non-convex decision-dependent setting together with a single-query implementation and a concrete improvement in the Hessian-Lipschitz regime. These results are obtained by adapting standard one-point estimators while controlling the additional bias induced by distribution shift, which is a technically useful contribution for simulation-based and data-driven optimization problems.

major comments (2)

[Theorem 3.2 and surrounding bias analysis] The single-SZO-per-iteration claim is central to the practical contribution. The proof must demonstrate that the bias term arising from the decision-dependent distribution shift remains controlled without requiring additional queries to estimate the distribution change; this control appears to rely on the Lipschitz continuity of the distribution map in total variation (or Wasserstein) distance together with uniform moment bounds, but the precise invocation of these assumptions in the one-point estimator analysis should be highlighted.
[Section 5, Theorem 5.1] In the Hessian-Lipschitz extension the claimed O(d² ε^{-9/2}) rate improves prior work by ε^{-1/2}. The derivation of this improvement should be checked against the additional bias introduced by the decision-dependent sampling; if the improvement is obtained by a standard smoothing argument plus the distribution-Lipschitz assumption, the paper should state explicitly whether any extra logarithmic factors or dimension dependence appear.

minor comments (2)

[Assumptions section] The precise regularity conditions on the decision-dependent distributions (Lipschitz continuity in total variation/Wasserstein distance and moment bounds) are stated in the full text but should be collected in a single assumption block immediately before the main theorems for easy reference.
[Introduction] Notation for the Goldstein stationarity measure and the one-point estimator should be introduced with a short reminder in the introduction, as readers may not recall the exact definition of a (δ,ε)-Goldstein point.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. We address the two major comments below and will incorporate clarifications to improve the exposition.

read point-by-point responses

Referee: [Theorem 3.2 and surrounding bias analysis] The single-SZO-per-iteration claim is central to the practical contribution. The proof must demonstrate that the bias term arising from the decision-dependent distribution shift remains controlled without requiring additional queries to estimate the distribution change; this control appears to rely on the Lipschitz continuity of the distribution map in total variation (or Wasserstein) distance together with uniform moment bounds, but the precise invocation of these assumptions in the one-point estimator analysis should be highlighted.

Authors: We agree that explicit highlighting improves clarity. In the proof of Theorem 3.2 the bias is controlled via the total-variation Lipschitz continuity of the distribution map (Assumption 2.3) together with the uniform moment bounds (Assumption 2.2); these allow the one-point estimator to be analyzed directly without extra queries. We will add a short remark immediately after the bias bound to reference these two assumptions at the relevant steps. revision: yes
Referee: [Section 5, Theorem 5.1] In the Hessian-Lipschitz extension the claimed O(d² ε^{-9/2}) rate improves prior work by ε^{-1/2}. The derivation of this improvement should be checked against the additional bias introduced by the decision-dependent sampling; if the improvement is obtained by a standard smoothing argument plus the distribution-Lipschitz assumption, the paper should state explicitly whether any extra logarithmic factors or dimension dependence appear.

Authors: The ε^{-9/2} dependence is obtained by the standard Hessian-Lipschitz smoothing argument together with our existing bias-control lemma; the distribution-Lipschitz assumption absorbs the additional bias into terms that do not worsen the ε exponent and introduce neither extra logarithmic factors nor further dimension dependence beyond the O(d²) already present. We will insert a clarifying sentence in the paragraph following Theorem 5.1 stating this explicitly. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives explicit SZO complexity bounds for reaching a (δ,ε)-Goldstein stationary point in the non-smooth non-convex case (and improved rates for smooth/Hessian-Lipschitz extensions) under decision-dependent distributions. These bounds are obtained by adapting standard one-point zeroth-order gradient estimators while controlling the additional bias induced by the distribution shift, using explicitly stated regularity conditions such as Lipschitz continuity of the distribution map in total variation or Wasserstein distance together with uniform moment bounds. The single-SZO-feedback-per-iteration claim is a direct consequence of the same analysis rather than an input assumption. No step reduces a claimed result to a fitted parameter, self-definition, or load-bearing self-citation chain; the argument is independent of the target complexities and relies on conventional proof techniques for zeroth-order stochastic optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; standard optimization assumptions such as bounded noise moments and decision-continuity of the distribution are likely required but unlisted.

pith-pipeline@v0.9.0 · 5462 in / 1138 out tokens · 53554 ms · 2026-05-08T07:58:36.307484+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 5 canonical work pages

[1]

Highly-smooth zero-th order online optimization

Francis Bach and Vianney Perchet. Highly-smooth zero-th order online optimization. InConference on Learning Theory, pages 257–283. PMLR, 2016

2016
[2]

A theoretical and empirical comparison of gradient approximations in derivative-free optimization.Foundations of Computational Mathematics, 22(2):507–560, 2022

Albert S Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization.Foundations of Computational Mathematics, 22(2):507–560, 2022

2022
[3]

Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization

Lesi Chen, Jing Xu, and Luo Luo. Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization. InICML, 2023

2023
[4]

Performative prediction with bandit feedback: Learning through reparameterization.arXiv preprint arXiv:2305.01094, 2023

Yatong Chen, Wei Tang, Chien-Ju Ho, and Yang Liu. Performative prediction with bandit feedback: Learning through reparameterization.arXiv preprint arXiv:2305.01094, 2023

work page arXiv 2023
[5]

Clarke.Optimization and nonsmooth analysis

Frank H. Clarke.Optimization and nonsmooth analysis. SIAM, 1990

1990
[6]

Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

Ashok Cutkosky, Harsh Mehta, and Francesco Orabona. Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion. InICML, 2023

2023
[7]

Stochastic optimization with decision-dependent distributions

Dmitriy Drusvyatskiy and Lin Xiao. Stochastic optimization with decision-dependent distributions. Mathematics of Operations Research, 48(2):954–998, 2023. doi: 10.1287/moor.2022.1287

work page doi:10.1287/moor.2022.1287 2023
[8]

Duchi, Peter L

John C. Duchi, Peter L. Bartlett, and Martin J. Wainwright. Randomized smoothing for stochastic optimization.SIAM Journal on Optimization, 2012

2012
[9]

Flaxman, Adam Tauman Kalai, and H

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. InSODA, 2005

2005
[10]

Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

2013
[11]

Goldstein

A. Goldstein. Optimization of lipschitz continuous functions.Mathematical Programming, 1977

1977
[12]

On graduated optimization for stochastic non-convex problems

Elad Hazan, Kfir Yehuda Levy, and Shai Shalev-Shwartz. On graduated optimization for stochastic non-convex problems. InInternational conference on machine learning, pages 1833–1841. PMLR, 2016

2016
[13]

Zeroth-order methods for nonconvex stochastic problems with decision- dependent distributions

Yuya Hikima and Akiko Takeda. Zeroth-order methods for nonconvex stochastic problems with decision- dependent distributions. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 17195–17203, 2025

2025
[14]

Zeroth-order gradient estimators for stochastic problems with decision- dependent distributions.arXiv preprint arXiv:2510.24929, 2025

Yuya Hikima and Akiko Takeda. Zeroth-order gradient estimators for stochastic problems with decision- dependent distributions.arXiv preprint arXiv:2510.24929, 2025

work page arXiv 2025
[15]

Guided zeroth-order methods for stochastic non- convex problems with decision-dependent distributions

Yuya Hikima, Hiroto Sawada, and Akihiro Fujino. Guided zeroth-order methods for stochastic non- convex problems with decision-dependent distributions. InProceedings of the International Conference on Machine Learning, 2025. 10

2025
[16]

An algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization.JMLR, 2024

Guy Kornowski and Ohad Shamir. An algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization.JMLR, 2024

2024
[17]

Derivative-free optimization methods.Acta Numerica, 28:287–404, 2019

Jeffrey Larson, Matt Menickelly, and Stefan M Wild. Derivative-free optimization methods.Acta Numerica, 28:287–404, 2019

2019
[18]

Strategic classification made practical

Sagi Levanon and Nir Rosenfeld. Strategic classification made practical. InInternational Conference on Machine Learning, pages 6243–6253. PMLR, 2021

2021
[19]

Tianyi Lin, Zeyu Zheng, and Michael I. Jordan. Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. InNeurIPS, 2022

2022
[20]

Two-timescale derivative free optimization for performative prediction with markovian data

Honghao Liu, Qianxiao Li, and Hoi-To Wai. Two-timescale derivative free optimization for performative prediction with markovian data. InProceedings of the International Conference on Machine Learning, pages 31425–31450, 2024

2024
[21]

Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization

Zhuanghua Liu, Cheng Chen, Luo Luo, and Bryan Kian Hsiang Low. Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization. InForty-first International Conference on Machine Learning, 2024

2024
[22]

Gradient-free methods for nonconvex nonsmooth stochastic compositional optimization.Advances in Neural Information Processing Systems, 37:45438– 45461, 2024

Zhuanghua Liu, Luo Luo, and Bryan Kian Hsiang Low. Gradient-free methods for nonconvex nonsmooth stochastic compositional optimization.Advances in Neural Information Processing Systems, 37:45438– 45461, 2024

2024
[23]

Stochastic optimization for performative prediction.Advances in Neural Information Processing Systems, 33:4929–4939, 2020

Celestine Mendler-Dünner, Juan Perdomo, Tijana Zrnic, and Moritz Hardt. Stochastic optimization for performative prediction.Advances in Neural Information Processing Systems, 33:4929–4939, 2020

2020
[24]

Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017

2017
[25]

Performative prediction

Juan Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. Performative prediction. In International Conference on Machine Learning, pages 7599–7609. PMLR, 2020

2020
[26]

Decision-dependent risk mini- mization in geometrically decaying dynamic environments

Mitas Ray, Lillian J Ratliff, Dmitriy Drusvyatskiy, and Maryam Fazel. Decision-dependent risk mini- mization in geometrically decaying dynamic environments. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8081–8088, 2022

2022
[27]

On the complexity of bandit and derivative-free stochastic convex optimization

Ohad Shamir. On the complexity of bandit and derivative-free stochastic convex optimization. In Conference on learning theory, pages 3–24. PMLR, 2013

2013
[28]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.JMLR, 2017

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.JMLR, 2017

2017
[29]

Actionable recourse in linear classification

Berk Ustun, Alexander Spangher, and Yang Liu. Actionable recourse in linear classification. FAT* ’19, page 10–19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450361255

2019
[30]

Cambridge university press, 2019

Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

2019
[31]

Can a one-point feedback zeroth-order algorithm achieve linear dimension dependent sample complexity?arXiv preprint arXiv:2508.12228, 2025

Haishan Ye and Xiangyu Chang. Can a one-point feedback zeroth-order algorithm achieve linear dimension dependent sample complexity?arXiv preprint arXiv:2508.12228, 2025

work page arXiv 2025
[32]

Shanbhag

Farzad Yousefian, Angelia Nedić, and Uday V. Shanbhag. On stochastic gradient and subgradient methods with adaptive steplength sequences.Automatica, 2012

2012
[33]

Complexity of finding stationary points of nonsmooth nonconvex functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, and Ali Jadbabaie. Complexity of finding stationary points of nonsmooth nonconvex functions. InICML, 2020

2020
[34]

A new one-point residual-feedback oracle for black-box learning and control.Automatica, 136:110006, 2022

Yan Zhang, Yi Zhou, Kaiyi Ji, and Michael M Zavlanos. A new one-point residual-feedback oracle for black-box learning and control.Automatica, 136:110006, 2022. 11 A The Proof of Section 2 A.1 The Proof of Proposition 2.1 Proof.Part (a) follows from Assumption 2.2: |fδ(x)−f(x)|= Eu∼Unif(B d) f(x+δu)−f(x) ≤E |f(x+δu)−f(x)| ≤LδE[∥u∥]≤Lδ. Similarly, for anyx,...

work page arXiv 2022