Unregularized limit of stochastic gradient method for Wasserstein distributionally robust optimization

Tam Le (LPSM (UMR 8001); UPCité)

arXiv: 2506.04948 · v2 · pith:I5NNE2TLnew · submitted 2025-06-05 · math.OC · stat.ML

keywords: convergence, distributionally, objective, optimization, robust, under, unregularized, wasserstein

Wasserstein distributionally robust optimization offers a framework for model fitting in machine learning under potential shifts in the data distribution. We study a regularized variant of this problem in which entropic smoothing produces a sampled approximation of the original objective. We establish convergence of the approximate gradients to subgradients of the unregularized objective as the regularization parameter vanishes, enabling convergence guarantees for stochastic gradient methods. We obtain qualitative convergence results under general assumptions, and then provide convergence rates under additional regularity. In particular, we prove rates for the convergence of the unregularized objective values, up to sampling errors, when the regularization level is decreased across iterations. Our analysis yields byproducts of independent interest, including approximation results for the smoothing of subdifferentials of maximum functions and empirical lower bounds for dual solutions of Wasserstein distributionally robust optimization.
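
To make the idea of entropic smoothing of a maximum concrete, here is a minimal sketch of the standard log-sum-exp smoothing of a finite max. It is only an illustration under simplifying assumptions (a finite list of values, uniform reference weights), not the paper's sampled objective or algorithm; the names `smoothed_max` and `smoothed_max_weights` are hypothetical. It shows that, as the regularization level `eps` shrinks, the smoothed value approaches the true max and the gradient weights concentrate on the maximizer, which is a subgradient of the max.

```python
import math

def smoothed_max(values, eps):
    """Log-sum-exp smoothing of max(values) with uniform weights (illustrative)."""
    m = max(values)
    # Subtract the max before exponentiating for numerical stability.
    total = sum(math.exp((v - m) / eps) for v in values)
    return m + eps * math.log(total / len(values))

def smoothed_max_weights(values, eps):
    """Gradient of smoothed_max with respect to values: softmax weights."""
    m = max(values)
    exps = [math.exp((v - m) / eps) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

values = [1.0, 3.0, 2.5]  # hypothetical values; the max is attained at index 1
for eps in (1.0, 0.1, 0.01):
    # The smoothed value lies within eps * log(len(values)) below the true max.
    print(eps, smoothed_max(values, eps), smoothed_max_weights(values, eps))
```

With uniform weights the smoothed value never exceeds the true max and is at most `eps * log(n)` below it, so decreasing `eps` across iterations tightens the approximation.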

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
cs.LG · 2026-05 · unverdicted · novelty 7.0

Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.