SGD's stationary distribution is Boltzmann-Gibbs with temperature equal to the step size, concentrating exponentially on minimum-energy critical points.
and Mertikopoulos, P
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
Derives KL and TV error bounds for kTULA and tRLMC schemes, giving near-optimal Õ(ε^{-1/2}) complexity for kTULA and Õ(ε^{-1}) for tRLMC under log-Sobolev sampling.
New RSLMC sampling algorithms achieve uniform-in-time W2 error bounds of order O(sqrt(d) h) under gradient Lipschitz and log-Sobolev assumptions, with modified versions for superlinear gradient growth and supporting numerical examples.
citing papers explorer
What is the long-run distribution of stochastic gradient descent? A large deviations analysis