arxiv: 2601.08184 · v3 · submitted 2026-01-13 · 🧮 math.PR · cs.LG· stat.ML

Recognition: 1 theorem link

· Lean Theorem

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Yixuan Zhang , Qiaomin Xie

Authors on Pith no claims yet

Pith reviewed 2026-05-16 15:17 UTC · model grok-4.3

classification 🧮 math.PR cs.LGstat.ML

keywords Wasserstein distanceCentral limit theoremMarkov chainsLocal dependenceU-statisticsGaussian approximationRegeneration

0 comments

The pith

Optimal rates of order n to the power of negative one half are established for Wasserstein-1 central limit theorems under local dependence and for geometrically ergodic Markov chains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves the first optimal convergence rate of order one over the square root of n in Wasserstein-1 distance for central limit theorems applied to multivariate sequences that are either locally dependent or generated by geometrically ergodic Markov chains. It also gives the first Wasserstein-p rates for p at least 2 under only mild moment conditions. These bounds improve existing results for the kinds of dependent data that appear in machine learning and operations research. An auxiliary bound on Wasserstein-1 Gaussian approximation error and a geometric tail result for regeneration times of split chains are the main technical tools.

Core claim

We establish the first optimal O(n^{-1/2}) rate in W_1 distance, as well as the first W_p (p≥2) CLT rates under mild moment assumptions, for multivariate data under local dependence and under geometric ergodicity of Markov chains, substantially improving prior bounds; the results rest on a tractable auxiliary bound for W_1 Gaussian approximation errors and a geometric tail for the regeneration time of the associated split chain.

What carries the argument

Auxiliary bound for W_1 Gaussian approximation errors together with the geometric tail bound on regeneration time of the split chain.

If this is right

The optimal W_1 rate for locally dependent sequences yields the first optimal W_1 CLT rate for multivariate U-statistics.
Non-asymptotic error bounds of order n^{-1/2} become available for dependent data in settings that previously required slower rates.
The same regeneration-time control applies directly to any geometrically ergodic chain without extra aperiodicity assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The auxiliary W_1 bound may extend to other weak dependence notions such as beta-mixing with suitable decay rates.
Quantitative regeneration tails could be combined with existing Berry-Esseen machinery to obtain explicit constants for finite-state chains.
The W_p rates for p greater than or equal to 2 open the door to moment-based concentration inequalities that scale correctly with dimension for dependent samples.

Load-bearing premise

Geometric ergodicity of the Markov chain together with the mild moment conditions needed to control regeneration time tails and Gaussian approximation errors.

What would settle it

A concrete counter-example Markov chain that satisfies geometric ergodicity and the stated moments but whose W_1 distance to Gaussianity decays slower than n to the power of negative one half.

read the original abstract

Non-asymptotic central limit theorem (CLT) rates play a central role in modern machine learning and operations research. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p\ge 1$. We focus on two fundamental dependence structures that commonly arise in practice: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $\mathcal O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interests and enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gets the first optimal O(n^{-1/2}) W1 rate and first Wp rates for CLTs under local dependence and geometric ergodicity, backed by a new W1 approximation bound and a geometric tail result for split-chain regeneration.

read the letter

The main thing to know is that the authors deliver optimal W1 rates and new Wp rates for two common dependence settings that show up in time series and high-dimensional stats. They also get an optimal W1 rate for multivariate U-statistics as a quick application. The technical pieces that make this work are an auxiliary bound on W1 Gaussian approximation error suited to dependence, plus a proof that the regeneration time of the split chain has a geometric tail under plain geometric ergodicity without needing strong aperiodicity. That last step looks load-bearing for the Markov chain results and seems to relax conditions that earlier papers required. The rates improve on what was known, and the tools could be reusable. The soft spot is that everything rests on those new auxiliary bounds and the regeneration argument; the abstract states the claims clearly but the full error terms and block-length choices need the proofs to confirm they hold uniformly. The stress-test worry about subgeometric tails does not appear to be an issue here because the paper explicitly proves the geometric tail under the stated assumptions. This is for readers who need sharp non-asymptotic bounds on dependent data, such as people doing inference or optimization with Markov chains or locally dependent samples. It deserves a serious referee because the results are stated as firsts with concrete improvements and the technical steps are laid out enough to check.

Referee Report

2 major / 3 minor

Summary. The paper claims to establish the first optimal O(n^{-1/2}) rates in W_1 distance for CLTs under local dependence and for geometrically ergodic Markov chains, together with the first W_p (p≥2) rates under mild moment assumptions. It introduces an auxiliary bound on W_1 Gaussian approximation error suited to dependent data and proves that the regeneration time of the split chain for a geometrically ergodic Markov chain has a geometric tail without requiring strong aperiodicity. An application yields the first optimal W_1-CLT rate for multivariate U-statistics.

Significance. If the central claims hold, the results would meaningfully advance non-asymptotic CLT theory for dependent data by delivering optimal W_1 rates and new W_p rates where prior bounds were weaker. The auxiliary W_1 bound and the regeneration-tail argument are presented as reusable tools, which could benefit related work in probability and statistics. The explicit rates under standard ergodicity and moment conditions increase applicability to machine-learning and operations-research settings.

major comments (2)

[§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.
[Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.

minor comments (3)

[§3] The precise definition of the local-dependence neighborhood size should be restated at the beginning of §3 for readability.
[Introduction] A short comparison table of the new rates versus the best previously known bounds (including the moment assumptions) would help readers assess the improvement.
[Notation] Notation for the Wasserstein-p distance and the regeneration time should be made consistent between the statements of the main theorems and the proofs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and explicitness of the bounds.

read point-by-point responses

Referee: [§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.

Authors: We thank the referee for highlighting this point. Geometric ergodicity directly yields a minorization condition on a set C with positive measure and constant δ > 0 (see the standard construction in Meyn and Tweedie). In Lemma 4.3 we apply the split-chain construction and derive P(τ > k) ≤ r^k for r = 1 - δ/2, where the rate r is independent of any periodicity because the minorization resets the chain upon hitting the atom, regardless of the original chain's period. The admissible block length m can therefore be chosen uniformly (e.g., m = O(1) or slowly growing) solely from the ergodicity parameters. We will add a short remark after Lemma 4.3 explicitly noting the period-independence and the resulting uniform choice of m. revision: partial
Referee: [Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.

Authors: We agree that an expanded statement will make the substitution transparent. Theorem 3.2 currently bounds the W_1 Gaussian approximation error for a locally dependent block of length m by C(β_m + m^{-1/2} + α(m)), where β_m collects moment terms and α(m) is the local-dependence coefficient over distance m. We will revise the theorem statement to display this dependence on m explicitly and add a short calculation in the proof of Theorem 4.1 showing that, after summing over n/m blocks, the total error is O(n^{-1/2}) once m is fixed from the geometric rate. This will also clarify the improvement relative to prior bounds that carried extra logarithmic or polynomial factors. revision: yes

Circularity Check

0 steps flagged

No circularity: derivations use independent auxiliary bounds and standard regeneration arguments

full rationale

The paper establishes W_p CLT rates via a new W_1 Gaussian approximation bound and a regeneration-time tail for split chains under geometric ergodicity. Neither step reduces to a fitted parameter or self-defined quantity; the tail bound is proved directly from the minorization condition without invoking the target rate. No self-citation is load-bearing for the central claims, and all steps remain falsifiable against external Markov-chain examples.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on geometric ergodicity of the Markov chain and mild moment assumptions; no free parameters are introduced and no new entities are postulated.

axioms (2)

domain assumption The Markov chain is geometrically ergodic
Invoked to guarantee geometric tail on regeneration time of the split chain without strong aperiodicity.
domain assumption Mild moment assumptions on the underlying random variables
Required to obtain the W_p rates for p greater than or equal to 2.

pith-pipeline@v0.9.0 · 5554 in / 1312 out tokens · 24907 ms · 2026-05-16T15:17:32.486762+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 3 internal anchors

[1]

A tail inequality for suprema of unbounded empirical processes with applications to markov chains

Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to markov chains. 2008

work page 2008
[2]

Exponential concentration inequalities for additive functionals of markov chains.ESAIM: Probability and Statistics, 19:440–481, 2015

Radosław Adamczak and Witold Bednorz. Exponential concentration inequalities for additive functionals of markov chains.ESAIM: Probability and Statistics, 19:440–481, 2015

work page 2015
[3]

A normal approximation for the number of local maxima of a random function on a graph

Pierre Baldi, Yosef Rinott, and Charles Stein. A normal approximation for the number of local maxima of a random function on a graph. InProbability, statistics, and mathematics, pages 59–81. Elsevier, 1989

work page 1989
[4]

Asymptotic expansions based on smooth functions in the central limit theorem.Probability Theory and Related Fields, 72(2):289–303, 1986

Andrew D Barbour. Asymptotic expansions based on smooth functions in the central limit theorem.Probability Theory and Related Fields, 72(2):289–303, 1986

work page 1986
[5]

Renewal theory and computable convergence rates for geometrically ergodic markov chains

Peter H Baxendale. Renewal theory and computable convergence rates for geometrically ergodic markov chains. 2005

work page 2005
[6]

On the dependence of the berry–esseen bound on dimension.Journal of Statistical Planning and Inference, 113(2):385–402, 2003

Vidmantas Bentkus. On the dependence of the berry–esseen bound on dimension.Journal of Statistical Planning and Inference, 113(2):385–402, 2003

work page 2003
[7]

New bernstein and hoeffding type inequalities for regenerative markov chains

Patrice Bertail and Gabriela Ciołek. New bernstein and hoeffding type inequalities for regenerative markov chains. 2018

work page 2018
[8]

Covariance regularization by thresholding

Peter J Bickel and Elizaveta Levina. Covariance regularization by thresholding. 2008

work page 2008
[9]

Regularized estimation of large covariance matrices

Peter J Bickel and Elizaveta Levina. Regularized estimation of large covariance matrices. 2008

work page 2008
[10]

Entropic approach to e

Sergey G Bobkov. Entropic approach to e. rio’s central limit theorem for w2 transport distance.Statistics & Probability Letters, 83(7):1644–1648, 2013

work page 2013
[11]

Quantified cram\’er-wold continuity theorem for the kantorovich transport distance.arXiv preprint arXiv:2412.10276, 2024

Sergey G Bobkov and Friedrich Götze. Quantified cram\’er-wold continuity theorem for the kantorovich transport distance.arXiv preprint arXiv:2412.10276, 2024

work page arXiv 2024
[12]

Fourier analytic bounds for zolotarev distances, and applications to empirical measures.Preprint, 2024

Sergey G Bobkov and Michel Ledoux. Fourier analytic bounds for zolotarev distances, and applications to empirical measures.Preprint, 2024

work page 2024
[13]

The berry-esseen theorem for functionals of discrete markov chains.Zeitschrift für Wahrschein- lichkeitstheorie und verwandte Gebiete, 54(1):59–73, 1980

Erwin Bolthausen. The berry-esseen theorem for functionals of discrete markov chains.Zeitschrift für Wahrschein- lichkeitstheorie und verwandte Gebiete, 54(1):59–73, 1980

work page 1980
[14]

The berry-esseen theorem for strongly mixing harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60(3):283–289, 1982

Erwin Bolthausen. The berry-esseen theorem for strongly mixing harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60(3):283–289, 1982

work page 1982
[15]

Thomas Bonis. Stein’s method for normal approximation in wasserstein distances with application to the multivariate central limit theorem.Probability Theory and Related Fields, 178(3):827–860, 2020

work page 2020
[16]

Bagging predictors.Machine learning, 24(2):123–140, 1996

Leo Breiman. Bagging predictors.Machine learning, 24(2):123–140, 1996

work page 1996
[17]

Random forests.Machine learning, 45(1):5–32, 2001

Leo Breiman. Random forests.Machine learning, 45(1):5–32, 2001

work page 2001
[18]

The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978

Herman Callaert and Paul Janssen. The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978

work page 1978
[19]

Springer Science & Business Media, 2010

Louis HY Chen, Larry Goldstein, and Qi-Man Shao.Normal approximation by Stein’s method. Springer Science & Business Media, 2010

work page 2010
[20]

Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications

Xiaohui Chen. Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications. 2018

work page 2018
[21]

Randomized incomplete u-statistics in high dimensions

Xiaohui Chen and Kengo Kato. Randomized incomplete u-statistics in high dimensions. 2019

work page 2019
[22]

Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.The Annals of Statistics, pages 2786–2819, 2013

Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.The Annals of Statistics, pages 2786–2819, 2013

work page 2013
[23]

Renewal theory.Chapman and Hall, 1962

David Roxbee Cox. Renewal theory.Chapman and Hall, 1962

work page 1962
[24]

Routledge, 2017

David Roxbee Cox.The theory of stochastic processes. Routledge, 2017

work page 2017
[25]

Covariance selection.Biometrics, pages 157–175, 1972

Arthur P Dempster. Covariance selection.Biometrics, pages 157–175, 1972

work page 1972
[26]

The central limit theorem for m-dependent variables

PH Diananda. The central limit theorem for m-dependent variables. InMathematical Proceedings of the Cambridge Philosophical Society, volume 51, pages 92–95. Cambridge University Press, 1955

work page 1955
[27]

Bridging the gap between constant step size stochastic gradient descent and markov chains

Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and markov chains. 2020

work page 2020
[28]

Bounds on regeneration times and limit theorems for subgeometric markov chains

Randal Douc, Arnaud Guillin, and Eric Moulines. Bounds on regeneration times and limit theorems for subgeometric markov chains. InAnnales de l’IHP Probabilités et statistiques, volume 44, pages 239–257, 2008

work page 2008
[29]

Springer

Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier.Markov chains, volume 4. Springer

work page
[30]

Wasserstein-2 bounds in normal approximation under local dependence

Xiao Fang. Wasserstein-2 bounds in normal approximation under local dependence. 2019

work page 2019
[31]

From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023

Xiao Fang and Yuta Koike. From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023

work page 2023
[32]

Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts

Gerald B Folland.Real analysis: modern techniques and their applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. John Wiley & Sons, Nashville, TN, 2 edition, March 1999

work page 1999
[33]

Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem

Thomas Gallouët, Guillaume Mijoule, and Yvik Swan. Regularity of solutions of the stein equation and rates in the multivariate central limit theorem.arXiv preprint arXiv:1805.01720, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[34]

Multivariate normal approximations by stein’s method and size bias couplings

Larry Goldstein and Yosef Rinott. Multivariate normal approximations by stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17, 1996. , Vol. 1, No. 1, Article . Publication date: January 2026. 22 Yixuan Zhang and Qiaomin Xie

work page 1996
[35]

Maximal coupling.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 46(2):193–204, 1979

Sheldon Goldstein. Maximal coupling.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 46(2):193–204, 1979

work page 1979
[36]

Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987

Friedrich Götze. Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987

work page 1987
[37]

Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

work page 2017
[38]

Monte carlo sampling methods using markov chains and their applications

W Keith Hastings. Monte carlo sampling methods using markov chains and their applications. 1970

work page 1970
[39]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

work page 2020
[40]

A class of statistics with asymptotically normal distribution

Wassily Hoeffding. A class of statistics with asymptotically normal distribution. InBreakthroughs in statistics: Foundations and basic theory, pages 308–334. Springer, 1992

work page 1992
[41]

The collusion of memory and nonlinearity in stochastic approximation with constant stepsize.Advances in Neural Information Processing Systems, 37:21699–21762, 2024

Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize.Advances in Neural Information Processing Systems, 37:21699–21762, 2024

work page 2024
[42]

On the accuracy of gaussian approximation to the distribution functions of sums of independent variables.Theory of Probability & Its Applications, 11(4):559–579, 1966

IA Ibragimov. On the accuracy of gaussian approximation to the distribution functions of sums of independent variables.Theory of Probability & Its Applications, 11(4):559–579, 1966

work page 1966
[43]

Honest exploration of intractable probability distributions via markov chain monte carlo.Statistical Science, pages 312–334, 2001

Galin L Jones and James P Hobert. Honest exploration of intractable probability distributions via markov chain monte carlo.Statistical Science, pages 312–334, 2001

work page 2001
[44]

Semi-Supervised Classification with Graph Convolutional Networks

TN Kipf. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[45]

Wasserstein distributionally robust optimization: Theory and applications in machine learning

Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. InOperations research & management science in the age of analytics, pages 130–166. Informs, 2019

work page 2019
[46]

General bernstein-like inequality for additive functionals of markov chains.Journal of Theoretical Probability, 34(3):1426–1454, 2021

Michał Lemańczyk. General bernstein-like inequality for additive functionals of markov chains.Journal of Theoretical Probability, 34(3):1426–1454, 2021

work page 2021
[47]

Wasserstein-p bounds in the central limit theorem under local dependence.Electronic Journal of Probability, 28:1–47, 2023

Tianle Liu and Morgane Austern. Wasserstein-p bounds in the central limit theorem under local dependence.Electronic Journal of Probability, 28:1–47, 2023

work page 2023
[48]

Chapman and Hall/CRC, 2021

Vsevolod K Malinovskii.Level-crossing problems and inverse Gaussian distributions: closed-form results and approxima- tions. Chapman and Hall/CRC, 2021

work page 2021
[49]

A family of nonlinear fourth order equations of gradient flow type.Communications in Partial Differential Equations, 34(11):1352–1397, 2009

Daniel Matthes, Robert J McCann, and Giuseppe Savaré. A family of nonlinear fourth order equations of gradient flow type.Communications in Partial Differential Equations, 34(11):1352–1397, 2009

work page 2009
[50]

Quantifying uncertainty in random forests via confidence intervals and hypothesis tests.Journal of Machine Learning Research, 17(26):1–41, 2016

Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests.Journal of Machine Learning Research, 17(26):1–41, 2016

work page 2016
[51]

Meyn and Richard L

Sean P. Meyn and Richard L. Tweedie.Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 2nd edition, 2009

work page 2009
[52]

Integral probability metrics and their generating classes of functions.Advances in applied probability, 29(2):429–443, 1997

Alfred Müller. Integral probability metrics and their generating classes of functions.Advances in applied probability, 29(2):429–443, 1997

work page 1997
[53]

Regeneration in markov chain samplers.Journal of the American Statistical Association, 90(429):233–241, 1995

Per Mykland, Luke Tierney, and Bin Yu. Regeneration in markov chain samplers.Journal of the American Statistical Association, 90(429):233–241, 1995

work page 1995
[54]

Cambridge University Press, 2012

Ivan Nourdin and Giovanni Peccati.Normal approximations with Malliavin calculus: from Stein’s method to universality, volume 192. Cambridge University Press, 2012

work page 2012
[55]

A splitting technique for harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43(4):309–318, 1978

Esa Nummelin. A splitting technique for harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43(4):309–318, 1978

work page 1978
[56]

Statistical aspects of wasserstein distances.Annual review of statistics and its application, 6(1):405–431, 2019

Victor M Panaretos and Yoav Zemel. Statistical aspects of wasserstein distances.Annual review of statistics and its application, 6(1):405–431, 2019

work page 2019
[57]

Springer Science & Business Media, 2012

Valentin V Petrov.Sums of independent random variables, volume 82. Springer Science & Business Media, 2012

work page 2012
[58]

Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

Iosif Pinelis. Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

work page 1994
[59]

Free states of the canonical anticommutation relations.Communications in Mathematical Physics, 16(1):1–33, 1970

Robert T Powers and Erling Størmer. Free states of the canonical anticommutation relations.Communications in Mathematical Physics, 16(1):1–33, 1970

work page 1970
[60]

John Wiley & Sons, 2014

Martin L Puterman.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

work page 2014
[61]

A multivariate clt for decomposable random vectors with finite second moments.Journal of Theoretical Probability, 17(3):573–603, 2004

Martin Raič. A multivariate clt for decomposable random vectors with finite second moments.Journal of Theoretical Probability, 17(3):573–603, 2004

work page 2004
[62]

A multivariate central limit theorem for Lipschitz and smooth test functions

Martin Raič. A multivariate central limit theorem for lipschitz and smooth test functions.arXiv preprint arXiv:1812.08268, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[63]

On normal approximation rates for certain sums of dependent random variables.Journal of Computational and Applied Mathematics, 55(2):135–143, 1994

Yosef Rinott. On normal approximation rates for certain sums of dependent random variables.Journal of Computational and Applied Mathematics, 55(2):135–143, 1994

work page 1994
[64]

Upper bounds for minimal distances in the central limit theorem

Emmanuel Rio. Upper bounds for minimal distances in the central limit theorem. InAnnales de l’IHP Probabilités et statistiques, volume 45, pages 802–817, 2009. , Vol. 1, No. 1, Article . Publication date: January 2026. Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains 23

work page 2009
[65]

Bounds on regeneration times and convergence rates for markov chains

Gareth O Roberts and Richard L Tweedie. Bounds on regeneration times and convergence rates for markov chains. Stochastic Processes and their applications, 80(2):211–229, 1999

work page 1999
[66]

A more general central limit theorem for m-dependent random variables with unbounded m.Statistics & probability letters, 47(2):115–124, 2000

Joseph P Romano and Michael Wolf. A more general central limit theorem for m-dependent random variables with unbounded m.Statistics & probability letters, 47(2):115–124, 2000

work page 2000
[67]

Minorization conditions and convergence rates for markov chain monte carlo.Journal of the American Statistical Association, 90(430):558–566, 1995

Jeffrey S Rosenthal. Minorization conditions and convergence rates for markov chain monte carlo.Journal of the American Statistical Association, 90(430):558–566, 1995

work page 1995
[68]

Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning

Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, and Alexey Naumov. Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning. Advances in Neural Information Processing Systems, 37:12408–12460, 2024

work page 2024
[69]

Rates of convergence in the central limit theorem for markov chains, with an application to td learning

R Srikant. Rates of convergence in the central limit theorem for markov chains, with an application to td learning. arXiv preprint arXiv:2401.15719, 2024

work page arXiv 2024
[70]

MIT press Cambridge, 1998

Richard S Sutton, Andrew G Barto, et al.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

work page 1998
[71]

Springer, 2008

Cédric Villani et al.Optimal transport: old and new, volume 338. Springer, 2008

work page 2008
[72]

Uncertainty quantification for markov chains with application to temporal difference learning.arXiv preprint arXiv:2502.13822, 2025

Weichen Wu, Yuting Wei, and Alessandro Rinaldo. Uncertainty quantification for markov chains with application to temporal difference learning.arXiv preprint arXiv:2502.13822, 2025

work page arXiv 2025
[73]

An analysis of constant step size sgd in the non-convex regime: Asymptotic normality and bias.Advances in Neural Information Processing Systems, 34:4234–4248, 2021

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, and Murat A Erdogdu. An analysis of constant step size sgd in the non-convex regime: Asymptotic normality and bias.Advances in Neural Information Processing Systems, 34:4234–4248, 2021

work page 2021
[74]

Prelimit coupling and steady-state convergence of constant-stepsize nonsmooth contractive sa.arXiv preprint arXiv:2404.06023, 2024

Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. Prelimit coupling and steady-state convergence of constant-stepsize nonsmooth contractive sa.arXiv preprint arXiv:2404.06023, 2024

work page arXiv 2024
[75]

A piecewise lyapunov analysis of sub-quadratic sgd: Applications to robust and quantile regression.ACM SIGMETRICS Performance Evaluation Review, 53(1):85–87, 2025

Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. A piecewise lyapunov analysis of sub-quadratic sgd: Applications to robust and quantile regression.ACM SIGMETRICS Performance Evaluation Review, 53(1):85–87, 2025

work page 2025
[76]

Constant stepsize q-learning: Distributional convergence, bias and extrapolation

Yixuan Zhang and Qiaomin Xie. Constant stepsize q-learning: Distributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024. A Proof of Theorem 1 In this section we complete the proof of Theorem 1 by providing only the additional details omitted from the proof outline in Sections 3.2 and 3.3. A.1 Proof of Proposition 1 By definit...

work page arXiv 2024