pith. machine review for the scientific record. sign in

arxiv: 2601.08184 · v3 · submitted 2026-01-13 · 🧮 math.PR · cs.LG· stat.ML

Recognition: 1 theorem link

· Lean Theorem

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Authors on Pith no claims yet

Pith reviewed 2026-05-16 15:17 UTC · model grok-4.3

classification 🧮 math.PR cs.LGstat.ML
keywords Wasserstein distanceCentral limit theoremMarkov chainsLocal dependenceU-statisticsGaussian approximationRegeneration
0
0 comments X

The pith

Optimal rates of order n to the power of negative one half are established for Wasserstein-1 central limit theorems under local dependence and for geometrically ergodic Markov chains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves the first optimal convergence rate of order one over the square root of n in Wasserstein-1 distance for central limit theorems applied to multivariate sequences that are either locally dependent or generated by geometrically ergodic Markov chains. It also gives the first Wasserstein-p rates for p at least 2 under only mild moment conditions. These bounds improve existing results for the kinds of dependent data that appear in machine learning and operations research. An auxiliary bound on Wasserstein-1 Gaussian approximation error and a geometric tail result for regeneration times of split chains are the main technical tools.

Core claim

We establish the first optimal O(n^{-1/2}) rate in W_1 distance, as well as the first W_p (p≥2) CLT rates under mild moment assumptions, for multivariate data under local dependence and under geometric ergodicity of Markov chains, substantially improving prior bounds; the results rest on a tractable auxiliary bound for W_1 Gaussian approximation errors and a geometric tail for the regeneration time of the associated split chain.

What carries the argument

Auxiliary bound for W_1 Gaussian approximation errors together with the geometric tail bound on regeneration time of the split chain.

If this is right

  • The optimal W_1 rate for locally dependent sequences yields the first optimal W_1 CLT rate for multivariate U-statistics.
  • Non-asymptotic error bounds of order n^{-1/2} become available for dependent data in settings that previously required slower rates.
  • The same regeneration-time control applies directly to any geometrically ergodic chain without extra aperiodicity assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The auxiliary W_1 bound may extend to other weak dependence notions such as beta-mixing with suitable decay rates.
  • Quantitative regeneration tails could be combined with existing Berry-Esseen machinery to obtain explicit constants for finite-state chains.
  • The W_p rates for p greater than or equal to 2 open the door to moment-based concentration inequalities that scale correctly with dimension for dependent samples.

Load-bearing premise

Geometric ergodicity of the Markov chain together with the mild moment conditions needed to control regeneration time tails and Gaussian approximation errors.

What would settle it

A concrete counter-example Markov chain that satisfies geometric ergodicity and the stated moments but whose W_1 distance to Gaussianity decays slower than n to the power of negative one half.

read the original abstract

Non-asymptotic central limit theorem (CLT) rates play a central role in modern machine learning and operations research. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p\ge 1$. We focus on two fundamental dependence structures that commonly arise in practice: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $\mathcal O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interests and enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims to establish the first optimal O(n^{-1/2}) rates in W_1 distance for CLTs under local dependence and for geometrically ergodic Markov chains, together with the first W_p (p≥2) rates under mild moment assumptions. It introduces an auxiliary bound on W_1 Gaussian approximation error suited to dependent data and proves that the regeneration time of the split chain for a geometrically ergodic Markov chain has a geometric tail without requiring strong aperiodicity. An application yields the first optimal W_1-CLT rate for multivariate U-statistics.

Significance. If the central claims hold, the results would meaningfully advance non-asymptotic CLT theory for dependent data by delivering optimal W_1 rates and new W_p rates where prior bounds were weaker. The auxiliary W_1 bound and the regeneration-tail argument are presented as reusable tools, which could benefit related work in probability and statistics. The explicit rates under standard ergodicity and moment conditions increase applicability to machine-learning and operations-research settings.

major comments (2)
  1. [§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.
  2. [Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.
minor comments (3)
  1. [§3] The precise definition of the local-dependence neighborhood size should be restated at the beginning of §3 for readability.
  2. [Introduction] A short comparison table of the new rates versus the best previously known bounds (including the moment assumptions) would help readers assess the improvement.
  3. [Notation] Notation for the Wasserstein-p distance and the regeneration time should be made consistent between the statements of the main theorems and the proofs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and explicitness of the bounds.

read point-by-point responses
  1. Referee: [§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.

    Authors: We thank the referee for highlighting this point. Geometric ergodicity directly yields a minorization condition on a set C with positive measure and constant δ > 0 (see the standard construction in Meyn and Tweedie). In Lemma 4.3 we apply the split-chain construction and derive P(τ > k) ≤ r^k for r = 1 - δ/2, where the rate r is independent of any periodicity because the minorization resets the chain upon hitting the atom, regardless of the original chain's period. The admissible block length m can therefore be chosen uniformly (e.g., m = O(1) or slowly growing) solely from the ergodicity parameters. We will add a short remark after Lemma 4.3 explicitly noting the period-independence and the resulting uniform choice of m. revision: partial

  2. Referee: [Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.

    Authors: We agree that an expanded statement will make the substitution transparent. Theorem 3.2 currently bounds the W_1 Gaussian approximation error for a locally dependent block of length m by C(β_m + m^{-1/2} + α(m)), where β_m collects moment terms and α(m) is the local-dependence coefficient over distance m. We will revise the theorem statement to display this dependence on m explicitly and add a short calculation in the proof of Theorem 4.1 showing that, after summing over n/m blocks, the total error is O(n^{-1/2}) once m is fixed from the geometric rate. This will also clarify the improvement relative to prior bounds that carried extra logarithmic or polynomial factors. revision: yes

Circularity Check

0 steps flagged

No circularity: derivations use independent auxiliary bounds and standard regeneration arguments

full rationale

The paper establishes W_p CLT rates via a new W_1 Gaussian approximation bound and a regeneration-time tail for split chains under geometric ergodicity. Neither step reduces to a fitted parameter or self-defined quantity; the tail bound is proved directly from the minorization condition without invoking the target rate. No self-citation is load-bearing for the central claims, and all steps remain falsifiable against external Markov-chain examples.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on geometric ergodicity of the Markov chain and mild moment assumptions; no free parameters are introduced and no new entities are postulated.

axioms (2)
  • domain assumption The Markov chain is geometrically ergodic
    Invoked to guarantee geometric tail on regeneration time of the split chain without strong aperiodicity.
  • domain assumption Mild moment assumptions on the underlying random variables
    Required to obtain the W_p rates for p greater than or equal to 2.

pith-pipeline@v0.9.0 · 5554 in / 1312 out tokens · 24907 ms · 2026-05-16T15:17:32.486762+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 3 internal anchors

  1. [1]

    A tail inequality for suprema of unbounded empirical processes with applications to markov chains

    Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to markov chains. 2008

  2. [2]

    Exponential concentration inequalities for additive functionals of markov chains.ESAIM: Probability and Statistics, 19:440–481, 2015

    Radosław Adamczak and Witold Bednorz. Exponential concentration inequalities for additive functionals of markov chains.ESAIM: Probability and Statistics, 19:440–481, 2015

  3. [3]

    A normal approximation for the number of local maxima of a random function on a graph

    Pierre Baldi, Yosef Rinott, and Charles Stein. A normal approximation for the number of local maxima of a random function on a graph. InProbability, statistics, and mathematics, pages 59–81. Elsevier, 1989

  4. [4]

    Asymptotic expansions based on smooth functions in the central limit theorem.Probability Theory and Related Fields, 72(2):289–303, 1986

    Andrew D Barbour. Asymptotic expansions based on smooth functions in the central limit theorem.Probability Theory and Related Fields, 72(2):289–303, 1986

  5. [5]

    Renewal theory and computable convergence rates for geometrically ergodic markov chains

    Peter H Baxendale. Renewal theory and computable convergence rates for geometrically ergodic markov chains. 2005

  6. [6]

    On the dependence of the berry–esseen bound on dimension.Journal of Statistical Planning and Inference, 113(2):385–402, 2003

    Vidmantas Bentkus. On the dependence of the berry–esseen bound on dimension.Journal of Statistical Planning and Inference, 113(2):385–402, 2003

  7. [7]

    New bernstein and hoeffding type inequalities for regenerative markov chains

    Patrice Bertail and Gabriela Ciołek. New bernstein and hoeffding type inequalities for regenerative markov chains. 2018

  8. [8]

    Covariance regularization by thresholding

    Peter J Bickel and Elizaveta Levina. Covariance regularization by thresholding. 2008

  9. [9]

    Regularized estimation of large covariance matrices

    Peter J Bickel and Elizaveta Levina. Regularized estimation of large covariance matrices. 2008

  10. [10]

    Entropic approach to e

    Sergey G Bobkov. Entropic approach to e. rio’s central limit theorem for w2 transport distance.Statistics & Probability Letters, 83(7):1644–1648, 2013

  11. [11]

    Quantified cram\’er-wold continuity theorem for the kantorovich transport distance.arXiv preprint arXiv:2412.10276, 2024

    Sergey G Bobkov and Friedrich Götze. Quantified cram\’er-wold continuity theorem for the kantorovich transport distance.arXiv preprint arXiv:2412.10276, 2024

  12. [12]

    Fourier analytic bounds for zolotarev distances, and applications to empirical measures.Preprint, 2024

    Sergey G Bobkov and Michel Ledoux. Fourier analytic bounds for zolotarev distances, and applications to empirical measures.Preprint, 2024

  13. [13]

    The berry-esseen theorem for functionals of discrete markov chains.Zeitschrift für Wahrschein- lichkeitstheorie und verwandte Gebiete, 54(1):59–73, 1980

    Erwin Bolthausen. The berry-esseen theorem for functionals of discrete markov chains.Zeitschrift für Wahrschein- lichkeitstheorie und verwandte Gebiete, 54(1):59–73, 1980

  14. [14]

    The berry-esseen theorem for strongly mixing harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60(3):283–289, 1982

    Erwin Bolthausen. The berry-esseen theorem for strongly mixing harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60(3):283–289, 1982

  15. [15]

    Thomas Bonis. Stein’s method for normal approximation in wasserstein distances with application to the multivariate central limit theorem.Probability Theory and Related Fields, 178(3):827–860, 2020

  16. [16]

    Bagging predictors.Machine learning, 24(2):123–140, 1996

    Leo Breiman. Bagging predictors.Machine learning, 24(2):123–140, 1996

  17. [17]

    Random forests.Machine learning, 45(1):5–32, 2001

    Leo Breiman. Random forests.Machine learning, 45(1):5–32, 2001

  18. [18]

    The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978

    Herman Callaert and Paul Janssen. The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978

  19. [19]

    Springer Science & Business Media, 2010

    Louis HY Chen, Larry Goldstein, and Qi-Man Shao.Normal approximation by Stein’s method. Springer Science & Business Media, 2010

  20. [20]

    Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications

    Xiaohui Chen. Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications. 2018

  21. [21]

    Randomized incomplete u-statistics in high dimensions

    Xiaohui Chen and Kengo Kato. Randomized incomplete u-statistics in high dimensions. 2019

  22. [22]

    Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.The Annals of Statistics, pages 2786–2819, 2013

    Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.The Annals of Statistics, pages 2786–2819, 2013

  23. [23]

    Renewal theory.Chapman and Hall, 1962

    David Roxbee Cox. Renewal theory.Chapman and Hall, 1962

  24. [24]

    Routledge, 2017

    David Roxbee Cox.The theory of stochastic processes. Routledge, 2017

  25. [25]

    Covariance selection.Biometrics, pages 157–175, 1972

    Arthur P Dempster. Covariance selection.Biometrics, pages 157–175, 1972

  26. [26]

    The central limit theorem for m-dependent variables

    PH Diananda. The central limit theorem for m-dependent variables. InMathematical Proceedings of the Cambridge Philosophical Society, volume 51, pages 92–95. Cambridge University Press, 1955

  27. [27]

    Bridging the gap between constant step size stochastic gradient descent and markov chains

    Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and markov chains. 2020

  28. [28]

    Bounds on regeneration times and limit theorems for subgeometric markov chains

    Randal Douc, Arnaud Guillin, and Eric Moulines. Bounds on regeneration times and limit theorems for subgeometric markov chains. InAnnales de l’IHP Probabilités et statistiques, volume 44, pages 239–257, 2008

  29. [29]

    Springer

    Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier.Markov chains, volume 4. Springer

  30. [30]

    Wasserstein-2 bounds in normal approximation under local dependence

    Xiao Fang. Wasserstein-2 bounds in normal approximation under local dependence. 2019

  31. [31]

    From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023

    Xiao Fang and Yuta Koike. From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023

  32. [32]

    Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts

    Gerald B Folland.Real analysis: modern techniques and their applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. John Wiley & Sons, Nashville, TN, 2 edition, March 1999

  33. [33]

    Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem

    Thomas Gallouët, Guillaume Mijoule, and Yvik Swan. Regularity of solutions of the stein equation and rates in the multivariate central limit theorem.arXiv preprint arXiv:1805.01720, 2018

  34. [34]

    Multivariate normal approximations by stein’s method and size bias couplings

    Larry Goldstein and Yosef Rinott. Multivariate normal approximations by stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17, 1996. , Vol. 1, No. 1, Article . Publication date: January 2026. 22 Yixuan Zhang and Qiaomin Xie

  35. [35]

    Maximal coupling.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 46(2):193–204, 1979

    Sheldon Goldstein. Maximal coupling.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 46(2):193–204, 1979

  36. [36]

    Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987

    Friedrich Götze. Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987

  37. [37]

    Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

    Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

  38. [38]

    Monte carlo sampling methods using markov chains and their applications

    W Keith Hastings. Monte carlo sampling methods using markov chains and their applications. 1970

  39. [39]

    Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

  40. [40]

    A class of statistics with asymptotically normal distribution

    Wassily Hoeffding. A class of statistics with asymptotically normal distribution. InBreakthroughs in statistics: Foundations and basic theory, pages 308–334. Springer, 1992

  41. [41]

    The collusion of memory and nonlinearity in stochastic approximation with constant stepsize.Advances in Neural Information Processing Systems, 37:21699–21762, 2024

    Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize.Advances in Neural Information Processing Systems, 37:21699–21762, 2024

  42. [42]

    On the accuracy of gaussian approximation to the distribution functions of sums of independent variables.Theory of Probability & Its Applications, 11(4):559–579, 1966

    IA Ibragimov. On the accuracy of gaussian approximation to the distribution functions of sums of independent variables.Theory of Probability & Its Applications, 11(4):559–579, 1966

  43. [43]

    Honest exploration of intractable probability distributions via markov chain monte carlo.Statistical Science, pages 312–334, 2001

    Galin L Jones and James P Hobert. Honest exploration of intractable probability distributions via markov chain monte carlo.Statistical Science, pages 312–334, 2001

  44. [44]

    Semi-Supervised Classification with Graph Convolutional Networks

    TN Kipf. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907, 2016

  45. [45]

    Wasserstein distributionally robust optimization: Theory and applications in machine learning

    Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. InOperations research & management science in the age of analytics, pages 130–166. Informs, 2019

  46. [46]

    General bernstein-like inequality for additive functionals of markov chains.Journal of Theoretical Probability, 34(3):1426–1454, 2021

    Michał Lemańczyk. General bernstein-like inequality for additive functionals of markov chains.Journal of Theoretical Probability, 34(3):1426–1454, 2021

  47. [47]

    Wasserstein-p bounds in the central limit theorem under local dependence.Electronic Journal of Probability, 28:1–47, 2023

    Tianle Liu and Morgane Austern. Wasserstein-p bounds in the central limit theorem under local dependence.Electronic Journal of Probability, 28:1–47, 2023

  48. [48]

    Chapman and Hall/CRC, 2021

    Vsevolod K Malinovskii.Level-crossing problems and inverse Gaussian distributions: closed-form results and approxima- tions. Chapman and Hall/CRC, 2021

  49. [49]

    A family of nonlinear fourth order equations of gradient flow type.Communications in Partial Differential Equations, 34(11):1352–1397, 2009

    Daniel Matthes, Robert J McCann, and Giuseppe Savaré. A family of nonlinear fourth order equations of gradient flow type.Communications in Partial Differential Equations, 34(11):1352–1397, 2009

  50. [50]

    Quantifying uncertainty in random forests via confidence intervals and hypothesis tests.Journal of Machine Learning Research, 17(26):1–41, 2016

    Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests.Journal of Machine Learning Research, 17(26):1–41, 2016

  51. [51]

    Meyn and Richard L

    Sean P. Meyn and Richard L. Tweedie.Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 2nd edition, 2009

  52. [52]

    Integral probability metrics and their generating classes of functions.Advances in applied probability, 29(2):429–443, 1997

    Alfred Müller. Integral probability metrics and their generating classes of functions.Advances in applied probability, 29(2):429–443, 1997

  53. [53]

    Regeneration in markov chain samplers.Journal of the American Statistical Association, 90(429):233–241, 1995

    Per Mykland, Luke Tierney, and Bin Yu. Regeneration in markov chain samplers.Journal of the American Statistical Association, 90(429):233–241, 1995

  54. [54]

    Cambridge University Press, 2012

    Ivan Nourdin and Giovanni Peccati.Normal approximations with Malliavin calculus: from Stein’s method to universality, volume 192. Cambridge University Press, 2012

  55. [55]

    A splitting technique for harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43(4):309–318, 1978

    Esa Nummelin. A splitting technique for harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43(4):309–318, 1978

  56. [56]

    Statistical aspects of wasserstein distances.Annual review of statistics and its application, 6(1):405–431, 2019

    Victor M Panaretos and Yoav Zemel. Statistical aspects of wasserstein distances.Annual review of statistics and its application, 6(1):405–431, 2019

  57. [57]

    Springer Science & Business Media, 2012

    Valentin V Petrov.Sums of independent random variables, volume 82. Springer Science & Business Media, 2012

  58. [58]

    Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

    Iosif Pinelis. Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

  59. [59]

    Free states of the canonical anticommutation relations.Communications in Mathematical Physics, 16(1):1–33, 1970

    Robert T Powers and Erling Størmer. Free states of the canonical anticommutation relations.Communications in Mathematical Physics, 16(1):1–33, 1970

  60. [60]

    John Wiley & Sons, 2014

    Martin L Puterman.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

  61. [61]

    A multivariate clt for decomposable random vectors with finite second moments.Journal of Theoretical Probability, 17(3):573–603, 2004

    Martin Raič. A multivariate clt for decomposable random vectors with finite second moments.Journal of Theoretical Probability, 17(3):573–603, 2004

  62. [62]

    A multivariate central limit theorem for Lipschitz and smooth test functions

    Martin Raič. A multivariate central limit theorem for lipschitz and smooth test functions.arXiv preprint arXiv:1812.08268, 2018

  63. [63]

    On normal approximation rates for certain sums of dependent random variables.Journal of Computational and Applied Mathematics, 55(2):135–143, 1994

    Yosef Rinott. On normal approximation rates for certain sums of dependent random variables.Journal of Computational and Applied Mathematics, 55(2):135–143, 1994

  64. [64]

    Upper bounds for minimal distances in the central limit theorem

    Emmanuel Rio. Upper bounds for minimal distances in the central limit theorem. InAnnales de l’IHP Probabilités et statistiques, volume 45, pages 802–817, 2009. , Vol. 1, No. 1, Article . Publication date: January 2026. Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains 23

  65. [65]

    Bounds on regeneration times and convergence rates for markov chains

    Gareth O Roberts and Richard L Tweedie. Bounds on regeneration times and convergence rates for markov chains. Stochastic Processes and their applications, 80(2):211–229, 1999

  66. [66]

    A more general central limit theorem for m-dependent random variables with unbounded m.Statistics & probability letters, 47(2):115–124, 2000

    Joseph P Romano and Michael Wolf. A more general central limit theorem for m-dependent random variables with unbounded m.Statistics & probability letters, 47(2):115–124, 2000

  67. [67]

    Minorization conditions and convergence rates for markov chain monte carlo.Journal of the American Statistical Association, 90(430):558–566, 1995

    Jeffrey S Rosenthal. Minorization conditions and convergence rates for markov chain monte carlo.Journal of the American Statistical Association, 90(430):558–566, 1995

  68. [68]

    Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning

    Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, and Alexey Naumov. Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning. Advances in Neural Information Processing Systems, 37:12408–12460, 2024

  69. [69]

    Rates of convergence in the central limit theorem for markov chains, with an application to td learning

    R Srikant. Rates of convergence in the central limit theorem for markov chains, with an application to td learning. arXiv preprint arXiv:2401.15719, 2024

  70. [70]

    MIT press Cambridge, 1998

    Richard S Sutton, Andrew G Barto, et al.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

  71. [71]

    Springer, 2008

    Cédric Villani et al.Optimal transport: old and new, volume 338. Springer, 2008

  72. [72]

    Uncertainty quantification for markov chains with application to temporal difference learning.arXiv preprint arXiv:2502.13822, 2025

    Weichen Wu, Yuting Wei, and Alessandro Rinaldo. Uncertainty quantification for markov chains with application to temporal difference learning.arXiv preprint arXiv:2502.13822, 2025

  73. [73]

    An analysis of constant step size sgd in the non-convex regime: Asymptotic normality and bias.Advances in Neural Information Processing Systems, 34:4234–4248, 2021

    Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, and Murat A Erdogdu. An analysis of constant step size sgd in the non-convex regime: Asymptotic normality and bias.Advances in Neural Information Processing Systems, 34:4234–4248, 2021

  74. [74]

    Prelimit coupling and steady-state convergence of constant-stepsize nonsmooth contractive sa.arXiv preprint arXiv:2404.06023, 2024

    Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. Prelimit coupling and steady-state convergence of constant-stepsize nonsmooth contractive sa.arXiv preprint arXiv:2404.06023, 2024

  75. [75]

    A piecewise lyapunov analysis of sub-quadratic sgd: Applications to robust and quantile regression.ACM SIGMETRICS Performance Evaluation Review, 53(1):85–87, 2025

    Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. A piecewise lyapunov analysis of sub-quadratic sgd: Applications to robust and quantile regression.ACM SIGMETRICS Performance Evaluation Review, 53(1):85–87, 2025

  76. [76]

    Constant stepsize q-learning: Distributional convergence, bias and extrapolation

    Yixuan Zhang and Qiaomin Xie. Constant stepsize q-learning: Distributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024. A Proof of Theorem 1 In this section we complete the proof of Theorem 1 by providing only the additional details omitted from the proof outline in Sections 3.2 and 3.3. A.1 Proof of Proposition 1 By definit...