Recognition: 1 theorem link
· Lean TheoremWasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains
Pith reviewed 2026-05-16 15:17 UTC · model grok-4.3
The pith
Optimal rates of order n to the power of negative one half are established for Wasserstein-1 central limit theorems under local dependence and for geometrically ergodic Markov chains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish the first optimal O(n^{-1/2}) rate in W_1 distance, as well as the first W_p (p≥2) CLT rates under mild moment assumptions, for multivariate data under local dependence and under geometric ergodicity of Markov chains, substantially improving prior bounds; the results rest on a tractable auxiliary bound for W_1 Gaussian approximation errors and a geometric tail for the regeneration time of the associated split chain.
What carries the argument
Auxiliary bound for W_1 Gaussian approximation errors together with the geometric tail bound on regeneration time of the split chain.
If this is right
- The optimal W_1 rate for locally dependent sequences yields the first optimal W_1 CLT rate for multivariate U-statistics.
- Non-asymptotic error bounds of order n^{-1/2} become available for dependent data in settings that previously required slower rates.
- The same regeneration-time control applies directly to any geometrically ergodic chain without extra aperiodicity assumptions.
Where Pith is reading between the lines
- The auxiliary W_1 bound may extend to other weak dependence notions such as beta-mixing with suitable decay rates.
- Quantitative regeneration tails could be combined with existing Berry-Esseen machinery to obtain explicit constants for finite-state chains.
- The W_p rates for p greater than or equal to 2 open the door to moment-based concentration inequalities that scale correctly with dimension for dependent samples.
Load-bearing premise
Geometric ergodicity of the Markov chain together with the mild moment conditions needed to control regeneration time tails and Gaussian approximation errors.
What would settle it
A concrete counter-example Markov chain that satisfies geometric ergodicity and the stated moments but whose W_1 distance to Gaussianity decays slower than n to the power of negative one half.
read the original abstract
Non-asymptotic central limit theorem (CLT) rates play a central role in modern machine learning and operations research. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p\ge 1$. We focus on two fundamental dependence structures that commonly arise in practice: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $\mathcal O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interests and enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish the first optimal O(n^{-1/2}) rates in W_1 distance for CLTs under local dependence and for geometrically ergodic Markov chains, together with the first W_p (p≥2) rates under mild moment assumptions. It introduces an auxiliary bound on W_1 Gaussian approximation error suited to dependent data and proves that the regeneration time of the split chain for a geometrically ergodic Markov chain has a geometric tail without requiring strong aperiodicity. An application yields the first optimal W_1-CLT rate for multivariate U-statistics.
Significance. If the central claims hold, the results would meaningfully advance non-asymptotic CLT theory for dependent data by delivering optimal W_1 rates and new W_p rates where prior bounds were weaker. The auxiliary W_1 bound and the regeneration-tail argument are presented as reusable tools, which could benefit related work in probability and statistics. The explicit rates under standard ergodicity and moment conditions increase applicability to machine-learning and operations-research settings.
major comments (2)
- [§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.
- [Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.
minor comments (3)
- [§3] The precise definition of the local-dependence neighborhood size should be restated at the beginning of §3 for readability.
- [Introduction] A short comparison table of the new rates versus the best previously known bounds (including the moment assumptions) would help readers assess the improvement.
- [Notation] Notation for the Wasserstein-p distance and the regeneration time should be made consistent between the statements of the main theorems and the proofs.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and explicitness of the bounds.
read point-by-point responses
-
Referee: [§4] §4 (Markov-chain CLT): the O(n^{-1/2}) W_1 rate is obtained by regeneration blocking; the geometric tail of the split-chain regeneration time is therefore load-bearing. The manuscript asserts this tail is geometric under geometric ergodicity alone, but the minorization construction must be shown to produce a period-independent geometric rate; otherwise the admissible block length cannot be chosen uniformly and the claimed rate fails.
Authors: We thank the referee for highlighting this point. Geometric ergodicity directly yields a minorization condition on a set C with positive measure and constant δ > 0 (see the standard construction in Meyn and Tweedie). In Lemma 4.3 we apply the split-chain construction and derive P(τ > k) ≤ r^k for r = 1 - δ/2, where the rate r is independent of any periodicity because the minorization resets the chain upon hitting the atom, regardless of the original chain's period. The admissible block length m can therefore be chosen uniformly (e.g., m = O(1) or slowly growing) solely from the ergodicity parameters. We will add a short remark after Lemma 4.3 explicitly noting the period-independence and the resulting uniform choice of m. revision: partial
-
Referee: [Theorem 3.2] Theorem 3.2 / auxiliary W_1 bound: the bound is used to control the Gaussian approximation error on each regeneration block. The error term must be stated explicitly (including dependence on the block length and the local-dependence or mixing parameters) so that the final O(n^{-1/2}) rate can be verified by direct substitution; without the explicit form the improvement over earlier bounds cannot be confirmed.
Authors: We agree that an expanded statement will make the substitution transparent. Theorem 3.2 currently bounds the W_1 Gaussian approximation error for a locally dependent block of length m by C(β_m + m^{-1/2} + α(m)), where β_m collects moment terms and α(m) is the local-dependence coefficient over distance m. We will revise the theorem statement to display this dependence on m explicitly and add a short calculation in the proof of Theorem 4.1 showing that, after summing over n/m blocks, the total error is O(n^{-1/2}) once m is fixed from the geometric rate. This will also clarify the improvement relative to prior bounds that carried extra logarithmic or polynomial factors. revision: yes
Circularity Check
No circularity: derivations use independent auxiliary bounds and standard regeneration arguments
full rationale
The paper establishes W_p CLT rates via a new W_1 Gaussian approximation bound and a regeneration-time tail for split chains under geometric ergodicity. Neither step reduces to a fitted parameter or self-defined quantity; the tail bound is proved directly from the minorization condition without invoking the target rate. No self-citation is load-bearing for the central claims, and all steps remain falsifiable against external Markov-chain examples.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The Markov chain is geometrically ergodic
- domain assumption Mild moment assumptions on the underlying random variables
Reference graph
Works this paper leans on
-
[1]
A tail inequality for suprema of unbounded empirical processes with applications to markov chains
Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to markov chains. 2008
work page 2008
-
[2]
Radosław Adamczak and Witold Bednorz. Exponential concentration inequalities for additive functionals of markov chains.ESAIM: Probability and Statistics, 19:440–481, 2015
work page 2015
-
[3]
A normal approximation for the number of local maxima of a random function on a graph
Pierre Baldi, Yosef Rinott, and Charles Stein. A normal approximation for the number of local maxima of a random function on a graph. InProbability, statistics, and mathematics, pages 59–81. Elsevier, 1989
work page 1989
-
[4]
Andrew D Barbour. Asymptotic expansions based on smooth functions in the central limit theorem.Probability Theory and Related Fields, 72(2):289–303, 1986
work page 1986
-
[5]
Renewal theory and computable convergence rates for geometrically ergodic markov chains
Peter H Baxendale. Renewal theory and computable convergence rates for geometrically ergodic markov chains. 2005
work page 2005
-
[6]
Vidmantas Bentkus. On the dependence of the berry–esseen bound on dimension.Journal of Statistical Planning and Inference, 113(2):385–402, 2003
work page 2003
-
[7]
New bernstein and hoeffding type inequalities for regenerative markov chains
Patrice Bertail and Gabriela Ciołek. New bernstein and hoeffding type inequalities for regenerative markov chains. 2018
work page 2018
-
[8]
Covariance regularization by thresholding
Peter J Bickel and Elizaveta Levina. Covariance regularization by thresholding. 2008
work page 2008
-
[9]
Regularized estimation of large covariance matrices
Peter J Bickel and Elizaveta Levina. Regularized estimation of large covariance matrices. 2008
work page 2008
-
[10]
Sergey G Bobkov. Entropic approach to e. rio’s central limit theorem for w2 transport distance.Statistics & Probability Letters, 83(7):1644–1648, 2013
work page 2013
-
[11]
Sergey G Bobkov and Friedrich Götze. Quantified cram\’er-wold continuity theorem for the kantorovich transport distance.arXiv preprint arXiv:2412.10276, 2024
-
[12]
Sergey G Bobkov and Michel Ledoux. Fourier analytic bounds for zolotarev distances, and applications to empirical measures.Preprint, 2024
work page 2024
-
[13]
Erwin Bolthausen. The berry-esseen theorem for functionals of discrete markov chains.Zeitschrift für Wahrschein- lichkeitstheorie und verwandte Gebiete, 54(1):59–73, 1980
work page 1980
-
[14]
Erwin Bolthausen. The berry-esseen theorem for strongly mixing harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60(3):283–289, 1982
work page 1982
-
[15]
Thomas Bonis. Stein’s method for normal approximation in wasserstein distances with application to the multivariate central limit theorem.Probability Theory and Related Fields, 178(3):827–860, 2020
work page 2020
-
[16]
Bagging predictors.Machine learning, 24(2):123–140, 1996
Leo Breiman. Bagging predictors.Machine learning, 24(2):123–140, 1996
work page 1996
-
[17]
Random forests.Machine learning, 45(1):5–32, 2001
Leo Breiman. Random forests.Machine learning, 45(1):5–32, 2001
work page 2001
-
[18]
The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978
Herman Callaert and Paul Janssen. The berry-esseen theorem for u-statistics.The Annals of Statistics, pages 417–421, 1978
work page 1978
-
[19]
Springer Science & Business Media, 2010
Louis HY Chen, Larry Goldstein, and Qi-Man Shao.Normal approximation by Stein’s method. Springer Science & Business Media, 2010
work page 2010
-
[20]
Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications
Xiaohui Chen. Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications. 2018
work page 2018
-
[21]
Randomized incomplete u-statistics in high dimensions
Xiaohui Chen and Kengo Kato. Randomized incomplete u-statistics in high dimensions. 2019
work page 2019
-
[22]
Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.The Annals of Statistics, pages 2786–2819, 2013
work page 2013
-
[23]
Renewal theory.Chapman and Hall, 1962
David Roxbee Cox. Renewal theory.Chapman and Hall, 1962
work page 1962
- [24]
-
[25]
Covariance selection.Biometrics, pages 157–175, 1972
Arthur P Dempster. Covariance selection.Biometrics, pages 157–175, 1972
work page 1972
-
[26]
The central limit theorem for m-dependent variables
PH Diananda. The central limit theorem for m-dependent variables. InMathematical Proceedings of the Cambridge Philosophical Society, volume 51, pages 92–95. Cambridge University Press, 1955
work page 1955
-
[27]
Bridging the gap between constant step size stochastic gradient descent and markov chains
Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and markov chains. 2020
work page 2020
-
[28]
Bounds on regeneration times and limit theorems for subgeometric markov chains
Randal Douc, Arnaud Guillin, and Eric Moulines. Bounds on regeneration times and limit theorems for subgeometric markov chains. InAnnales de l’IHP Probabilités et statistiques, volume 44, pages 239–257, 2008
work page 2008
- [29]
-
[30]
Wasserstein-2 bounds in normal approximation under local dependence
Xiao Fang. Wasserstein-2 bounds in normal approximation under local dependence. 2019
work page 2019
-
[31]
From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023
Xiao Fang and Yuta Koike. From p-wasserstein bounds to moderate deviations.Electronic Journal of Probability, 28:1–52, 2023
work page 2023
-
[32]
Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts
Gerald B Folland.Real analysis: modern techniques and their applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. John Wiley & Sons, Nashville, TN, 2 edition, March 1999
work page 1999
-
[33]
Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem
Thomas Gallouët, Guillaume Mijoule, and Yvik Swan. Regularity of solutions of the stein equation and rates in the multivariate central limit theorem.arXiv preprint arXiv:1805.01720, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[34]
Multivariate normal approximations by stein’s method and size bias couplings
Larry Goldstein and Yosef Rinott. Multivariate normal approximations by stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17, 1996. , Vol. 1, No. 1, Article . Publication date: January 2026. 22 Yixuan Zhang and Qiaomin Xie
work page 1996
-
[35]
Sheldon Goldstein. Maximal coupling.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 46(2):193–204, 1979
work page 1979
-
[36]
Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987
Friedrich Götze. Approximations for multivariate u-statistics.Journal of multivariate analysis, 22(2):212–229, 1987
work page 1987
-
[37]
Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017
work page 2017
-
[38]
Monte carlo sampling methods using markov chains and their applications
W Keith Hastings. Monte carlo sampling methods using markov chains and their applications. 1970
work page 1970
-
[39]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020
work page 2020
-
[40]
A class of statistics with asymptotically normal distribution
Wassily Hoeffding. A class of statistics with asymptotically normal distribution. InBreakthroughs in statistics: Foundations and basic theory, pages 308–334. Springer, 1992
work page 1992
-
[41]
Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize.Advances in Neural Information Processing Systems, 37:21699–21762, 2024
work page 2024
-
[42]
IA Ibragimov. On the accuracy of gaussian approximation to the distribution functions of sums of independent variables.Theory of Probability & Its Applications, 11(4):559–579, 1966
work page 1966
-
[43]
Galin L Jones and James P Hobert. Honest exploration of intractable probability distributions via markov chain monte carlo.Statistical Science, pages 312–334, 2001
work page 2001
-
[44]
Semi-Supervised Classification with Graph Convolutional Networks
TN Kipf. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[45]
Wasserstein distributionally robust optimization: Theory and applications in machine learning
Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. InOperations research & management science in the age of analytics, pages 130–166. Informs, 2019
work page 2019
-
[46]
Michał Lemańczyk. General bernstein-like inequality for additive functionals of markov chains.Journal of Theoretical Probability, 34(3):1426–1454, 2021
work page 2021
-
[47]
Tianle Liu and Morgane Austern. Wasserstein-p bounds in the central limit theorem under local dependence.Electronic Journal of Probability, 28:1–47, 2023
work page 2023
-
[48]
Vsevolod K Malinovskii.Level-crossing problems and inverse Gaussian distributions: closed-form results and approxima- tions. Chapman and Hall/CRC, 2021
work page 2021
-
[49]
Daniel Matthes, Robert J McCann, and Giuseppe Savaré. A family of nonlinear fourth order equations of gradient flow type.Communications in Partial Differential Equations, 34(11):1352–1397, 2009
work page 2009
-
[50]
Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests.Journal of Machine Learning Research, 17(26):1–41, 2016
work page 2016
-
[51]
Sean P. Meyn and Richard L. Tweedie.Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 2nd edition, 2009
work page 2009
-
[52]
Alfred Müller. Integral probability metrics and their generating classes of functions.Advances in applied probability, 29(2):429–443, 1997
work page 1997
-
[53]
Per Mykland, Luke Tierney, and Bin Yu. Regeneration in markov chain samplers.Journal of the American Statistical Association, 90(429):233–241, 1995
work page 1995
-
[54]
Cambridge University Press, 2012
Ivan Nourdin and Giovanni Peccati.Normal approximations with Malliavin calculus: from Stein’s method to universality, volume 192. Cambridge University Press, 2012
work page 2012
-
[55]
Esa Nummelin. A splitting technique for harris recurrent markov chains.Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43(4):309–318, 1978
work page 1978
-
[56]
Victor M Panaretos and Yoav Zemel. Statistical aspects of wasserstein distances.Annual review of statistics and its application, 6(1):405–431, 2019
work page 2019
-
[57]
Springer Science & Business Media, 2012
Valentin V Petrov.Sums of independent random variables, volume 82. Springer Science & Business Media, 2012
work page 2012
-
[58]
Iosif Pinelis. Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994
work page 1994
-
[59]
Robert T Powers and Erling Størmer. Free states of the canonical anticommutation relations.Communications in Mathematical Physics, 16(1):1–33, 1970
work page 1970
-
[60]
Martin L Puterman.Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014
work page 2014
-
[61]
Martin Raič. A multivariate clt for decomposable random vectors with finite second moments.Journal of Theoretical Probability, 17(3):573–603, 2004
work page 2004
-
[62]
A multivariate central limit theorem for Lipschitz and smooth test functions
Martin Raič. A multivariate central limit theorem for lipschitz and smooth test functions.arXiv preprint arXiv:1812.08268, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[63]
Yosef Rinott. On normal approximation rates for certain sums of dependent random variables.Journal of Computational and Applied Mathematics, 55(2):135–143, 1994
work page 1994
-
[64]
Upper bounds for minimal distances in the central limit theorem
Emmanuel Rio. Upper bounds for minimal distances in the central limit theorem. InAnnales de l’IHP Probabilités et statistiques, volume 45, pages 802–817, 2009. , Vol. 1, No. 1, Article . Publication date: January 2026. Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains 23
work page 2009
-
[65]
Bounds on regeneration times and convergence rates for markov chains
Gareth O Roberts and Richard L Tweedie. Bounds on regeneration times and convergence rates for markov chains. Stochastic Processes and their applications, 80(2):211–229, 1999
work page 1999
-
[66]
Joseph P Romano and Michael Wolf. A more general central limit theorem for m-dependent random variables with unbounded m.Statistics & probability letters, 47(2):115–124, 2000
work page 2000
-
[67]
Jeffrey S Rosenthal. Minorization conditions and convergence rates for markov chain monte carlo.Journal of the American Statistical Association, 90(430):558–566, 1995
work page 1995
-
[68]
Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, and Alexey Naumov. Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning. Advances in Neural Information Processing Systems, 37:12408–12460, 2024
work page 2024
-
[69]
R Srikant. Rates of convergence in the central limit theorem for markov chains, with an application to td learning. arXiv preprint arXiv:2401.15719, 2024
-
[70]
Richard S Sutton, Andrew G Barto, et al.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998
work page 1998
-
[71]
Cédric Villani et al.Optimal transport: old and new, volume 338. Springer, 2008
work page 2008
-
[72]
Weichen Wu, Yuting Wei, and Alessandro Rinaldo. Uncertainty quantification for markov chains with application to temporal difference learning.arXiv preprint arXiv:2502.13822, 2025
-
[73]
Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, and Murat A Erdogdu. An analysis of constant step size sgd in the non-convex regime: Asymptotic normality and bias.Advances in Neural Information Processing Systems, 34:4234–4248, 2021
work page 2021
-
[74]
Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. Prelimit coupling and steady-state convergence of constant-stepsize nonsmooth contractive sa.arXiv preprint arXiv:2404.06023, 2024
-
[75]
Yixuan Zhang, Dongyan Huo, Yudong Chen, and Qiaomin Xie. A piecewise lyapunov analysis of sub-quadratic sgd: Applications to robust and quantile regression.ACM SIGMETRICS Performance Evaluation Review, 53(1):85–87, 2025
work page 2025
-
[76]
Constant stepsize q-learning: Distributional convergence, bias and extrapolation
Yixuan Zhang and Qiaomin Xie. Constant stepsize q-learning: Distributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024. A Proof of Theorem 1 In this section we complete the proof of Theorem 1 by providing only the additional details omitted from the proof outline in Sections 3.2 and 3.3. A.1 Proof of Proposition 1 By definit...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.