Pith · machine review for the scientific record

arxiv: 2603.18640 · v3 · submitted 2026-03-19 · 📊 stat.ML · cs.LG · math.PR

Recognition: no theorem link

A Theoretical Comparison of No-U-Turn Sampler Variants: Necessary and Sufficient Convergence Conditions and Mixing Time Analysis under Gaussian Targets

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:58 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.PR
keywords No-U-Turn Sampler · NUTS · geometric ergodicity · mixing time · Markov chain Monte Carlo · Gaussian target

The pith

NUTS-mul and NUTS-BPS achieve O(d^{1/4}) mixing times on standard Gaussians with strictly smaller constants for NUTS-BPS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares the two main variants of the No-U-Turn Sampler, which differ in how they select the next state from a trajectory: multinomial versus biased progressive sampling. It derives necessary conditions for geometric ergodicity that apply to both variants and sufficient conditions that apply to the multinomial version. For the standard Gaussian target, when the sampler starts inside the typical set, the mixing time of each variant scales as O(d^{1/4}) up to logarithmic factors, but the leading constants are strictly smaller for the biased progressive version. These results supply the first mixing-time bound for the biased progressive sampler and the first sufficient ergodicity conditions for the multinomial sampler.

Core claim

We derive the first necessary conditions for geometric ergodicity for both NUTS-mul and NUTS-BPS. We establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. We obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. When initialized in the typical set of the canonical Gaussian measure, the mixing times of both variants scale as O(d^{1/4}) up to logarithmic factors, with strictly smaller associated constants for NUTS-BPS.

What carries the argument

The index-selection rule inside each NUTS trajectory, comparing multinomial sampling against biased progressive sampling, which governs how the next state is drawn from the set of points generated by the leapfrog integrator.
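
To make the distinction concrete, here is a minimal sketch of the two index-selection rules over a precomputed orbit of leapfrog states, assuming Boltzmann weights exp(−H_i) as in the standard NUTS literature. The function names, the `subtree_sizes` bookkeeping, and the toy usage are ours for illustration, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def multinomial_select(energies):
    # NUTS-mul: draw an index over the whole orbit with probability
    # proportional to exp(-H_i) (numerically stabilized).
    w = np.exp(-(energies - energies.min()))
    return rng.choice(len(energies), p=w / w.sum())

def biased_progressive_select(energies, subtree_sizes):
    # NUTS-BPS: at each doubling, a candidate is drawn multinomially
    # inside the new subtree and replaces the running candidate with
    # probability min(1, W_new / W_old), biasing toward later states.
    w = np.exp(-(energies - energies.min()))
    start = subtree_sizes[0]
    idx = rng.choice(start, p=w[:start] / w[:start].sum())
    w_old = w[:start].sum()
    for size in subtree_sizes[1:]:
        block = w[start:start + size]
        w_new = block.sum()
        cand = start + rng.choice(size, p=block / w_new)
        if rng.random() < min(1.0, w_new / w_old):
            idx = cand
        w_old += w_new
        start += size
    return idx

H = rng.normal(size=8)                      # toy energies for an orbit of 8 states
print(multinomial_select(H))
print(biased_progressive_select(H, [1, 1, 2, 4]))
```

Both variants build the orbit with the same No-U-Turn doubling procedure; only the index-selection step differs, which is why the paper can compare the two samplers through this kernel alone.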

If this is right

  • Geometric ergodicity of both variants requires the target to possess sufficiently light tails.
  • Sufficient conditions for geometric ergodicity are now available for the multinomial variant.
  • Both variants deliver mixing times of order O(d^{1/4}) on the standard Gaussian when started in the typical set.
  • The leading constants in these mixing-time bounds are strictly smaller for the biased progressive variant than for the multinomial variant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The quantitative edge of the biased progressive variant may favor its use in high-dimensional problems where the number of gradient evaluations per effective sample is the dominant cost.
  • Because the qualitative convergence behavior is nearly the same, software implementations can select between the two variants primarily on the basis of coding simplicity or numerical stability rather than fundamental ergodicity differences.
  • Analogous mixing-time statements may hold for targets whose tails match those of the Gaussian, such as certain elliptical distributions.

Load-bearing premise

The target must satisfy specific tail properties or be exactly the standard Gaussian, and the chain must begin inside the typical set, for the mixing-time bounds to apply.
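
The paper's exact typical-set radius is not reproduced in this summary; a standard choice for the canonical Gaussian, adequate for reading the premise, is the following (our formulation, hedged accordingly):

```latex
% A common typical set for \pi = N(0, I_d); the paper's radius may differ.
\[
  \mathcal{T}_{d,\delta}
    = \bigl\{\, x \in \mathbb{R}^d : \bigl|\,\|x\|^{2} - d\,\bigr| \le \delta d \,\bigr\},
  \qquad
  \pi\bigl(\mathcal{T}_{d,\delta}\bigr) \ge 1 - 2e^{-c\,\delta^{2} d},
\]
% for fixed \delta \in (0,1) and an absolute constant c > 0, by
% chi-squared (sub-exponential) concentration of \|X\|^2 around its mean d.
```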

What would settle it

A direct simulation of either variant on a d-dimensional standard Gaussian, started from the typical set, that produces mixing time growing faster than C d^{1/4} log d for any fixed C would falsify the claimed scaling.
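
A sketch of the bookkeeping for that test, assuming mixing-time estimates `measured_tau` obtained from one's own NUTS runs; the values below are placeholders, not results:

```python
import numpy as np

# Scaffold for the falsification test described above: estimate the
# empirical scaling exponent of mixing time vs. dimension and compare
# it to the claimed 1/4.  In a real test, `measured_tau` would hold
# mixing-time estimates (e.g. from couplings or effective sample sizes)
# of NUTS runs on N(0, I_d) started in the typical set.
dims = np.array([64, 256, 1024, 4096, 16384])
measured_tau = 3.0 * dims**0.25 * np.log(dims)      # placeholder data

# Fit log(tau / log d) = a * log d + b to remove the log factor.
y = np.log(measured_tau / np.log(dims))
a, b = np.polyfit(np.log(dims), y, 1)
print(f"fitted exponent: {a:.3f}  (consistent with the claim if ~0.25)")
# An exponent clearly above 0.25 across growing d, robust to how the
# logarithmic factor is handled, would contradict the claimed scaling.
```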

Figures

Figures reproduced from arXiv: 2603.18640 by Fares Guehtar, Hadrien Duval-decaix, Kyurae Kim, Pacôme Trautmann, Samuel Gruffaz.

Figure 1
Figure 1: Scheme of a dynamic HMC algorithm. Definition 1 (from [19]): the dynamic HMC scheme associated with an orbit-selection kernel P_h and an index-selection kernel Q_h is the Markov chain (Q_k)_{k∈N} in which Q_{k+1} is generated from Q_k by: (1) sample P_{k+1} with distribution N(0, I_d); (2) sample I_{k+1} with distribution P_h(· | Q_k, P_{k+1}); (3) sample J_{k+1} with distribution Q_h(· | I_{k+1}, Q_k, P_{k+1}); (4) … view at source ↗
Figure 2
Figure 2: Scheme of the construction of the index set. view at source ↗
Figure 3
Figure 3: Construction of the probabilities q_h^BPS in a worked example. The caption compares q_h^MUL and q_h^BPS intuitively in the ideal case of no integration error, where H(Φ_h^∘(T)(q_0, p_0)) = H(q_0, p_0) for every T ∈ Z. view at source ↗
Figure 4
Figure 4: Lebesgue density related to the limiting time distribution. view at source ↗
Figure 5
Figure 5: For an index set I = [−1 : 6], the multinoulli distribution with normalized weights is split into its maximal uniform part on I_last (shown in blue) and the remaining multinoulli part; the same decomposition is applied to the normalized weights q_h^BPS(i | I, q, p). view at source ↗
Figure 6
Figure 6: Graphs of the functions involved in (18): h(t) = (π − |t|)/π² and g(t) = cot(t)(π − |t|)/π² on [−π, π]. view at source ↗
Figure 7
Figure 7: Graphs of the functions involved in (17). view at source ↗
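
The scheme in Definition 1 (Figure 1) translates directly into code. Below is a minimal rendering of a single transition with the orbit kernel P_h, index kernel Q_h, and leapfrog map left abstract, since their instantiations are precisely what the paper varies; the caption's step (4) is cut off in extraction, and the standard completion, moving to the J_{k+1}-th leapfrog state, is assumed here. All names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamic_hmc_step(q, orbit_kernel, index_kernel, leapfrog):
    # One transition Q_k -> Q_{k+1} of the dynamic HMC scheme:
    p = rng.normal(size=q.shape)       # (1) P_{k+1} ~ N(0, I_d)
    I = orbit_kernel(q, p)             # (2) I_{k+1} ~ P_h(. | Q_k, P_{k+1})
    J = index_kernel(I, q, p)          # (3) J_{k+1} ~ Q_h(. | I_{k+1}, Q_k, P_{k+1})
    q_next, _ = leapfrog(q, p, J)      # (4) move to the J-th (signed) leapfrog state
    return q_next
```

NUTS-mul and NUTS-BPS share the orbit kernel (the No-U-Turn doubling rule) and differ only in `index_kernel`, matching the sketch under "What carries the argument" above.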
read the original abstract

The No-U-Turn Sampler (NUTS) is the computational workhorse of modern Bayesian software libraries, yet its qualitative and quantitative convergence guarantees were established only recently. A significant gap remains in the theoretical comparison of its two main variants: NUTS-mul and NUTS-BPS, which use multinomial sampling and biased progressive sampling, respectively, for index selection. In this paper, we address this gap in three contributions. First, we derive the first necessary conditions for geometric ergodicity for both variants. Second, we establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. Third, we obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. However, they differ quantitatively in their convergence rates. More precisely, when initialized in the typical set of the canonical Gaussian measure, the mixing times of both NUTS-mul and NUTS-BPS scale as $O(d^{1/4})$ up to logarithmic factors, where $d$ denotes the dimension. Nevertheless, the associated constants are strictly smaller for NUTS-BPS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper derives the first necessary conditions for geometric ergodicity of both NUTS-mul and NUTS-BPS, the first sufficient conditions for geometric ergodicity and ergodicity of NUTS-mul, and the first mixing-time bounds for NUTS-BPS (and a comparison for NUTS-mul) on standard Gaussian targets. It shows that the two variants exhibit similar qualitative behavior governed by the tail properties of the target but differ quantitatively: when initialized in the typical set, both achieve mixing times of order O(d^{1/4}) up to logarithmic factors, with strictly smaller constants for NUTS-BPS.

Significance. If the derivations hold, the work supplies the first quantitative mixing-time results for NUTS-BPS together with a direct comparison of the two index-selection mechanisms that are used in all modern NUTS implementations. This fills a notable gap in the theoretical analysis of the most widely deployed MCMC algorithm in Bayesian software.

major comments (1)
  1. [Abstract and mixing-time analysis] The O(d^{1/4}) mixing-time claim (Abstract and the mixing-time section) is explicitly conditioned on initialization inside the typical set of the canonical Gaussian. No quantitative bound is supplied on the time required to reach this set from an arbitrary starting point; if the tree-construction or acceptance mechanism causes this burn-in phase to scale worse than d^{1/4}, the headline rate does not govern practical performance from cold starts.
minor comments (1)
  1. Notation for the two variants (NUTS-mul vs. NUTS-BPS) and for the typical-set radius should be introduced once and used consistently; occasional switches between “multinomial” and “mul” slow reading.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our contributions and for highlighting this important point on the scope of the mixing-time results. We address the comment below.

read point-by-point responses
  1. Referee: The O(d^{1/4}) mixing-time claim (Abstract and the mixing-time section) is explicitly conditioned on initialization inside the typical set of the canonical Gaussian. No quantitative bound is supplied on the time required to reach this set from an arbitrary starting point; if the tree-construction or acceptance mechanism causes this burn-in phase to scale worse than d^{1/4}, the headline rate does not govern practical performance from cold starts.

    Authors: We agree that the O(d^{1/4}) mixing-time bounds (up to log factors) are derived under the explicit assumption of initialization inside the typical set, as stated in the abstract and the mixing-time section. This conditioning is standard in high-dimensional MCMC theory because the typical set concentrates the target measure (probability approaching 1 as d grows) and represents the regime in which the sampler operates after a short transient. For the standard Gaussian, the probability of lying outside the typical set decays exponentially in d, so the time to enter it is expected to be negligible compared to the mixing time; however, we do not supply a rigorous quantitative bound on this burn-in phase. We will revise the manuscript to add a clarifying paragraph in the introduction and conclusion that (i) reiterates the conditioning, (ii) explains why the typical-set regime is the relevant one for the headline rate, and (iii) flags a complete cold-start analysis as an interesting direction for future work. This constitutes a partial revision: we strengthen the exposition without claiming a new bound that is absent from the current derivations. revision: partial
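
The rebuttal's "decays exponentially in d" step can be made precise with standard Gaussian norm concentration; the inequality below is textbook material (sub-gaussian concentration of Lipschitz functions), not a quotation from the paper:

```latex
% For X ~ N(0, I_d), the map x -> \|x\| is 1-Lipschitz, so
\[
  \mathbb{P}\bigl(\bigl|\,\|X\| - \sqrt{d}\,\bigr| \ge t\bigr) \le 2e^{-c t^{2}}
  \quad \text{for all } t \ge 0,
\]
% with an absolute constant c > 0.  Taking t = \delta\sqrt{d} bounds the
% mass outside a typical set of relative width \delta by 2e^{-c\delta^{2} d},
% which is exponentially small in d, as the rebuttal asserts.
```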

Circularity Check

0 steps flagged

No circularity; mixing-time bounds derived from explicit tail assumptions and typical-set initialization

full rationale

The paper's derivation chain establishes necessary conditions for geometric ergodicity from tail properties of the target, sufficient conditions for NUTS-mul, and quantitative mixing-time bounds O(d^{1/4}) for both variants on the standard Gaussian, all conditioned on initialization in the typical set. These steps rely on standard Markov-chain drift and minorization arguments applied to the NUTS transition kernel; no quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise reduces to a self-citation. The explicit conditioning on the typical set is stated up front rather than smuggled in, so the claimed scaling is a genuine consequence of the stated assumptions rather than a tautology.
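
For readers auditing the chain themselves, the "standard drift and minorization arguments" named here have the canonical Meyn–Tweedie form [34]; the display below is that standard template, not a formula lifted from the paper:

```latex
% Geometric drift toward a small set C, for the NUTS kernel K:
\[
  K V(x) \;\le\; \lambda V(x) + b\,\mathbf{1}_{C}(x),
  \qquad V \ge 1,\ \lambda \in (0,1),\ b < \infty,
\]
% together with a minorization on C,
\[
  K(x, \cdot) \;\ge\; \varepsilon\,\nu(\cdot) \quad \text{for all } x \in C,
\]
% yields geometric ergodicity:
% \|K^{n}(x,\cdot) - \pi\|_{TV} \le R\, V(x)\, \rho^{n} for some \rho < 1.
```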

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard domain assumptions from MCMC theory about the regularity and tail behavior of the target distribution, particularly for the Gaussian case.

axioms (1)
  • domain assumption Target distribution has log-density satisfying regularity and tail decay conditions sufficient for geometric ergodicity
    Invoked to derive the necessary and sufficient conditions stated in the abstract.

pith-pipeline@v0.9.0 · 5556 in / 1244 out tokens · 40503 ms · 2026-05-15T08:58:26.182696+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 1 internal anchor

  1. [1] Simon Apers, Sander Gribling, and Dániel Szilágyi. Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps. Journal of Machine Learning Research, 25(348):1–30, 2024.
  2. [2] Cameron Bell, Krzysztof Łatuszyński, and Gareth O. Roberts. Adaptive stereographic MCMC. arXiv preprint arXiv:2408.11780, 2024.
  3. [3] Michael Betancourt. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434, 2017.
  4. [4] Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research, 20(28):1–6, 2019.
  5. [5] Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, and Sifan Liu. The within-orbit adaptive leapfrog No-U-Turn sampler. arXiv preprint arXiv:2506.18746, 2025.
  6. [6] Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, and Milo Marsden. Incorporating local step-size adaptivity into the No-U-Turn sampler using Gibbs self-tuning. The Journal of Chemical Physics, 163(8), 2025.
  7. [7] Nawaf Bou-Rabee, Bob Carpenter, and Milo Marsden. GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo. arXiv preprint arXiv:2404.15253, 2024.
  8. [8] Nawaf Bou-Rabee and Andreas Eberle. Mixing time guarantees for unadjusted Hamiltonian Monte Carlo. Bernoulli, 29(1):75–104, 2023.
  9. [9] Nawaf Bou-Rabee, Andreas Eberle, and Raphael Zimmer. Coupling and convergence for Hamiltonian Monte Carlo. The Annals of Applied Probability, 30(3):1209–1250, 2020.
  10. [10] Nawaf Bou-Rabee and Stefan Oberdörster. Mixing of the No-U-Turn sampler and the geometry of Gaussian concentration. arXiv preprint arXiv:2410.06978, 2024.
  11. [11] Nawaf Bou-Rabee and Jesús María Sanz-Serna. Randomized Hamiltonian Monte Carlo. The Annals of Applied Probability, 27(4):2159–2194, 2017.
  12. [12] Nawaf Bou-Rabee and Jesús María Sanz-Serna. Geometric integrators and the Hamiltonian Monte Carlo method. Acta Numerica, 27:113–206, 2018.
  13. [13] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 2017.
  14. [14] Yuansi Chen, Raaz Dwivedi, Martin J. Wainwright, and Bin Yu. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. Journal of Machine Learning Research, 21(92):1–72, 2020.
  15. [15] Dennis Conway, Janelle Simpson, Yohannes Didana, Joseph Rugari, and Graham Heinson. Probabilistic magnetotelluric inversion with adaptive regularisation using the No-U-Turns sampler. Pure and Applied Geophysics, 175(8):2881–2894, 2018.
  16. [16] Miguel Fernández de Retana Uribe. A Bayesian Approach to Modeling Tamoxifen Resistance in Breast Cancer Cells Through Adaptive Hamiltonian Monte Carlo Posterior Sampling. PhD thesis, University of Deusto.
  17. [17] Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov Chains. Springer, 2018.
  18. [18] Simon Duane, Anthony D. Kennedy, Brian J. Pendleton, and Duncan Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
  19. [19] Alain Durmus, Samuel Gruffaz, Miika Kailas, Eero Saksman, and Matti Vihola. On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn samplers. arXiv preprint arXiv:2307.03460, 2023.
  20. [20] Alain Durmus, Eric Moulines, and Eero Saksman. On the convergence of Hamiltonian Monte Carlo. The Annals of Statistics, April 2017.
  21. [21] Hong Ge, Kai Xu, and Zoubin Ghahramani. Turing: a language for flexible probabilistic inference. In International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1682–1690, 2018.
  22. [22] Mark Girolami and Ben Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214, 2011.
  23. [23] Samuel Gruffaz and Josua Sassen. Riemannian metric learning: Closer to you than you imagine. 2025.
  24. [24] W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
  25. [25] Matthew D. Hoffman and Andrew Gelman. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
  26. [26] Julian Hofstadler. Solving Poisson's equation for Wasserstein contractive Markov chains. arXiv preprint arXiv:2602.19119, 2026.
  27. [27] Søren F. Jarner and Richard L. Tweedie. Necessary conditions for geometric and polynomial ergodicity of random-walk-type Markov chains. Bernoulli, 9(4):559–578, 2003.
  28. [28] Yin Tat Lee, Ruoqi Shen, and Kevin Tian. Lower bounds on Metropolized sampling methods for well-conditioned distributions. Advances in Neural Information Processing Systems, 34:18812–18824, 2021.
  29. [29] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, volume 75. Springer, 2001.
  30. [30] Samuel Livingstone and Giacomo Zanella. The Barker proposal: Combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):496–523, 2022.
  31. [31] Oren Mangoubi and Aaron Smith. Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 586–595. PMLR, 2019.
  32. [32] Oren Mangoubi and Aaron Smith. Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions: Continuous dynamics. The Annals of Applied Probability, 31(5):2019–2045, 2021.
  33. [33] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
  34. [34] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Springer Science & Business Media, 2012.
  35. [35] Radford M. Neal. Bayesian learning via stochastic dynamics. Advances in Neural Information Processing Systems, 5, 1992.
  36. [36] Radford M. Neal. An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111(1):194–203, 1994.
  37. [37] Radford M. Neal. MCMC using Hamiltonian dynamics. In Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, editors, Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Press, 2011.
  38. [38] Stefan Oberdörster. On accelerated mixing of the No-U-Turn sampler. arXiv preprint arXiv:2507.13259, 2025.
  39. [39] Gareth O. Roberts and Richard L. Tweedie. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika, 83(1):95–110, 1996.
  40. [40] John Salvatier, Thomas V. Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2:e55, April 2016.
  41. [41] Samuel Livingstone, Michael Betancourt, Simon Byrne, and Mark Girolami. On the geometric ergodicity of Hamiltonian Monte Carlo. Bernoulli, 25(4A):3109–3138, 2019.
  42. [42] Christof Seiler, Simon Rubinstein-Salzedo, and Susan Holmes. Positive curvature and Hamiltonian Monte Carlo. Advances in Neural Information Processing Systems, 27, 2014.
