arXiv: 2110.12907 · v2 · submitted 2021-10-21 · 📊 stat.ML · cs.LG · math.PR · math.ST · stat.TH

Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions

Pith reviewed 2026-05-24 13:18 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.PR · math.ST · stat.TH
keywords Hamiltonian Monte Carlo · asymmetrical momentum distributions · geometric convergence · Wasserstein distance · Alternating Direction HMC · leapfrog integrator

The pith

Alternating Direction HMC restores self-adjointness to allow geometric convergence with asymmetrical momentum distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that standard HMC loses a required self-adjointness property when momentum variables follow asymmetrical distributions. It introduces Alternating Direction HMC, which alternates the direction of the Hamiltonian dynamics to recover that property. Under sufficient conditions the modified algorithm is shown to converge geometrically in Wasserstein distance. The same guarantee is proved for the version that replaces exact Hamiltonian flow with leapfrog steps plus a Metropolis-Hastings correction. Experiments indicate that the approach can extend existing dynamic auxiliary schemes and improve sampling relative to Gaussian-momentum HMC.

Core claim

AD-HMC exhibits geometric convergence in Wasserstein distance when auxiliary momentum distributions are allowed to be asymmetrical, provided the alternating-direction modification is used to restore self-adjointness; the result extends to leapfrog integrators equipped with an additional Metropolis-Hastings rejection step.

What carries the argument

Alternating Direction HMC (AD-HMC), which alternates the direction of the Hamiltonian motion to restore the self-adjointness property required for the dynamical and probabilistic convergence arguments.
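
As a rough illustration of why the direction of motion matters once the momentum law is skewed, here is a minimal one-dimensional sketch. It is not the paper's algorithm: this page does not spell out the AD-HMC alternation rule, so the sketch swaps in a simpler, well-known construction that draws the direction of the leapfrog trajectory uniformly at random. Because a leapfrog trajectory with step -h exactly undoes one with step +h, the proposal paired with a direction flip is a volume-preserving involution, and the usual Metropolis-Hastings energy test is valid without assuming K(p) = K(-p). The standard normal target, the standard Gumbel momentum, and every name and default value below are assumptions chosen for illustration.

```python
import math
import random


def grad_U(q):
    """Gradient of the potential U(q) = q^2 / 2 (assumed standard normal target)."""
    return q


def K(p):
    """Kinetic energy of a standard Gumbel momentum, whose density is exp(-K(p))."""
    return p + math.exp(-p)


def grad_K(p):
    """Velocity dq/dt; note that grad_K(-p) != -grad_K(p), so K is asymmetrical."""
    return 1.0 - math.exp(-p)


def sample_momentum(rng):
    """If E ~ Exp(1), then -log(E) follows the standard Gumbel law."""
    e = 0.0
    while e <= 0.0:  # guard against log(0)
        e = rng.expovariate(1.0)
    return -math.log(e)


def leapfrog(q, p, step, n_steps):
    """Kick-drift-kick leapfrog; a negative step runs the motion backward."""
    for _ in range(n_steps):
        p -= 0.5 * step * grad_U(q)
        q += step * grad_K(p)
        p -= 0.5 * step * grad_U(q)
    return q, p


def random_direction_hmc(n_iter, step=0.1, n_steps=15, seed=0):
    """Leapfrog HMC with a random trajectory direction and an MH correction."""
    rng = random.Random(seed)
    q, samples, accepted = 0.0, [], 0
    for _ in range(n_iter):
        p = sample_momentum(rng)
        direction = rng.choice((-1.0, 1.0))
        h_old = 0.5 * q * q + K(p)
        q_new = q
        try:
            q_new, p_new = leapfrog(q, p, direction * step, n_steps)
            h_new = 0.5 * q_new * q_new + K(p_new)
        except OverflowError:
            h_new = math.inf  # numerically unstable trajectory
        # Non-finite energies are always rejected.
        log_accept = h_old - h_new if math.isfinite(h_new) else -math.inf
        if math.log(1.0 - rng.random()) < log_accept:
            q, accepted = q_new, accepted + 1
        samples.append(q)
    return samples, accepted / n_iter
```

Calling `random_direction_hmc(20000)` returns the chain and its acceptance rate; the sample mean and variance can then be compared with the target's 0 and 1. Fixing `direction = 1.0` instead gives a forward-only proposal that is no longer an involution, so the same acceptance rule would lose its justification.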

If this is right

  • Geometric convergence in Wasserstein distance holds for HMC with general, possibly asymmetrical, auxiliary momentum distributions that satisfy the paper's sufficient conditions, once the alternating-direction fix is applied.
  • The leapfrog integrator version of AD-HMC also converges geometrically after the extra Metropolis-Hastings correction.
  • AD-HMC supplies a rigorous justification for extending popular dynamic auxiliary schemes beyond symmetric Gaussians.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be applied to targets whose geometry favors heavy-tailed or skewed auxiliary momenta.
  • Testing the method on high-dimensional multimodal posteriors would reveal whether the weaker conditions translate into practical speed-ups.
  • Similar alternating-direction corrections might be examined in other MCMC algorithms that rely on reversible dynamics.

Load-bearing premise

The alternating-direction modification restores the self-adjointness property that asymmetrical momentum distributions otherwise break.
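
A finite-state analogy may make this premise easier to picture. The sketch below is an assumption-laden toy, not the paper's construction: it takes a small non-reversible Markov kernel, forms its adjoint in L²(π) (the time-reversed kernel), and checks that following one step of the kernel by one step of its adjoint yields a kernel that satisfies detailed balance. The 3-state matrix, the distribution `pi`, and all function names are hypothetical.

```python
def adjoint(P, pi):
    """Adjoint of P in L2(pi): P_star[i][j] = pi[j] * P[j][i] / pi[i]."""
    n = len(pi)
    return [[pi[j] * P[j][i] / pi[i] for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Kernel composition: one step of A followed by one step of B."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_self_adjoint(P, pi, tol=1e-12):
    """Detailed balance check: pi[i] * P[i][j] == pi[j] * P[j][i] for all i, j."""
    n = len(pi)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))


# A kernel with a probability current circulating 0 -> 1 -> 2 -> 0.
# pi is stationary for it, but detailed balance fails.
pi = [0.5, 0.3, 0.2]
P_forward = [[0.8, 0.2, 0.0],
             [0.0, 2.0 / 3.0, 1.0 / 3.0],
             [0.5, 0.0, 0.5]]

P_backward = adjoint(P_forward, pi)            # time-reversed kernel
P_alternating = matmul(P_forward, P_backward)  # forward step, then backward step

print(is_self_adjoint(P_forward, pi))      # False
print(is_self_adjoint(P_alternating, pi))  # True
```

The algebra behind the second check is that (P P*)* = P P* for any kernel P with stationary distribution π. Whether AD-HMC composes its forward and backward moves in exactly this order is not stated on this page.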

What would settle it

A concrete target distribution and asymmetrical momentum law satisfying the paper's sufficient conditions for which the Wasserstein distance of the AD-HMC chain fails to contract geometrically.
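
A check of this kind could start from an empirical estimate of the distance curve. The following sketch is restricted to one dimension, where the W2 distance between two equal-size samples reduces to matching sorted values, and it fits a geometric rate by least squares on the log distances. Both helper names are hypothetical, and the estimate carries sampling noise: with finitely many samples the measured distance flattens at a noise floor, so a fitted rate near 1 late in the run is not by itself evidence against contraction.

```python
import math


def w2_empirical_1d(xs, ys):
    """W2 distance between two equal-size 1-D samples via the sorted coupling."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty samples of equal size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def fitted_contraction_rate(distances):
    """Least-squares fit of log d_n = c + n * log r; returns the rate r."""
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two distances")
    logs = [math.log(d) for d in distances]
    t_mean = (n - 1) / 2.0
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in enumerate(logs))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return math.exp(slope)
```

In use, one would run many independent chains, collect the iterates at each iteration count, compute `w2_empirical_1d` against a fixed reference sample from the target, and pass the resulting sequence to `fitted_contraction_rate`. A rate clearly below 1 over the early iterations is consistent with geometric contraction.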

Figures

Figures reproduced from arXiv: 2110.12907 by Soumyadip Ghosh, Tomasz Nowicki, Yingdong Lu.

Figure 1. Performance of three HMC (‘N(I)’, ‘N(.15I)’ and ‘adapt-single’) and two AD-HMC (‘adapt-many’ and ‘simple-target’) methods: (top) W2 distance between iterate and target distribution, and (bottom) average CPU seconds, both over iteration counts. ‘adapt-single’ case adopts the mean µc and covariance estimates Σc of the cluster with most skewed Σc as the replacement for the parameters of its Gaussian auxilia…
Figure 2. Performance of the HMC (‘N(I)’ and ‘adapt-single’) and AD-HMC (‘adapt-many’) settings: Kullback-Leibler distance between iterates and the target posterior distribution over iteration counts.
Figure 3. Performance of the three HMC (‘N(I)’, ‘N(.15I)’ and ‘adapt-single’) and two AD-HMC (‘adapt-many’ and ‘simple-target’) settings of the W2 distance between the iterate and the target distribution over (top) iteration counts and (bottom) average CPU wallclock times in seconds. Each method uses the Leapfrog integrator to implement Hamiltonian motion with ε = 0.05, L = 100. The ‘N(I)’ case continues to provide…
Figure 4. The two ℝ³ target distributions are displayed using samples. Left: The target distribution used in Figs. 1, 3, and 6 centers twelve Gaussians on three linearly independent directions, as indicated by the masses with the three colors. In addition, the (uncorrelated) Gaussians have variances ranging over [0.15, 1.0]. Right: The target distribution used in Figs. 5 and 7 centers seven (uncorrelated) Gaussi…
Figure 5. Performance of the three HMC (‘N(I)’, ‘N(.15I)’ and ‘adapt-single’) and two AD-HMC (‘adapt-many’ and ‘simple-target’) settings of the W2 distance between the iterate and the target distribution over (top) iteration counts and (bottom) average CPU wallclock times in seconds. Each method uses the Leapfrog integrator to implement Hamiltonian motion with ε = 0.05, L = 100.
Figure 6. Performance of the (‘adapt-many’) heuristic using forward-motion-only HMC and AD-HMC for the same target as Figs. 1 and 3: W2 distance between the iterate and the target distribution over (top) iteration counts and (bottom) average CPU wallclock times in seconds. Each method uses the Leapfrog integrator to implement Hamiltonian motion with (ε, L) = (0.025, 100).
Figure 7. Performance of the (‘adapt-many’) heuristic using forward-motion-only HMC and AD-HMC for the same target as…
Original abstract

Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new dynamical and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version of HMC, that we call the Alternating Direction HMC (AD-HMC), which overcomes this difficulty. Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. The geometric convergence analysis is extended to when the Hamiltonian motion is approximated by the leapfrog symplectic integrator, where an additional Metropolis-Hastings rejection step is required. Numerical experiments suggest that AD-HMC can generalize a popular dynamic auxiliary scheme to show improved performance over HMC with Gaussian auxiliaries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that standard HMC with asymmetrical momentum distributions violates self-adjointness, proposes an Alternating Direction HMC (AD-HMC) modification to restore it, and establishes sufficient conditions for geometric convergence in Wasserstein distance under weaker assumptions than Gaussian momenta. The analysis is extended to the leapfrog integrator plus Metropolis-Hastings correction, with numerical experiments suggesting performance gains over Gaussian-auxiliary HMC.

Significance. If the dynamical and probabilistic arguments hold, the result would meaningfully relax a long-standing restriction in HMC theory, permitting more flexible auxiliary distributions while retaining geometric convergence guarantees. The Wasserstein-distance analysis and its leapfrog extension rely on standard tools, applied here to a broader setting.

major comments (2)
  1. [§3] The central argument rests on the claim that the alternating-direction modification restores self-adjointness (abstract and §3); an explicit verification that the modified transition kernel satisfies the required adjoint relation with respect to the target measure is needed, as this step enables all subsequent dynamical arguments.
  2. [Theorem 4.1 (or equivalent)] Theorem establishing geometric Wasserstein convergence (likely §4) states sufficient conditions that are weaker than Gaussian; these conditions must be stated completely and checked against the leapfrog + MH extension, because any hidden dependence on symmetry would undermine the weaker-conditions claim.
minor comments (3)
  1. [Abstract] The abstract asserts 'significantly weaker conditions' without enumerating them; a short explicit list in the abstract or introduction would improve readability.
  2. [§2–3] Notation for the alternating-direction operator and the resulting kernel should be introduced once and used consistently; current usage risks confusion between the continuous and discrete cases.
  3. [Numerical experiments] Numerical experiments would benefit from reporting effective sample size per gradient evaluation alongside raw mixing times to allow direct comparison with standard HMC.
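
The metric requested in the last minor comment can be sketched as follows. This is one simple estimator among several, not the one used in the paper: it sums sample autocorrelations up to the first non-positive lag to estimate the integrated autocorrelation time, then divides the effective sample size by the total number of gradient evaluations. The function names and the `grads_per_iteration` parameter are hypothetical; for a leapfrog trajectory of L steps the count is typically between L + 1 and 2L, depending on whether gradients are cached between steps.

```python
def effective_sample_size(xs):
    """ESS = n / tau, with tau summed up to the first non-positive autocorrelation."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("constant chain: autocorrelation is undefined")
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag]
                  for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break  # truncate the sum once the correlation estimate is non-positive
        tau += 2.0 * rho
    return n / tau


def ess_per_gradient(xs, grads_per_iteration):
    """Effective samples obtained per gradient evaluation of the potential."""
    return effective_sample_size(xs) / (len(xs) * grads_per_iteration)
```

Reporting this number next to the W2 curves would put samplers with different trajectory lengths or step sizes on a common cost scale.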

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below.

  1. Referee: [§3] The central argument rests on the claim that the alternating-direction modification restores self-adjointness (abstract and §3); an explicit verification that the modified transition kernel satisfies the required adjoint relation with respect to the target measure is needed, as this step enables all subsequent dynamical arguments.

    Authors: We agree that an explicit verification of the adjoint relation would strengthen the exposition. In the revised manuscript we will insert a direct calculation in §3 showing that the AD-HMC transition kernel P is self-adjoint in L²(π), that is, ∫∫ f(x) g(y) P(x, dy) π(dx) = ∫∫ g(x) f(y) P(x, dy) π(dx) for all bounded measurable f and g, where π is the target and the alternation of momentum directions restores the required reversibility. revision: yes

  2. Referee: [Theorem 4.1 (or equivalent)] Theorem establishing geometric Wasserstein convergence (likely §4) states sufficient conditions that are weaker than Gaussian; these conditions must be stated completely and checked against the leapfrog + MH extension, because any hidden dependence on symmetry would undermine the weaker-conditions claim.

    Authors: Theorem 4.1 states the sufficient conditions in full; they depend only on the properties restored by direction alternation and do not invoke additional symmetry of the auxiliary distribution. The leapfrog-plus-MH extension in §5 applies precisely the same conditions, with the Metropolis-Hastings correction preserving invariance. We will add an explicit remark after Theorem 4.1 confirming that the stated conditions carry over to the leapfrog case without hidden symmetry assumptions. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper's derivation identifies the self-adjointness failure in plain HMC with asymmetric momenta, defines the AD-HMC modification to restore it, and then applies standard dynamical and probabilistic arguments to establish geometric Wasserstein convergence under stated sufficient conditions (including the leapfrog + MH extension). No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the convergence result is presented as following from independent analysis rather than renaming or smuggling prior results. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract, the paper introduces no free parameters, new entities, or ad-hoc axioms; it relies on standard mathematical background for Hamiltonian dynamics and Wasserstein metrics.

axioms (1)
  • [standard math] Standard properties of Hamiltonian dynamics and of the Wasserstein distance hold for the convergence analysis.
    Invoked to establish geometric convergence rates.

pith-pipeline@v0.9.0 · 5699 in / 1167 out tokens · 24157 ms · 2026-05-24T13:18:51.581584+00:00 · methodology

