Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions
Pith reviewed 2026-05-24 13:18 UTC · model grok-4.3
The pith
Alternating Direction HMC restores self-adjointness to allow geometric convergence with asymmetrical momentum distributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AD-HMC exhibits geometric convergence in Wasserstein distance when auxiliary momentum distributions are allowed to be asymmetrical, provided the alternating-direction modification is used to restore self-adjointness; the result extends to leapfrog integrators equipped with an additional Metropolis-Hastings rejection step.
What carries the argument
Alternating Direction HMC (AD-HMC), which alternates the sign of the momentum updates to restore the self-adjointness property required for the dynamical and probabilistic convergence arguments.
If this is right
- Geometric convergence in Wasserstein distance holds for HMC with general, possibly asymmetrical, auxiliary momentum distributions that satisfy the paper's sufficient conditions, once the alternating-direction fix is applied.
- The leapfrog integrator version of AD-HMC also converges geometrically after the extra Metropolis-Hastings correction.
- AD-HMC supplies a rigorous justification for extending popular dynamic auxiliary schemes beyond symmetric Gaussians.
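For orientation, the leapfrog-plus-Metropolis-Hastings scheme referred to above can be sketched in its standard Gaussian-momentum form. This is the textbook baseline, not AD-HMC; the target (a standard normal), the step size, the number of leapfrog steps, and the names `hmc_step` and `run_chain` are all illustrative assumptions.

```python
import math
import random

def U(q):
    # Potential of a standard normal target (illustrative assumption).
    return 0.5 * q * q

def grad_U(q):
    return q

def hmc_step(q, rng, eps=0.15, n_leapfrog=10):
    """One standard HMC transition with Gaussian momentum and an MH correction."""
    p = rng.gauss(0.0, 1.0)              # symmetric (Gaussian) momentum refresh
    h_old = U(q) + 0.5 * p * p
    q_new, p_new = q, p
    for _ in range(n_leapfrog):
        p_new -= 0.5 * eps * grad_U(q_new)   # half step in momentum
        q_new += eps * p_new                 # full step in position
        p_new -= 0.5 * eps * grad_U(q_new)   # half step in momentum
    h_new = U(q_new) + 0.5 * p_new * p_new
    # Metropolis-Hastings rejection step compensating the discretization error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False

def run_chain(n_iters=5000, seed=1):
    rng = random.Random(seed)
    q, samples, accepted = 0.0, [], 0
    for _ in range(n_iters):
        q, ok = hmc_step(q, rng)
        samples.append(q)
        accepted += ok
    return samples, accepted / n_iters
```

Replacing the Gaussian refresh and the quadratic kinetic term with an asymmetrical law is exactly the step where, according to the paper, the plain algorithm loses self-adjointness.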
Where Pith is reading between the lines
- The framework could be applied to targets whose geometry favors heavy-tailed or skewed auxiliary momenta.
- Testing the method on high-dimensional multimodal posteriors would reveal whether the weaker conditions translate into practical speed-ups.
- Similar alternating-direction corrections might be examined in other MCMC algorithms that rely on reversible dynamics.
Load-bearing premise
The alternating-direction modification restores the self-adjointness property that asymmetrical momentum distributions otherwise break.
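A small numerical sketch can make the premise more tangible. The usual reversibility argument for HMC flips the momentum, integrates forward, and flips again, which relies on the kinetic energy satisfying K(p) = K(-p). The code below uses a hypothetical asymmetrical kinetic energy K(p) = exp(p) - p (an assumption chosen for illustration, not taken from the paper) and shows that reversing the time direction still inverts the leapfrog map, while the momentum-flip trick no longer does. It does not implement AD-HMC itself.

```python
import math

def grad_U(q):
    # Potential U(q) = q^2 / 2; illustrative choice.
    return q

def grad_K(p):
    # Asymmetrical kinetic energy K(p) = exp(p) - p, so K(p) != K(-p).
    # Hypothetical example, not the paper's momentum law.
    return math.exp(p) - 1.0

def leapfrog(q, p, eps, n_steps):
    """Leapfrog (Stormer-Verlet) integrator for the separable Hamiltonian U(q) + K(p)."""
    for _ in range(n_steps):
        p -= 0.5 * eps * grad_U(q)
        q += eps * grad_K(p)
        p -= 0.5 * eps * grad_U(q)
    return q, p

q0, p0 = 0.3, -0.2
q1, p1 = leapfrog(q0, p0, eps=0.05, n_steps=20)

# Reversing the time direction undoes the forward map up to round-off.
q_back, p_back = leapfrog(q1, p1, eps=-0.05, n_steps=20)

# Flip momentum, integrate forward, flip again: this returns to the start
# only when K is symmetric, which is not the case here.
q_flip, p_flip = leapfrog(q1, -p1, eps=0.05, n_steps=20)
p_flip = -p_flip
```

Comparing `(q_back, p_back)` and `(q_flip, p_flip)` with `(q0, p0)` shows the first pair matching to round-off and the second pair visibly off, which is one way to see why a direction-based rather than sign-flip-based symmetry is natural in this setting.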
What would settle it
A concrete target distribution and asymmetrical momentum law satisfying the paper's sufficient conditions for which the Wasserstein distance of the AD-HMC chain fails to contract geometrically.
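One way such a check might be set up empirically is sketched below. The function `w1_empirical` computes the 1-Wasserstein distance between two equal-size one-dimensional samples via the sorted coupling, and `contraction_ratios` tracks how that distance shrinks per iteration. The transition used here is a toy AR(1) kernel standing in for a sampler (an assumption made so the example is self-contained); it is not AD-HMC, and all names are hypothetical.

```python
import random

def w1_empirical(xs, ys):
    """1-Wasserstein distance between two equal-size 1D samples (sorted coupling)."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def ar1_step(x, noise, a):
    # Toy kernel x' = a*x + sqrt(1 - a^2)*noise, which leaves N(0, 1) invariant.
    # Stand-in for a sampler transition; NOT AD-HMC.
    return a * x + (1.0 - a * a) ** 0.5 * noise

def contraction_ratios(n_chains=2000, n_iters=8, a=0.6, seed=0):
    """Per-iteration ratios W1(k+1) / W1(k) between two coupled ensembles."""
    rng = random.Random(seed)
    xs = [5.0] * n_chains                                 # far-from-equilibrium start
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]   # stationary start
    dists = [w1_empirical(xs, ys)]
    for _ in range(n_iters):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]
        # Synchronous coupling: both ensembles are driven by the same noise.
        xs = [ar1_step(x, z, a) for x, z in zip(xs, noise)]
        ys = [ar1_step(y, z, a) for y, z in zip(ys, noise)]
        dists.append(w1_empirical(xs, ys))
    return [d1 / d0 for d0, d1 in zip(dists, dists[1:])]
```

Ratios that stay bounded away from 1 are consistent with geometric contraction; ratios drifting toward 1 for a chain that satisfies the stated conditions would be the kind of evidence described above.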
Original abstract
Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new dynamical and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version of HMC, that we call the Alternating Direction HMC (AD-HMC), which overcomes this difficulty. Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. The geometric convergence analysis is extended to when the Hamiltonian motion is approximated by the leapfrog symplectic integrator, where an additional Metropolis-Hastings rejection step is required. Numerical experiments suggest that AD-HMC can generalize a popular dynamic auxiliary scheme to show improved performance over HMC with Gaussian auxiliaries.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard HMC with asymmetrical momentum distributions violates self-adjointness, proposes an Alternating Direction HMC (AD-HMC) modification to restore it, and establishes sufficient conditions for geometric convergence in Wasserstein distance under weaker assumptions than Gaussian momenta. The analysis is extended to the leapfrog integrator plus Metropolis-Hastings correction, with numerical experiments suggesting performance gains over Gaussian-auxiliary HMC.
Significance. If the dynamical and probabilistic arguments hold, the result would meaningfully relax a long-standing restriction in HMC theory, permitting more flexible auxiliary distributions while retaining geometric ergodicity guarantees. Wasserstein-distance analysis and its leapfrog extension are standard tools, applied here to a broader setting.
major comments (2)
- [§3] The central argument rests on the claim that the alternating-direction modification restores self-adjointness (abstract and §3); an explicit verification that the modified transition kernel satisfies the required adjoint relation with respect to the target measure is needed, as this step enables all subsequent dynamical arguments.
- [Theorem 4.1 (or equivalent)] The theorem establishing geometric Wasserstein convergence (likely §4) states sufficient conditions that are weaker than requiring Gaussian momenta; these conditions must be stated completely and checked against the leapfrog + MH extension, because any hidden dependence on symmetry would undermine the weaker-conditions claim.
minor comments (3)
- [Abstract] The abstract asserts 'significantly weaker conditions' without enumerating them; a short explicit list in the abstract or introduction would improve readability.
- [§2–3] Notation for the alternating-direction operator and the resulting kernel should be introduced once and used consistently; current usage risks confusion between the continuous and discrete cases.
- [Numerical experiments] Numerical experiments would benefit from reporting effective sample size per gradient evaluation alongside raw mixing times to allow direct comparison with standard HMC.
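The metric requested in the last comment could be computed along the following lines. This is a minimal sketch under stated assumptions: the autocorrelation sum is truncated at the first non-positive lag (a simple heuristic, not necessarily what the paper or any particular library uses), and the names `effective_sample_size`, `ess_per_gradient`, and `ar1_chain` are hypothetical.

```python
import random

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(xs, max_lag=200):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation."""
    tau = 1.0
    for lag in range(1, min(max_lag, len(xs) - 1) + 1):
        rho = autocorr(xs, lag)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return len(xs) / tau

def ess_per_gradient(xs, n_gradient_evals):
    """Normalize ESS by the total number of gradient evaluations spent."""
    return effective_sample_size(xs) / n_gradient_evals

def ar1_chain(n, phi, seed=0):
    # Toy correlated chain used only to exercise the estimator.
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + (1.0 - phi * phi) ** 0.5 * rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

For a sampler that takes a fixed number of leapfrog steps per iteration, `n_gradient_evals` would be roughly the number of iterations times the steps per iteration, which puts samplers with different trajectory lengths on a common cost scale.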
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below.
- Referee: [§3] The central argument rests on the claim that the alternating-direction modification restores self-adjointness (abstract and §3); an explicit verification that the modified transition kernel satisfies the required adjoint relation with respect to the target measure is needed, as this step enables all subsequent dynamical arguments.
  Authors: We agree that an explicit verification of the adjoint relation would strengthen the exposition. In the revised manuscript we will insert a direct calculation in §3 showing that the AD-HMC transition kernel P is self-adjoint with respect to the target π, i.e. ∫∫ f(x) g(y) P(x, dy) π(dx) = ∫∫ g(x) f(y) P(x, dy) π(dx) for all bounded test functions f and g, where the alternation of momentum directions restores the required reversibility.
  Revision: yes
- Referee: [Theorem 4.1 (or equivalent)] The theorem establishing geometric Wasserstein convergence (likely §4) states sufficient conditions that are weaker than requiring Gaussian momenta; these conditions must be stated completely and checked against the leapfrog + MH extension, because any hidden dependence on symmetry would undermine the weaker-conditions claim.
  Authors: Theorem 4.1 states the sufficient conditions in full; they depend only on the properties restored by direction alternation and do not invoke additional symmetry of the auxiliary distribution. The leapfrog-plus-MH extension in §5 applies precisely the same conditions, with the Metropolis-Hastings correction preserving invariance. We will add an explicit remark after Theorem 4.1 confirming that the stated conditions carry over to the leapfrog case without hidden symmetry assumptions.
  Revision: yes
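The kind of adjoint verification discussed in the first response has a simple finite-state analogue, sketched below. A kernel P is self-adjoint in L²(π) when ⟨f, Pg⟩_π = ⟨Pf, g⟩_π for all test functions f and g. The example builds a Metropolis kernel on a small ring, which is reversible by construction, and contrasts it with a deterministic cyclic shift, which preserves the uniform distribution but is not self-adjoint. Everything here is an illustrative assumption; it is not the paper's kernel.

```python
def metropolis_kernel(pi):
    """Metropolis kernel on a ring {0, ..., n-1} with a symmetric
    nearest-neighbour proposal; reversible with respect to pi."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            P[i][j] += 0.5 * min(1.0, pi[j] / pi[i])
        P[i][i] += 1.0 - sum(P[i])   # rejected proposals stay put
    return P

def cyclic_shift_kernel(n):
    """Deterministic move i -> i+1 (mod n): uniform-invariant, not reversible."""
    return [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

def inner(f, g, pi):
    # Weighted inner product <f, g>_pi.
    return sum(fi * gi * w for fi, gi, w in zip(f, g, pi))

def apply_kernel(P, f):
    # (Pf)(i) = sum_j P[i][j] * f(j).
    return [sum(P[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

def adjoint_gap(P, pi, f, g):
    """|<f, Pg>_pi - <Pf, g>_pi|; vanishes for all f, g iff P is self-adjoint."""
    return abs(inner(f, apply_kernel(P, g), pi) - inner(apply_kernel(P, f), g, pi))

pi = [0.1, 0.2, 0.3, 0.4]
f = [1.0, 0.0, 0.0, 0.0]
g = [0.0, 1.0, 0.0, 0.0]
gap_reversible = adjoint_gap(metropolis_kernel(pi), pi, f, g)
gap_shift = adjoint_gap(cyclic_shift_kernel(4), [0.25] * 4, f, g)
```

In the continuous setting the same comparison becomes an integral identity, and the referee's request amounts to carrying it out explicitly for the AD-HMC kernel.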
Circularity Check
No significant circularity
The paper's derivation identifies the self-adjointness failure in plain HMC with asymmetric momenta, defines the AD-HMC modification to restore it, and then applies standard dynamical and probabilistic arguments to establish geometric Wasserstein convergence under stated sufficient conditions (including the leapfrog + MH extension). No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the convergence result is presented as following from independent analysis rather than renaming or smuggling prior results. The analysis is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard properties of Hamiltonian dynamics and of the Wasserstein distance are assumed to hold for the convergence analysis.