pith. machine review for the scientific record.

arxiv: 2605.02657 · v1 · submitted 2026-05-04 · 💻 cs.LG

Recognition: unknown

CARD: Coarse-to-fine Autoregressive Modeling with Radix-based Decomposition for Transferable Free Energy Estimation

Wenbing Huang, Wen Yan, Yang Liu, Yi He, Ziyang Yu

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 16:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords free energy estimation · autoregressive modeling · radix decomposition · generative models · molecular interactions · transferable models · drug discovery · thermodynamics

The pith

CARD learns a zero-free-energy distribution over molecular coordinates, enabling absolute free energy estimation for arbitrary systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CARD to overcome the high cost of molecular dynamics simulations and the poor generalization of existing deep learning methods for free energy differences. By decomposing 3D coordinates into sequences via a radix-based approach, it enables an autoregressive model that learns structure from coarse to fine detail. This yields a generative distribution with zero free energy that can propose configurations for any molecular system, allowing direct absolute free energy computation without alchemical pathways or per-system retraining. The approach matches the accuracy of classical methods on diverse unseen molecules while running roughly 40 times faster.

Core claim

CARD uses a novel radix-based decomposition to bijectively map 3D molecular coordinates to mixed discrete-continuous sequences. This enables coarse-to-fine autoregressive modeling whose resulting distribution has exactly zero free energy. Such a distribution provides a universal proposal for computing absolute free energies of arbitrary systems without dependence on alchemical transformations between states.

What carries the argument

Radix-based bijective decomposition of 3D coordinates into sequences for coarse-to-fine autoregressive density estimation that enforces zero free energy.
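The abstract does not spell the decomposition out, so as a purely illustrative sketch (not the authors' construction): a one-dimensional radix bijection might split a coordinate in [0, 1) into coarse-to-fine base-R digits plus a continuous residual, invertibly.

```python
def radix_encode(x, R=4, K=3):
    """Bijectively split x in [0, 1) into K coarse-to-fine base-R digits
    plus a continuous residual in [0, 1)."""
    digits = []
    for _ in range(K):
        x *= R
        d = int(x)        # coarse digit at this resolution level
        digits.append(d)
        x -= d            # keep only the finer-scale remainder
    return digits, x      # (discrete part, continuous residual)

def radix_decode(digits, residual, R=4):
    """Exact inverse: fold the digits and residual back into a coordinate."""
    x = residual
    for d in reversed(digits):
        x = (d + x) / R
    return x
```

The digits play the role of coarse discrete tokens and the residual the fine continuous tail, so an autoregressive model can consume them in coarse-to-fine order; the actual method operates on full 3D molecular coordinates, which this toy does not attempt.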

If this is right

  • Enables absolute free energy computation for arbitrary systems without alchemical pathways.
  • Achieves accuracy comparable to classical methods on unseen systems with diverse topologies.
  • Delivers approximately 40-fold speedup in inference compared to simulation-based approaches.
  • Overcomes constraints of system-specific input dimensions in prior deep learning methods.
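Why a normalized proposal matters here: if a reference distribution q integrates to 1, its own free energy is zero, and the importance-sampling identity Z = E_{x~q}[exp(-βU(x))/q(x)] yields the absolute free energy F = -β⁻¹ ln Z directly, with no alchemical pathway. A minimal sketch on a 1D harmonic toy (a hypothetical stand-in for a molecular system; the Gaussian proposal here is not the CARD model):

```python
import math, random

def absolute_free_energy(U, q_sample, q_logpdf, n=200_000, beta=1.0):
    """Estimate F = -(1/beta) ln Z of a target with energy U by importance
    sampling from a *normalized* proposal q. Because q integrates to 1,
    the estimate is an absolute free energy, not a difference."""
    random.seed(0)
    total = 0.0
    for _ in range(n):
        x = q_sample()
        # exp(-beta*U(x)) / q(x), computed in log space for stability
        total += math.exp(-beta * U(x) - q_logpdf(x))
    return -math.log(total / n) / beta

# Toy check: U(x) = x^2/2 at beta = 1 has Z = sqrt(2*pi),
# so the exact answer is F = -0.5 * ln(2*pi) ≈ -0.9189.
sigma = 1.5  # deliberately mismatched Gaussian proposal
F_hat = absolute_free_energy(
    U=lambda x: 0.5 * x * x,
    q_sample=lambda: random.gauss(0.0, sigma),
    q_logpdf=lambda x: -0.5 * (x / sigma) ** 2
                       - math.log(sigma * math.sqrt(2 * math.pi)),
)
```

The estimator is exact in expectation for any normalized q; the practical question the paper's claims turn on is whether the learned q is in fact normalized and overlaps the Boltzmann target well enough to keep the variance low.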

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The zero free energy property could simplify calculations of other thermodynamic quantities like entropy or enthalpy.
  • Applying the same trained model across many different molecules might accelerate high-throughput screening in drug design.
  • The decomposition technique may generalize to other 3D structure modeling tasks beyond free energy.

Load-bearing premise

The combination of radix-based bijective decomposition and coarse-to-fine autoregressive modeling produces a distribution with exactly zero free energy that generalizes accurately to unseen molecular systems with diverse topologies.

What would settle it

Computing the free energy of samples drawn from the CARD model and finding it is not zero, or observing large errors in free energy estimates for a new molecular topology not seen during training.

original abstract

Estimating free energy differences quantifies thermodynamic preferences in molecular interactions, which is central to chemistry and drug discovery. Despite fruitful progress, existing methods still face key limitations: classical computational approaches remain prohibitively expensive due to their reliance on extensive molecular dynamics simulations, while deep learning-based methods are constrained by either less-expressive generative models or input dimensions tied to a specific system, resulting in negligible generalization. To address these challenges, we propose CARD, a generative framework that employs a novel radix-based decomposition to bijectively convert 3D coordinates into mixed discrete-continuous sequences, enabling coarse-to-fine autoregressive modeling with enhanced expressiveness. Notably, the model corresponds to a distribution with zero free energy, serving as a proposal for absolute free energy computation of arbitrary systems without relying on alchemical pathways. Experiments across diverse tasks demonstrate that CARD matches the accuracy of classical computational methods on unseen systems with diverse topologies, while achieving an approximately 40-fold speedup in inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces CARD, a generative framework for transferable absolute free energy estimation in molecular systems. It proposes a radix-based decomposition that bijectively maps continuous 3D atomic coordinates to mixed discrete-continuous sequences, which are then modeled autoregressively in a coarse-to-fine manner. The central claim is that this construction yields a reference distribution with exactly zero free energy, enabling direct absolute free energy computation for arbitrary systems without alchemical pathways or system-specific training. Experiments across diverse tasks report accuracy matching classical computational methods on unseen systems with varied topologies, alongside an approximately 40-fold inference speedup.

Significance. If the zero-free-energy property is rigorously established and the generalization holds, CARD would offer a significant advance for computational chemistry and drug discovery by providing a fast, transferable alternative to expensive MD-based free energy calculations that avoids alchemical transformations and input-dimension constraints of prior deep learning methods.

major comments (1)
  1. [Abstract and §3 (Model formulation)] The claim that the model 'corresponds to a distribution with zero free energy' is load-bearing for the absolute free energy proposal. The radix-based bijective map converts Euclidean 3D coordinates to a mixed discrete-continuous sequence; the autoregressive product of conditionals then defines a density on sequence space. For the induced density q(x) on coordinate space to integrate to 1 (required for F_ref = 0 by construction), the change-of-variables formula must explicitly include log|det J|, where J is the Jacobian of the inverse mapping. The manuscript does not derive or correct for this term in the mixed discrete-continuous setting, so the normalization (and thus zero free energy) is not guaranteed.
minor comments (2)
  1. [§4 (Experiments)] The abstract states 'matching accuracy' and '40-fold speedup' but supplies no quantitative tables, error bars, or explicit baseline descriptions (e.g., which classical methods and system sizes). Adding these would improve clarity.
  2. [Notation throughout] Define the precise radix decomposition function and its inverse more formally, including how discrete and continuous components are handled in the density.
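The normalization requirement in the major comment can be written out explicitly. Writing the decomposition as x ↦ (d(x), r(x)), with discrete digits d and continuous residual r (notation assumed here, not taken from the paper), the induced coordinate-space density and the zero-free-energy condition are:

```latex
q(x) \;=\; p_\theta\big(d(x)\big)\,
           p_\theta\big(r(x)\,\big|\,d(x)\big)\,
           \left|\det \frac{\partial r(x)}{\partial x}\right|,
\qquad
F_{\mathrm{ref}} \;=\; -\beta^{-1}\ln\!\int q(x)\,\mathrm{d}x .
```

Autoregressive modeling guarantees normalization on sequence space, \(\sum_d p_\theta(d)\int p_\theta(r\mid d)\,\mathrm{d}r = 1\), but \(F_{\mathrm{ref}} = 0\) requires \(\int q(x)\,\mathrm{d}x = 1\) on coordinate space, which holds only once the Jacobian factor is carried through — trivially so if the residual map is volume-preserving, \(|\det \partial r/\partial x| = 1\).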

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and for identifying a key point regarding the normalization of the reference distribution. We address the major comment below and will incorporate clarifications in the revised manuscript.

point-by-point responses
  1. Referee: [Abstract and §3 (Model formulation)] The claim that the model 'corresponds to a distribution with zero free energy' is load-bearing for the absolute free energy proposal. The radix-based bijective map converts Euclidean 3D coordinates to a mixed discrete-continuous sequence; the autoregressive product of conditionals then defines a density on sequence space. For the induced density q(x) on coordinate space to integrate to 1 (required for F_ref = 0 by construction), the change-of-variables formula must explicitly include log|det J|, where J is the Jacobian of the inverse mapping. The manuscript does not derive or correct for this term in the mixed discrete-continuous setting, so the normalization (and thus zero free energy) is not guaranteed.

    Authors: We appreciate the referee's precise identification of this technical requirement. The radix decomposition is constructed to be bijective, with the continuous residuals mapped in a volume-preserving manner (Jacobian determinant of 1) and discrete indices handled via summation over the finite radix choices. This ensures the induced density q(x) on coordinate space integrates to 1 by construction. While §3 presents the overall autoregressive factorization and bijectivity, we acknowledge that an explicit change-of-variables derivation including the Jacobian term for the mixed discrete-continuous case was omitted. In the revised manuscript we will add this derivation in §3, verifying ∫ q(x) dx = 1 and thereby rigorously confirming the zero free energy property. revision: yes
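The rebuttal's volume-preservation claim can at least be sanity-checked in a toy setting: with a per-cell shift map x ↦ (d, r) = (⌊Rx⌋, x − d/R), the Jacobian of the continuous part is exactly 1, so a model normalized on (d, r) induces a density on x that integrates to 1 with no correction term. The per-cell densities below are hypothetical, not the paper's:

```python
R = 4  # number of coarse cells (the radix)

# A toy normalized model on the decomposed representation: digit d with
# probabilities P, and for each d a normalized linear density on the
# residual interval [0, 1/R).
P = [0.1, 0.4, 0.3, 0.2]

def q_residual(r, d):
    w = 1.0 / R
    slope = d + 1.0                      # arbitrary per-cell shape
    norm = w + slope * w * w / 2.0       # integral of (1 + slope*r) over [0, w)
    return (1.0 + slope * r) / norm

def q_coord(x):
    """Density induced on x in [0, 1): the encode map x -> (d, r) with
    d = floor(R*x), r = x - d/R is a per-cell shift with unit Jacobian,
    so no |det J| factor appears -- the rebuttal's claim in miniature."""
    d = min(int(R * x), R - 1)
    r = x - d / R
    return P[d] * q_residual(r, d)

# Midpoint Riemann-sum check that q_coord integrates to 1 over [0, 1)
n = 200_000
Z = sum(q_coord((i + 0.5) / n) for i in range(n)) / n
```

If the encode map instead rescaled the residual (Jacobian ≠ 1), the same sum would drift away from 1 unless a determinant correction were added, which is precisely the derivation the revision promises for §3.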

Circularity Check

1 step flagged

Zero free energy asserted by construction of radix decomposition plus autoregressive density on sequence space

specific steps
  1. self-definitional [Abstract]
    "Notably, the model corresponds to a distribution with zero free energy, serving as a proposal for absolute free energy computation of arbitrary systems without relying on alchemical pathways."

    The zero-free-energy property is presented as an automatic consequence of the radix-based bijective decomposition combined with coarse-to-fine autoregressive modeling. The autoregressive product of conditionals normalizes the density on the decomposed sequence space by construction; the claim that the corresponding q(x) on original 3D coordinates also integrates to 1 (hence F_ref = 0) therefore collapses to the modeling choice itself unless the Jacobian of the inverse mapping is separately shown to preserve the required measure.

full rationale

The paper's central claim is that the CARD model induces a reference distribution with exactly zero free energy, enabling absolute FE estimation without alchemical paths. This property is stated as following directly from the bijective radix decomposition to mixed discrete-continuous sequences and the subsequent coarse-to-fine autoregressive factorization. The autoregressive construction guarantees a normalized density on the sequence space by definition, but the induced density q(x) on Euclidean coordinate space requires an explicit log|det J| term from the change-of-variables formula. Because the paper presents zero free energy as an inherent feature without demonstrating that the Jacobian correction is either zero or included, the claimed property reduces to a definitional consequence of the generative construction rather than an independent result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient technical detail to identify specific free parameters, axioms, or invented entities. The central innovation is the radix-based decomposition and the zero free energy correspondence, but their mathematical foundations are not elaborated.

pith-pipeline@v0.9.0 · 5476 in / 1246 out tokens · 37982 ms · 2026-05-09T16:02:56.988852+00:00 · methodology


Reference graph

Works this paper leans on

74 extracted references · 9 canonical work pages · 5 internal anchors
