Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps

Benjamin Kurt Miller; Hanlin Yu; Jose Miguel Hernandez-Lobato; Max Simchowitz; Nicholas M. Boffi; Omar Chehab; Pradeep Ravikumar; RuiKang OuYang; Xinyue Ai; Yutong He

arxiv: 2606.29110 · v1 · pith:264PDCGVnew · submitted 2026-06-27 · 💻 cs.LG · stat.ML


Pith reviewed 2026-06-30 09:14 UTC · model grok-4.3


keywords: flow-based generative models, likelihood estimation, Boltzmann generators, likelihood distillation, few-step inference, variance reduction, molecular simulation


The pith

SCALLOP replaces Hutchinson's estimator with a vectorized distillation objective to train accurate few-step likelihood flow maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow-based models generate high-quality samples in few steps but have struggled to estimate likelihoods without high-variance stochastic methods or restrictive architectures. SCALLOP extends the F2D2 likelihood flow map approach by introducing a Hutchinson-free distillation objective that supports vectorized computation. This change lowers training variance and time while raising performance on Boltzmann generators for molecules and on image datasets. The resulting models remain competitive with existing methods yet deliver up to tenfold faster inference than the quickest prior baselines. Readers interested in practical density estimation for simulation tasks would see value in a method that removes a major source of instability without sacrificing speed.

Core claim

SCALLOP builds on F2D2 by replacing Hutchinson's trace estimator with an alternative likelihood distillation objective that is Hutchinson-free and admits a vectorized formulation, thereby enabling the training of flow map models that output both samples and densities in a small number of function evaluations while reducing variance and training time.
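For readers unfamiliar with the estimator being replaced, the following is a minimal sketch of Hutchinson's trace estimator on a random matrix standing in for a Jacobian. It is not SCALLOP's objective, which this page does not spell out; the function name `hutchinson_trace` and the test matrix are illustrative assumptions. The point is only to show where the per-probe variance comes from.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate tr(A) as the average of z^T A z over Rademacher probes z.

    `matvec` is any callable returning A @ z (hypothetical helper name).
    Returns the estimate and the per-probe sample standard deviation.
    """
    samples = np.empty(num_probes)
    for i in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        samples[i] = z @ matvec(z)
    return samples.mean(), samples.std(ddof=1)

rng = np.random.default_rng(0)
dim = 64
jac = rng.normal(size=(dim, dim))  # assumed stand-in for a velocity-field Jacobian
exact_trace = np.trace(jac)
estimate, per_probe_std = hutchinson_trace(lambda z: jac @ z, dim, 2000, rng)

print(f"exact trace     : {exact_trace:.3f}")
print(f"Hutchinson mean : {estimate:.3f}")
print(f"per-probe std   : {per_probe_std:.3f}")
```

With a single probe per training example, the noise in the target is on the order of the per-probe standard deviation, which is the variance the abstract says a Hutchinson-free objective avoids.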

What carries the argument

The Hutchinson-free likelihood distillation objective, which replaces stochastic trace estimation with a scalable vectorized alternative inside the F2D2 likelihood flow map framework.

If this is right

Training variance drops substantially relative to F2D2 while training completes in less wall-clock time.
Performance on molecular Boltzmann generation and image modeling improves consistently over the F2D2 baseline.
Inference speed reaches up to ten times that of the fastest competing method while remaining competitive with state-of-the-art accuracy.
Both sample generation and density estimation become feasible inside the same few-step flow map architecture.
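The last point matters because a Boltzmann generator typically uses the model density to reweight its samples toward the target. A minimal sketch of self-normalized importance reweighting is given below, assuming a 1D Gaussian target and a wider Gaussian proposal in place of a trained model; the helper names and the toy energy are illustrative, not taken from the paper.

```python
import numpy as np

def normalized_importance_weights(energy, log_q, kT=1.0):
    """Self-normalized weights w_i proportional to exp(-U(x_i)/kT) / q(x_i)."""
    log_w = -np.asarray(energy, dtype=float) / kT - np.asarray(log_q, dtype=float)
    log_w = log_w - log_w.max()  # stabilize the exponential
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(weights):
    """Kish effective sample size for normalized weights."""
    return 1.0 / np.sum(weights ** 2)

rng = np.random.default_rng(0)
sigma = 2.0                                   # assumed proposal width
x = rng.normal(scale=sigma, size=50_000)      # samples from the stand-in "model"
log_q = -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
energy = 0.5 * x ** 2                         # toy target: standard normal

w = normalized_importance_weights(energy, log_q)
second_moment = np.sum(w * x ** 2)            # reweighted estimate of E[x^2]
ess = effective_sample_size(w)

print(f"reweighted E[x^2]: {second_moment:.3f}")
print(f"ESS fraction     : {ess / x.size:.3f}")
```

If the model's reported log-density is inaccurate, the weights are biased, which is why likelihood quality and not only sample quality is at stake here.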

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The vectorized objective may allow larger batch sizes or deeper networks on hardware that supports efficient matrix operations, extending the method to higher-dimensional systems.
If the same distillation approach transfers to other flow architectures, it could relax the need for exact-trace-friendly designs across generative modeling.
Lower training variance might reduce the number of random seeds required for reliable hyperparameter search in related few-step models.

Load-bearing premise

The alternative Hutchinson-free likelihood distillation objective accurately captures the true model likelihood and does not introduce new biases or failure modes that would offset the reported gains in stability and speed.

What would settle it

A side-by-side numerical comparison, on a low-dimensional toy distribution where exact likelihoods can be computed analytically, between the likelihood values reported by a trained SCALLOP model and the ground-truth densities would show systematic over- or under-estimation if the objective fails to match true likelihoods.
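The shape of such a check can be sketched as follows. A hand-built affine flow stands in for a trained model here (an assumption, since no SCALLOP model is available on this page); in a real test, `model_logp` would be the log-density reported by the trained flow map.

```python
import numpy as np

def std_normal_logpdf(z):
    """Log-density of a standard normal, evaluated row-wise."""
    d = z.shape[-1]
    return -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov), evaluated row-wise."""
    d = x.shape[-1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * np.sum(diff * sol, axis=-1) - 0.5 * (d * np.log(2.0 * np.pi) + logdet)

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.0, 1.5]])  # illustrative affine "flow": x = A z + b
b = np.array([1.0, -1.0])

z = rng.normal(size=(1000, 2))
x = z @ A.T + b

# Stand-in for the model's reported log-density (change of variables).
_, logabsdet = np.linalg.slogdet(A)
model_logp = std_normal_logpdf(z) - logabsdet

# Ground truth: the pushforward of N(0, I) under x = A z + b is N(b, A A^T).
true_logp = gaussian_logpdf(x, b, A @ A.T)

bias = np.mean(model_logp - true_logp)
print(f"mean log-density error: {bias:.2e}")
```

A mean error consistently away from zero across samples would indicate the systematic over- or under-estimation described above.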

Original abstract

Recent progress in flow-based generative modeling has led to models that output high-quality samples while using only a small number of function evaluations. However, at present, there is a lack of similar advances in estimating the model likelihood. In particular, most existing methods either rely on restrictive architectures that enable exact calculations, or use stochastic approximations such as Hutchinson's trace estimator that introduce substantial variance. In this work, we introduce SCAlable LikeLihood distillation of flOw maPs (SCALLOP). SCALLOP builds on the recently proposed F2D2, a likelihood flow map model that can generate samples and their densities in a small number of function evaluations. While F2D2 uses Hutchinson's estimator during training, we introduce an alternative and more scalable likelihood distillation objective that is Hutchinson-free and admits a vectorized formulation. Empirically, we demonstrate the effectiveness of SCALLOP as a Boltzmann generator in molecular science, and further validate its benefit on image datasets. SCALLOP significantly reduces both training variance and training time while consistently improving performance compared to F2D2, and is competitive with the state-of-the-art while achieving up to 10x inference speedup over the fastest baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SCALLOP replaces Hutchinson's estimator with a vectorized distillation objective for likelihoods in few-step flows, which looks like a practical step if the equivalence holds.


The core contribution is SCALLOP, a Hutchinson-free likelihood distillation objective for F2D2-style flow maps that supports vectorized training. It targets the gap where few-step generative models produce samples quickly but struggle with reliable density estimates without high-variance stochastic methods.

The paper shows this on Boltzmann generators for molecular science and on image data. It reports lower training variance, shorter training time, better performance than the F2D2 baseline, competitiveness with existing methods, and up to 10x faster inference than the quickest prior approach. Those are the kinds of gains that matter when both sampling speed and likelihood values are required.

The new element is the specific alternative objective that avoids the trace estimator while remaining independent of the original F2D2 training loop. That setup reduces the risk of circular fitting.

The main open question is whether the distillation objective is an unbiased replacement for the true likelihood or whether it introduces systematic error through assumptions on the flow map or target density. In high-dimensional Boltzmann settings this could matter, and the abstract alone does not show the derivation or any checks against exact likelihoods. The experimental claims would also be stronger with visible error bars and targeted ablations.

This work is for groups building flow models for scientific sampling where density estimation is a requirement. Readers focused on practical likelihood methods in generative modeling will find the empirical comparison useful.

It deserves peer review because the problem is well-posed and the direction is grounded in the existing F2D2 framework, even though the bias and experimental details need verification.

Referee Report

2 major / 1 minor

Summary. The paper introduces SCALLOP, an extension of the F2D2 likelihood flow map model for few-step Boltzmann generators. It replaces Hutchinson's trace estimator with a new Hutchinson-free, vectorized likelihood distillation objective during training, and reports that this yields lower training variance, shorter training time, improved performance over F2D2, competitiveness with state-of-the-art methods, and up to 10x inference speedup on molecular and image datasets.

Significance. If the new objective is a faithful, unbiased replacement for the true model likelihood, the work would offer a practical advance in scalable training of flow-based models that simultaneously support fast sampling and density estimation, with direct relevance to molecular simulation where both capabilities are required.

major comments (2)

[§3 (methods) and abstract] The central performance claims (reduced variance, improved accuracy, 10x speedup) rest on the new likelihood distillation objective being an unbiased equivalent to the true model likelihood. The manuscript must supply an explicit derivation or proof (likely in §3 or the methods section) showing that the vectorized formulation does not introduce systematic bias from implicit assumptions on flow-map invertibility or the target density in high-dimensional Boltzmann settings; absent this, the reported gains cannot be taken as evidence that the objective is a faithful replacement.
[Experiments section / results tables] Table or figure reporting the main experimental results (molecular Boltzmann generation and image validation) must include error bars, ablation studies on the distillation objective, and full experimental protocol details; without them the claims of consistent improvement over F2D2 and competitiveness with SOTA cannot be evaluated for statistical reliability.

minor comments (1)

Notation for the flow map and the distillation loss should be introduced with explicit definitions before first use to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and will incorporate revisions to improve the manuscript's rigor and clarity.


Referee: [§3 (methods) and abstract] The central performance claims (reduced variance, improved accuracy, 10x speedup) rest on the new likelihood distillation objective being an unbiased equivalent to the true model likelihood. The manuscript must supply an explicit derivation or proof (likely in §3 or the methods section) showing that the vectorized formulation does not introduce systematic bias from implicit assumptions on flow-map invertibility or the target density in high-dimensional Boltzmann settings; absent this, the reported gains cannot be taken as evidence that the objective is a faithful replacement.

Authors: We agree that an explicit derivation strengthens the claims. In the revised manuscript we will add a self-contained derivation in §3 establishing that the vectorized likelihood distillation objective is an unbiased estimator of the model log-likelihood. The derivation relies only on the change-of-variables formula for the flow map and the definition of the distillation target; it does not invoke additional invertibility assumptions beyond those already stated for F2D2 nor any special properties of the target density. revision: yes
Referee: [Experiments section / results tables] Table or figure reporting the main experimental results (molecular Boltzmann generation and image validation) must include error bars, ablation studies on the distillation objective, and full experimental protocol details; without them the claims of consistent improvement over F2D2 and competitiveness with SOTA cannot be evaluated for statistical reliability.

Authors: We acknowledge that the current presentation lacks these elements. The revised manuscript will include error bars (computed over multiple random seeds) on all quantitative results, dedicated ablation studies isolating the effect of the distillation objective, and a complete experimental protocol (hyperparameters, data splits, hardware, and evaluation metrics) placed in the main text or a clearly referenced appendix. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to F2D2 with no load-bearing reduction in central claims


The paper introduces SCALLOP as a new Hutchinson-free likelihood distillation objective that builds on but is presented as an alternative to F2D2. No equations or steps in the abstract reduce the new objective or performance claims to quantities fitted inside the same model by construction. The central claims rest on empirical validation rather than self-referential definitions or predictions. A single self-citation to prior F2D2 work is present but does not make the reported gains equivalent to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the central claim rests on the assumption that the new distillation objective is both correct and practically superior, but no free parameters, domain axioms, or invented entities are enumerated in the provided text.

axioms (1)

Domain assumption: Flow-based models admit invertible mappings whose Jacobian determinant can be used for exact likelihoods under suitable architectural constraints.
Implicit background assumption of all flow-based generative modeling referenced in the abstract.
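For reference, this assumption is the standard change-of-variables identity, written out below. The notation ($f$ for an invertible map, $v_t$ for a velocity field, $z$ for a probe vector) is chosen here for illustration and need not match the paper's.

```latex
% Discrete (normalizing-flow) form, for an invertible map x = f(z):
\log p_X\bigl(f(z)\bigr) = \log p_Z(z) - \log\left|\det \frac{\partial f}{\partial z}(z)\right|

% Continuous-time form along a trajectory \dot{x}_t = v_t(x_t):
\frac{\mathrm{d}}{\mathrm{d}t} \log p_t(x_t)
  = -\nabla \cdot v_t(x_t)
  = -\operatorname{tr}\!\left(\frac{\partial v_t}{\partial x}(x_t)\right)

% Hutchinson's identity, valid for any probe z with E[z z^T] = I:
\operatorname{tr}(A) = \mathbb{E}_{z}\bigl[z^{\top} A z\bigr]
```

The divergence in the second line is the quantity that is either computed exactly under architectural constraints or estimated stochastically with the third line.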

pith-pipeline@v0.9.1-grok · 5781 in / 1195 out tokens · 50974 ms · 2026-06-30T09:14:57.273120+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 23 canonical work pages · 9 internal anchors

[1]

URLhttps://arxiv.org/abs/2507.00846. X. Ai, Y. He, A. Gu, R. Salakhutdinov, J. Kolter, N. Boffi, and M. Simchowitz. Joint distillation for fast likelihood evaluation and sampling in flow-based models. InInternational Conference on Learning Representations,

[2]

URLhttps://arxiv.org/abs/2505.18825. 10 N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. InAnnual Conference on Neural Information Processing Systems,

[3]

URL https://arxiv.org/abs/1806.07366. K. Choi, C. Meng, Y. Song, and S. Ermon. Density ratio estimation via infinitesimal classification.arXiv e-prints,

[4]

URLhttps://arxiv.org/abs/2310.16624. K. Frans, D. Hafner, S. Levine, and P. Abbeel. One step diffusion via shortcut models,

[5]

URLhttps://arxiv.org/ abs/2410.12557. R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu. Flow contrastive estimation of energy-based models. In2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 7515–7525,

[6]

doi: 10.1109/CVPR42600.2020.00754. Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He. Mean flows for one-step generative modeling,

[7]

URLhttps: //arxiv.org/abs/2505.13447. W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models,

[8]

URLhttps://arxiv.org/abs/1810.01367. H. Grubmüller. Predicting slow structural transitions in macromolecular systems: Conformational flooding.Physical Review E, 52(3):2893,

[9]

URL https://arxiv.org/abs/2506.05310. M. U. Gutmann and A. Hyvärinen. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics.Journal of Machine Learning Research, 13(11):307–361,

[10]

URLhttps://arxiv.org/abs/2506.05668. J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models,

[11]

URLhttps://arxiv.org/abs/2006.11239. M. Hutchinson. A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines.Communications in Statistics - Simulation and Computation, 19(2):433–450,

[12]

URL https: //doi.org/10.1080/03610919008812866

doi: 10.1080/03610919008812866. URL https: //doi.org/10.1080/03610919008812866. A. Hyvärinen. Estimation of non-normalized statistical models by score matching.Journal of Machine Learning Research, 6(24):695–709,

[13]

URLhttps://proceedings.neurips.cc/paper_files/paper/2023/file/ bc827452450356f9f558f4e4568d553b-Paper-Conference.pdf. J. Köhler, L. Klein, and F. Noe. Equivariant flows: Exact likelihood generative learning for symmetric densities. In H. D. III and A. Singh, editors,Proceedings of the 37th International Conference on Machine Learning, volume 119 of Procee...

2023
[14]

R. A. Meyer, C. Musco, C. Musco, and D. P. Woodruff.Hutch++: Optimal Stochastic Trace Estimation, pages 142–155. doi: 10.1137/1.9781611976496.16. J. Nam, B. Máté, A. P. Toshev, M. Kaniselvan, R. Gómez-Bombarelli, R. T. Chen, B. Wood, G.-H. Liu, and B. K. Miller. Enhancing diffusion-based sampling with molecular collective variables.arXiv preprint arXiv:25...

[15]

tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

doi: 10.1126/science.aaw1147. tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147. Z. Ou, M. Zhang, A. Zhang, T. Z. Xiao, Y. Li, and D. Barber. Improving probabilistic diffusion models with optimal diagonal covariance matching,

[16]

URLhttps://arxiv.org/abs/2406.10808. R. OuYang, L. Grenioux, and J. M. Hernández-Lobato. A diffusive classification loss for learning energy-based generative models, 2026a. URLhttps://arxiv.org/abs/2601.21025. R. OuYang, B. Qiang, and J. M. Hernández-Lobato. Bnem: A boltzmann sampler based on bootstrapped noised energy matching, 2026b. URLhttps://arxiv.or...

[17]

URLhttps://arxiv.org/abs/2506.17139. D. Rehman, T. Akhound-Sadegh, A. Gazizov, Y. Bengio, and A. Tong. FALCON: Few-step accurate likelihoods for continuous flows. InThe fourteenth international conference on learning representations,

[18]

URLhttps://arxiv.org/abs/2506.05231. M. Schebek, F. Noé, and J. Rogal. Scalable boltzmann generators for equilibrium sampling of large-scale materials,

[19]

URLhttps://arxiv.org/abs/2509.25486. Y. Song and P. Dhariwal. Improved techniques for training consistency models. InThe Twelfth International Conference on Learning Representations,

[20]

URLhttps://arxiv.org/abs/2011.13456. Y. Song, P. Dhariwal, M. Chen, and I. Sutskever. Consistency models. InInternational Conference on Machine Learning, pages 32211–32252. PMLR,

[21]

C. B. Tan, A. J. Bose, C. Lin, L. Klein, M. M. Bronstein, and A. Tong. Scalable equilibrium sampling with sequential boltzmann generators, 2026a. URLhttps://arxiv.org/abs/2502.18462. C. B. Tan, M. Hassan, L. Klein, S. Syed, D. Beaini, M. M. Bronstein, A. Tong, and K. Neklyudov. Amortized sampling with transferable normalizing flows, 2026b. URLhttps://arxi...

[22]

doi: 10.1162/NECO_a_00142. Y. Xie, L. Winkler, L. Sun, S. Lewis, A. E. Foster, J. J. Luna, T. Hempel, M. Gastegger, Y. Chen, I. Zaporozhets, et al. Enhanced diffusion sampling: Efficient rare event sampling and free energy calculation with diffusion models.arXiv preprint arXiv:2602.16634,

[23]

URLhttps://arxiv.org/abs/2605.26850. S. Zhai, R. Zhang, P. Nakkiran, D. Berthelot, J. Gu, H. Zheng, T. Chen, M. A. Bautista, N. Jaitly, and J. Susskind. Normalizing flows are capable generative models,

[24]

Normalizing flows are capable generative models,

URLhttps://arxiv.org/abs/2412.06329. 13 Appendix A Conditional Estimators for Flow-based Model’s Likelihood: Proofs and Connections 14 A.1 Desiderata for a Consistent Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 A.2 Derivation through Total Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 ...

[25]

Under Assumption 1, we denote $p_t$ as the marginals for clarity

For clarity, we present the estimators and connections based on the following assumption: Assumption 1. We assume $v_t = v_t^*$, which implies $p_t^{v} = p_t^{v^*}$ and $\nabla \log p_t^{v} = \nabla \log p_t^{v^*}$. Under Assumption 1, we denote $p_t$ as the marginals for clarity. A.1 Desiderata for a Consistent Objective. We would like to obtain an estimator for $\mathrm{d}_t \log p_t(x_t)$ using the marginalization tr...

2023
[26]

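The two loss functions $L_f(\theta) = \mathbb{E}_{p(t,\xi,x_t)}\left[\lambda(t)\,\|f_t(x_t|\xi) - s_\theta(\xi,t)\|^2\right]$ (31) and $L_g(\theta) = \mathbb{E}_{p(t,x_t)}\left[\lambda(t)\,\|g_t(x_t) - s_\theta(x_t,t)\|^2\right]$ (32) have the same gradient with respect to $\theta$

Theorem 8. Consider two functions, $f_t(x_t|\xi)$ and $g_t(x_t)$, satisfying the property that $g_t(x_t) = \mathbb{E}_{p_t(\xi|x_t)}[f_t(x_t|\xi)]$. The two loss functions $L_f(\theta) = \mathbb{E}_{p(t,\xi,x_t)}\left[\lambda(t)\,\|f_t(x_t|\xi) - s_\theta(\xi,t)\|^2\right]$ (31) and $L_g(\theta) = \mathbb{E}_{p(t,x_t)}\left[\lambda(t)\,\|g_t(x_t) - s_\theta(x_t,t)\|^2\right]$ (32) have the same gradient with respect to $\theta$. The proof can be found in Yu et al. (2025). We provide a proof here for complete...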

2025
[27]

: = ˙αtx1 + ˙βtx0 + ˙γtz,wherez= xt−(αtx1 +βtx0) γt ,(47) andv t(xt) =E[˜vt(xt|x0,x 1)|xt]. To estimatedt logpt(xt), one could expand it through total derivative dt logpt(xt) =∂t logpt(xt) +∇logpt(xt)·dtxt (48) =∂t logpt(xt) +∇logpt(xt)·vt(xt),(49) 16 where both∇logpt(xt)and ∂t logpt(xt)can be expressed as conditional expectations (Song et al., 2021; Yu e...

2021
[28]

+ ˙γt γt d xt ò ,(55) which can be learned efficiently without any high-order derivatives once having access tovt. A.3 Derivation through the Continuity Equation Alternatively, Equation (15) can be derived from the (log) continuity equation ∂tpt(xt) +∇·(vt(xt)pt(xt)) = 0⇔∂t logpt(xt) =−∇logpt(xt)·vt(xt)−∇·vt(xt).(56) By the instantaneous change-of-variabl...

2019
[29]

+ ˙γt γt d xt ò (73) =−E ï −˜vt(xt|x0,x 1)· Åz γt +∇logpt(xt) ã + ˙γt γt d xt ò −vt(xt)· Å 1 γt E[z|xt] +∇logpt(xt) | {z } (i) =−∇logpt(xt)+∇logpt(xt)=0 ã ,(74) where(i)is according to the Tweedie’s formula. A.6 Vectorized Estimators We first conclude that, the vectorized estimators obtained by replacing the inner-product·in Equations (70) and (71) with a...

2021
[30]

For instance, Köhler et al

that can be parameterized using more flexible model architectures. For instance, Köhler et al. (2020); Klein et al. (2023) develop equivariant CNFs tailored to molecular systems. However, this approach does not scale well: computing the log-likelihood requires integrating the divergence of the velocity field (as in Eq. 3), which becomes prohibitively expe...

2020
[31]

FALCON enables flow map to have fast yet accurate likelihood estimation and shows its effectiveness and scalability on molecular systems

proposes to make flow map invertible, by additionally imposing reconstruction losses on the jump between two timest<s . FALCON enables flow map to have fast yet accurate likelihood estimation and shows its effectiveness and scalability on molecular systems. However, FALCON requires the exact log-determinant of the Jacobian of the flow map for likelihood e...

2026
[32]

We employ a linear schedule for all generation in SCALLOP and F2D2

for ALA-(2, 3, 4, 6), respectively. We employ a linear schedule for all generation in SCALLOP and F2D2. Algorithms. We provide the pseudocode for inference with SCALLOP and F2D2 in Algorithm 1 below: Algorithm 1 Inference for SCALLOP and F2D2 1: Input: base distribution $p_0$, trained likelihood flow map $f_\theta = [u_\theta; D_\theta]$, time schedule $\{t_i\}_{i=0}^{N}$ 2: Output: Samples $x_N \sim p_1^\theta$ wit...

2026
[33]

systems, following Rehman et al. (2026). Ground Truth Torsional Angles Histograms. We provide the Ramachandran plots, i.e. the histograms of torsional angles, of the training data, from ALA-2 to ALA-6, in Figure

2026
[34]

Resampled Energy Histogram.We provide the energy histograms of the resampled data in Figure 6a (for SCALLOP) and Figure 6b (for F2D2)

Resampled Torsional Angles Histograms. We provide the Ramachandran plots for the resampled data in Figure 4 (for SCALLOP) and Figure 5 (for F2D2). Resampled Energy Histogram. We provide the energy histograms of the resampled data in Figure 6a (for SCALLOP) and Figure 6b (for F2D2). Figure 3: Ramachandran plots for ALA-N systems, from ALA-2 (TOP) to ALA-6 (...

2015
[35]

LSD Model Training. For the vanilla LSD baseline, we follow the experimental setup of Boffi et al. (2026). We adopt the batch-allocation strategy used in Ai et al. (2026): during the first 200k iterations, 75% of each batch is used for the flow-matching loss and the remaining 25% is used for the self-distillation loss; during the subsequent 150k iterations...

2026

[1] [1]

URLhttps://arxiv.org/abs/2507.00846. X. Ai, Y. He, A. Gu, R. Salakhutdinov, J. Kolter, N. Boffi, and M. Simchowitz. Joint distillation for fast likelihood evaluation and sampling in flow-based models. InInternational Conference on Learning Representations,

work page arXiv

[2] [2]

URLhttps://arxiv.org/abs/2505.18825. 10 N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. InAnnual Conference on Neural Information Processing Systems,

work page arXiv

[3] [3]

URL https://arxiv.org/abs/1806.07366. K. Choi, C. Meng, Y. Song, and S. Ermon. Density ratio estimation via infinitesimal classification.arXiv e-prints,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

URLhttps://arxiv.org/abs/2310.16624. K. Frans, D. Hafner, S. Levine, and P. Abbeel. One step diffusion via shortcut models,

work page arXiv

[5] [5]

URLhttps://arxiv.org/ abs/2410.12557. R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu. Flow contrastive estimation of energy-based models. In2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 7515–7525,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

doi: 10.1109/CVPR42600.2020.00754. Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He. Mean flows for one-step generative modeling,

work page doi:10.1109/cvpr42600.2020.00754 2020

[7] [7]

URLhttps: //arxiv.org/abs/2505.13447. W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models,

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

URLhttps://arxiv.org/abs/1810.01367. H. Grubmüller. Predicting slow structural transitions in macromolecular systems: Conformational flooding.Physical Review E, 52(3):2893,

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

URL https://arxiv.org/abs/2506.05310. M. U. Gutmann and A. Hyvärinen. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics.Journal of Machine Learning Research, 13(11):307–361,

work page arXiv

[10] [10]

URLhttps://arxiv.org/abs/2506.05668. J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

URLhttps://arxiv.org/abs/2006.11239. M. Hutchinson. A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines.Communications in Statistics - Simulation and Computation, 19(2):433–450,

work page internal anchor Pith review Pith/arXiv arXiv 2006

[12] [12]

URL https: //doi.org/10.1080/03610919008812866

doi: 10.1080/03610919008812866. URL https: //doi.org/10.1080/03610919008812866. A. Hyvärinen. Estimation of non-normalized statistical models by score matching.Journal of Machine Learning Research, 6(24):695–709,

work page doi:10.1080/03610919008812866

[13] [13]

URLhttps://proceedings.neurips.cc/paper_files/paper/2023/file/ bc827452450356f9f558f4e4568d553b-Paper-Conference.pdf. J. Köhler, L. Klein, and F. Noe. Equivariant flows: Exact likelihood generative learning for symmetric densities. In H. D. III and A. Singh, editors,Proceedings of the 37th International Conference on Machine Learning, volume 119 of Procee...

2023

[14] [14]

R. A. Meyer, C. Musco, C. Musco, and D. P. Woodruff.Hutch++: Optimal Stochastic Trace Estimation, pages 142–155. doi: 10.1137/1.9781611976496.16. J. Nam, B. Máté, A. P. Toshev, M. Kaniselvan, R. Gómez-Bombarelli, R. T. Chen, B. Wood, G.-H. Liu, and B. K. Miller. Enhancing diffusion-based sampling with molecular collective variables.arXiv preprint arXiv:25...

work page doi:10.1137/1.9781611976496.16

[15] [15]

tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147

doi: 10.1126/science.aaw1147. tex.eprint: https://www.science.org/doi/pdf/10.1126/science.aaw1147. Z. Ou, M. Zhang, A. Zhang, T. Z. Xiao, Y. Li, and D. Barber. Improving probabilistic diffusion models with optimal diagonal covariance matching,

work page doi:10.1126/science.aaw1147

[16] [16]

URLhttps://arxiv.org/abs/2406.10808. R. OuYang, L. Grenioux, and J. M. Hernández-Lobato. A diffusive classification loss for learning energy-based generative models, 2026a. URLhttps://arxiv.org/abs/2601.21025. R. OuYang, B. Qiang, and J. M. Hernández-Lobato. Bnem: A boltzmann sampler based on bootstrapped noised energy matching, 2026b. URLhttps://arxiv.or...

work page arXiv

[17] [17]

URLhttps://arxiv.org/abs/2506.17139. D. Rehman, T. Akhound-Sadegh, A. Gazizov, Y. Bengio, and A. Tong. FALCON: Few-step accurate likelihoods for continuous flows. InThe fourteenth international conference on learning representations,

work page arXiv

[18] [18]

URLhttps://arxiv.org/abs/2506.05231. M. Schebek, F. Noé, and J. Rogal. Scalable boltzmann generators for equilibrium sampling of large-scale materials,

work page arXiv

[19] [19]

URLhttps://arxiv.org/abs/2509.25486. Y. Song and P. Dhariwal. Improved techniques for training consistency models. InThe Twelfth International Conference on Learning Representations,

work page arXiv

[20] [20]

URLhttps://arxiv.org/abs/2011.13456. Y. Song, P. Dhariwal, M. Chen, and I. Sutskever. Consistency models. InInternational Conference on Machine Learning, pages 32211–32252. PMLR,

work page internal anchor Pith review Pith/arXiv arXiv 2011

[21] [21]

C. B. Tan, A. J. Bose, C. Lin, L. Klein, M. M. Bronstein, and A. Tong. Scalable equilibrium sampling with sequential boltzmann generators, 2026a. URLhttps://arxiv.org/abs/2502.18462. C. B. Tan, M. Hassan, L. Klein, S. Syed, D. Beaini, M. M. Bronstein, A. Tong, and K. Neklyudov. Amortized sampling with transferable normalizing flows, 2026b. URLhttps://arxi...

work page arXiv

[22] [22]

doi: 10.1162/NECO_a_00142. Y. Xie, L. Winkler, L. Sun, S. Lewis, A. E. Foster, J. J. Luna, T. Hempel, M. Gastegger, Y. Chen, I. Zaporozhets, et al. Enhanced diffusion sampling: Efficient rare event sampling and free energy calculation with diffusion models.arXiv preprint arXiv:2602.16634,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1162/neco_a_00142

[23] [23]

URLhttps://arxiv.org/abs/2605.26850. S. Zhai, R. Zhang, P. Nakkiran, D. Berthelot, J. Gu, H. Zheng, T. Chen, M. A. Bautista, N. Jaitly, and J. Susskind. Normalizing flows are capable generative models,

work page internal anchor Pith review Pith/arXiv arXiv

[24] [24]

Normalizing flows are capable generative models,

URLhttps://arxiv.org/abs/2412.06329. 13 Appendix A Conditional Estimators for Flow-based Model’s Likelihood: Proofs and Connections 14 A.1 Desiderata for a Consistent Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 A.2 Derivation through Total Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 ...

work page arXiv 2025

[25] [25]

Under Assumption 1, we denotept as the marginals for clarity

For clarity, we present the estimators and connections based on the following assumption: Assumption 1We assumev t =v∗ t, which impliespv t =p v∗ t and∇logpv t =∇logpv∗ t . Under Assumption 1, we denotept as the marginals for clarity. A.1 Desiderata for a Consistent Objective We would like to obtain an estimator fordt logpt(xt)using the marginalization tr...

2023

[26]

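The two loss functions $\mathcal{L}_f(\theta) = \mathbb{E}_{p(t,\xi,x_t)}\big[\lambda(t)\,\|f_t(x_t|\xi) - s_\theta(x_t,t)\|^2\big]$ (31) and $\mathcal{L}_g(\theta) = \mathbb{E}_{p(t,x_t)}\big[\lambda(t)\,\|g_t(x_t) - s_\theta(x_t,t)\|^2\big]$ (32) have the same gradient with respect to $\theta$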

Theorem 8. Consider two functions, $f_t(x_t|\xi)$ and $g_t(x_t)$, satisfying the property that $g_t(x_t) = \mathbb{E}_{p_t(\xi|x_t)}[f_t(x_t|\xi)]$. The two loss functions
$\mathcal{L}_f(\theta) = \mathbb{E}_{p(t,\xi,x_t)}\big[\lambda(t)\,\|f_t(x_t|\xi) - s_\theta(x_t,t)\|^2\big]$, (31)
$\mathcal{L}_g(\theta) = \mathbb{E}_{p(t,x_t)}\big[\lambda(t)\,\|g_t(x_t) - s_\theta(x_t,t)\|^2\big]$, (32)
have the same gradient with respect to $\theta$. The proof can be found in Yu et al. (2025). We provide a proof here for complete...

2025
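The gradient equivalence stated in Theorem 8 can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' code: it assumes a finite grid for $\xi$ and $x_t$, a fixed time with $\lambda(t) = 1$, and a tabular model `theta[x]` standing in for $s_\theta(x_t, t)$. The function name `theorem8_check` and all numeric values are hypothetical.

```python
import random


def theorem8_check(n_xi=4, n_x=3, seed=0):
    """Compare the gradients of L_f and L_g for a tabular model s_theta(x) = theta[x]."""
    rng = random.Random(seed)
    # Random joint distribution p(xi, x) on a finite grid (toy data, an assumption).
    weights = [[rng.random() + 0.1 for _ in range(n_x)] for _ in range(n_xi)]
    total = sum(sum(row) for row in weights)
    p = [[w / total for w in row] for row in weights]
    # Conditional target f(x | xi) and model parameters theta[x].
    f = [[rng.uniform(-1.0, 1.0) for _ in range(n_x)] for _ in range(n_xi)]
    theta = [rng.uniform(-1.0, 1.0) for _ in range(n_x)]
    # Marginal p(x) and marginal target g(x) = E[f(x | xi) | x].
    p_x = [sum(p[k][j] for k in range(n_xi)) for j in range(n_x)]
    g = [sum(p[k][j] * f[k][j] for k in range(n_xi)) / p_x[j] for j in range(n_x)]
    # Loss values: expectation of the squared error under the joint / the marginal.
    loss_f = sum(p[k][j] * (f[k][j] - theta[j]) ** 2
                 for k in range(n_xi) for j in range(n_x))
    loss_g = sum(p_x[j] * (g[j] - theta[j]) ** 2 for j in range(n_x))
    # Exact gradients of both losses with respect to theta[j].
    grad_f = [sum(-2.0 * p[k][j] * (f[k][j] - theta[j]) for k in range(n_xi))
              for j in range(n_x)]
    grad_g = [-2.0 * p_x[j] * (g[j] - theta[j]) for j in range(n_x)]
    return loss_f, loss_g, grad_f, grad_g


loss_f, loss_g, grad_f, grad_g = theorem8_check()
print(max(abs(a - b) for a, b in zip(grad_f, grad_g)))
```

In this toy setting the two losses differ by the expected conditional variance of $f$, which does not depend on `theta`, so only the gradients agree.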

[27]

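$:= \dot\alpha_t x_1 + \dot\beta_t x_0 + \dot\gamma_t z$, where $z = \frac{x_t - (\alpha_t x_1 + \beta_t x_0)}{\gamma_t}$, (47) and $v_t(x_t) = \mathbb{E}[\tilde v_t(x_t|x_0,x_1)\,|\,x_t]$. To estimate $d_t \log p_t(x_t)$, one could expand it through the total derivative
$d_t \log p_t(x_t) = \partial_t \log p_t(x_t) + \nabla \log p_t(x_t)\cdot d_t x_t$ (48)
$= \partial_t \log p_t(x_t) + \nabla \log p_t(x_t)\cdot v_t(x_t)$, (49)
where both $\nabla \log p_t(x_t)$ and $\partial_t \log p_t(x_t)$ can be expressed as conditional expectations (Song et al., 2021; Yu e...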

2021

[28]

$+ \frac{\dot\gamma_t}{\gamma_t} d \,\Big|\, x_t\Big]$, (55) which can be learned efficiently without any high-order derivatives once $v_t$ is available.
A.3 Derivation through the Continuity Equation
Alternatively, Equation (15) can be derived from the (log) continuity equation
$\partial_t p_t(x_t) + \nabla\cdot(v_t(x_t)\,p_t(x_t)) = 0 \;\Leftrightarrow\; \partial_t \log p_t(x_t) = -\nabla \log p_t(x_t)\cdot v_t(x_t) - \nabla\cdot v_t(x_t)$. (56)
By the instantaneous change-of-variabl...

2019
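The log continuity equation (56) and the total-derivative expansion (49) can be sanity-checked on a one-dimensional example with a known solution. The sketch below is an illustration under stated assumptions rather than anything from the paper: it takes the linear velocity field $v_t(x) = A x$ with $p_0 = \mathcal{N}(0, 1)$, so that $p_t = \mathcal{N}(0, e^{2At})$. The constant `A` and all helper names are hypothetical, and derivatives are taken by central finite differences.

```python
import math

A = 0.7  # hypothetical rate of the toy velocity field v_t(x) = A * x


def sigma(t):
    # Standard deviation of p_t when p_0 = N(0, 1) is transported by dx/dt = A * x.
    return math.exp(A * t)


def log_p(t, x):
    s = sigma(t)
    return -0.5 * (x / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)


def velocity(t, x):
    return A * x


def divergence(t, x):
    return A  # d/dx of A * x in one dimension


def central_diff(fn, u, h=1e-5):
    return (fn(u + h) - fn(u - h)) / (2.0 * h)


def check(t, x):
    dt_logp = central_diff(lambda s: log_p(s, x), t)    # partial_t log p_t(x)
    score = central_diff(lambda y: log_p(t, y), x)      # grad log p_t(x)
    rhs = -score * velocity(t, x) - divergence(t, x)    # right-hand side of Eq. (56)
    total = dt_logp + score * velocity(t, x)            # total derivative, Eq. (49)
    return dt_logp, rhs, total


dt_logp, rhs, total = check(t=0.4, x=1.3)
print(dt_logp, rhs, total)
```

Here `dt_logp` and `rhs` should agree up to finite-difference error, and `total` should match the negative divergence `-A`, which is the instantaneous change-of-variables result for this field.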

[29]

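$+ \frac{\dot\gamma_t}{\gamma_t} d \,\Big|\, x_t\Big]$ (73)
$= -\mathbb{E}\Big[-\tilde v_t(x_t|x_0,x_1)\cdot\Big(\frac{z}{\gamma_t} + \nabla \log p_t(x_t)\Big) + \frac{\dot\gamma_t}{\gamma_t} d \,\Big|\, x_t\Big] - v_t(x_t)\cdot\Big(\underbrace{\frac{1}{\gamma_t}\mathbb{E}[z|x_t] + \nabla \log p_t(x_t)}_{\overset{(i)}{=}\, -\nabla \log p_t(x_t) + \nabla \log p_t(x_t) \,=\, 0}\Big)$, (74)
where (i) follows from Tweedie’s formula.
A.6 Vectorized Estimators
We first conclude that the vectorized estimators obtained by replacing the inner product $\cdot$ in Equations (70) and (71) with a...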

2021
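Step (i) above uses Tweedie's formula, $\frac{1}{\gamma_t}\mathbb{E}[z|x_t] = -\nabla \log p_t(x_t)$. The following sketch checks this identity numerically in one dimension. It is an illustration under assumptions not taken from the paper: the interpolant is simplified to $x_t = \alpha x_1 + \gamma z$ (that is, $\beta_t = 0$), the data density is a hypothetical two-component Gaussian mixture, and integrals over $x_1$ use a plain trapezoid rule.

```python
import math

ALPHA, GAMMA = 0.5, 0.6  # hypothetical interpolant coefficients at a fixed time


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def data_pdf(x1):
    # Toy data density p(x1): a two-component Gaussian mixture (an assumption).
    return 0.5 * normal_pdf(x1, -1.0, 0.5) + 0.5 * normal_pdf(x1, 1.5, 0.3)


GRID = [-6.0 + 12.0 * i / 4000 for i in range(4001)]
STEP = GRID[1] - GRID[0]


def integrate(fn):
    # Trapezoid rule over the fixed grid of x1 values.
    vals = [fn(x1) for x1 in GRID]
    return STEP * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def marginal(x):
    # p_t(x) = integral of p(x1) * N(x; ALPHA * x1, GAMMA^2) over x1.
    return integrate(lambda x1: data_pdf(x1) * normal_pdf(x, ALPHA * x1, GAMMA))


def posterior_mean_noise(x):
    # E[z | x_t = x], with z = (x - ALPHA * x1) / GAMMA.
    num = integrate(lambda x1: (x - ALPHA * x1) / GAMMA
                    * data_pdf(x1) * normal_pdf(x, ALPHA * x1, GAMMA))
    return num / marginal(x)


def score(x, h=1e-4):
    # grad log p_t(x) by central finite differences.
    return (math.log(marginal(x + h)) - math.log(marginal(x - h))) / (2.0 * h)


def tweedie_residual(x):
    # Should vanish: (1 / GAMMA) * E[z | x] + grad log p_t(x).
    return posterior_mean_noise(x) / GAMMA + score(x)


print([tweedie_residual(x) for x in (-1.0, 0.0, 0.8)])
```

The residual is zero only up to quadrature and finite-difference error, so the printed values should be small but not exactly zero.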

[30]

For instance, Köhler et al

that can be parameterized using more flexible model architectures. For instance, Köhler et al. (2020); Klein et al. (2023) develop equivariant CNFs tailored to molecular systems. However, this approach does not scale well: computing the log-likelihood requires integrating the divergence of the velocity field (as in Eq. 3), which becomes prohibitively expe...

2020

[31]

FALCON enables flow maps to provide fast yet accurate likelihood estimation and shows its effectiveness and scalability on molecular systems

proposes to make the flow map invertible by additionally imposing reconstruction losses on the jump between two times $t < s$. FALCON enables flow maps to provide fast yet accurate likelihood estimation and shows its effectiveness and scalability on molecular systems. However, FALCON requires the exact log-determinant of the Jacobian of the flow map for likelihood e...

2026

[32]

We employ a linear schedule for all generation in SCALLOP and F2D2

for ALA-(2, 3, 4, 6), respectively. We employ a linear schedule for all generation in SCALLOP and F2D2.
Algorithms. We provide the pseudocode for inference with SCALLOP and F2D2 in Algorithm 1 below:
Algorithm 1 Inference for SCALLOP and F2D2
1: Input: base distribution $p_0$, trained likelihood flow map $f_\theta = [u_\theta; D_\theta]$, time schedule $\{t_i\}_{i=0}^{N}$
2: Output: Samples $x_N \sim p_1^\theta$ wit...

2026
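The inputs and outputs listed for Algorithm 1 suggest a loop that advances a sample and its log-density together along the time schedule. Because the algorithm body is truncated here, the sketch below is an assumed update rule, not the paper's procedure: `u_theta` is taken to return the average velocity over $[t, s]$ and `d_theta` the average divergence over $[t, s]$, so that each step applies $x \leftarrow x + (s - t)\,u_\theta$ and $\log p \leftarrow \log p - (s - t)\,D_\theta$. Both functions are hypothetical closed-form stand-ins for a trained network, built from the toy field $v(x) = A x$.

```python
import math

A = 0.4   # hypothetical rate of the toy velocity field v(x) = A * x
DIM = 2   # dimension of the toy system


def u_theta(x, t, s):
    """Stand-in for the learned average velocity over [t, s] (exact for v(x) = A * x)."""
    scale = (math.exp(A * (s - t)) - 1.0) / (s - t)
    return [scale * xi for xi in x]


def d_theta(x, t, s):
    """Stand-in for the learned average divergence over [t, s]; constant for a linear field."""
    return A * DIM


def log_standard_normal(x):
    return -0.5 * sum(xi * xi for xi in x) - 0.5 * len(x) * math.log(2.0 * math.pi)


def few_step_inference(x0, schedule):
    """Push a base sample through the flow map and track its log-density (assumed rule)."""
    x, logp = list(x0), log_standard_normal(x0)
    for t, s in zip(schedule[:-1], schedule[1:]):
        step = s - t
        u, d = u_theta(x, t, s), d_theta(x, t, s)
        x = [xi + step * ui for xi, ui in zip(x, u)]  # jump from time t to time s
        logp -= step * d                              # change of variables along the jump
    return x, logp


n_steps = 4
schedule = [i / n_steps for i in range(n_steps + 1)]  # linear time schedule on [0, 1]
x1, logp1 = few_step_inference([0.3, -1.2], schedule)
print(x1, logp1)
```

For this toy field the result can be compared against the exact pushforward $\mathcal{N}(0, e^{2A} I)$, which gives a quick correctness check on the sign convention of the log-density update.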

[33]

systems, following Rehman et al. (2026). Ground Truth Torsional Angles Histograms. We provide the Ramachandran plots, i.e., the histograms of torsional angles, of the training data, from ALA-2 to ALA-6, in Figure

2026

[34]

Resampled Energy Histogram. We provide the energy histograms of the resampled data in Figure 6a (for SCALLOP) and Figure 6b (for F2D2)

Resampled Torsional Angles Histograms. We provide the Ramachandran plots for the resampled data in Figure 4 (for SCALLOP) and Figure 5 (for F2D2).
Resampled Energy Histogram. We provide the energy histograms of the resampled data in Figure 6a (for SCALLOP) and Figure 6b (for F2D2).
Figure 3 Ramachandran plots for ALA-N systems, from ALA-2 (TOP) to ALA-6 (...

2015

[35]

LSD Model Training. For the vanilla LSD baseline, we follow the experimental setup of Boffi et al. (2026). We adopt the batch-allocation strategy used in Ai et al. (2026): during the first 200k iterations, 75% of each batch is used for the flow-matching loss and the remaining 25% is used for the self-distillation loss; during the subsequent 150k iterations...

2026