arxiv: 2606.29110 · v1 · pith:264PDCGVnew · submitted 2026-06-27 · 💻 cs.LG · stat.ML

Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps

Pith reviewed 2026-06-30 09:14 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords flow-based generative models · likelihood estimation · Boltzmann generators · likelihood distillation · few-step inference · variance reduction · molecular simulation

The pith

SCALLOP replaces Hutchinson's estimator with a vectorized distillation objective to train accurate few-step likelihood flow maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow-based models generate high-quality samples in few steps but have struggled to estimate likelihoods without high-variance stochastic methods or restrictive architectures. SCALLOP extends the F2D2 likelihood flow map approach by introducing a Hutchinson-free distillation objective that supports vectorized computation. This change lowers training variance and time while raising performance on Boltzmann generators for molecules and on image datasets. The resulting models remain competitive with existing methods yet deliver up to tenfold faster inference than the quickest prior baselines. Readers interested in practical density estimation for simulation tasks would see value in a method that removes a major source of instability without sacrificing speed.

Core claim

SCALLOP builds on F2D2 by replacing Hutchinson's trace estimator with an alternative likelihood distillation objective that is Hutchinson-free and admits a vectorized formulation, thereby enabling the training of flow map models that output both samples and densities in a small number of function evaluations while reducing variance and training time.

What carries the argument

The Hutchinson-free likelihood distillation objective, which replaces stochastic trace estimation with a scalable vectorized alternative inside the F2D2 likelihood flow map framework.
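
For context, the stochastic trace estimation being replaced can be sketched as follows. This is a minimal NumPy illustration of Hutchinson's estimator on a random matrix, not the paper's training code: the function name `hutchinson_trace`, the matrix `jacobian` (a hypothetical stand-in for the Jacobian of a velocity field), and the probe count are all assumptions made for the example.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate tr(A) by averaging eps^T (A eps) over Rademacher probes eps.

    Returns the averaged estimate and the per-probe sample standard deviation.
    """
    samples = np.empty(num_probes)
    for i in range(num_probes):
        eps = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        samples[i] = eps @ matvec(eps)
    return samples.mean(), samples.std(ddof=1)


rng = np.random.default_rng(0)
dim = 16
# Hypothetical stand-in for the Jacobian of a velocity field at one point.
jacobian = rng.normal(size=(dim, dim))

exact_trace = np.trace(jacobian)
estimate, per_probe_std = hutchinson_trace(lambda v: jacobian @ v, dim, 4000, rng)

print(f"exact trace:     {exact_trace:.3f}")
print(f"Hutchinson mean: {estimate:.3f}")
print(f"per-probe std:   {per_probe_std:.3f}")
```

The average converges to the exact trace, but each individual probe is noisy whenever the matrix has off-diagonal mass. That per-probe noise is the variance a Hutchinson-free objective would avoid.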

If this is right

  • Training variance drops substantially relative to F2D2 while training completes in less wall-clock time.
  • Performance on molecular Boltzmann generation and image modeling improves consistently over the F2D2 baseline.
  • Inference runs up to ten times faster than the fastest baseline, while accuracy remains competitive with the state of the art.
  • Both sample generation and density estimation become feasible inside the same few-step flow map architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The vectorized objective may allow larger batch sizes or deeper networks on hardware that supports efficient matrix operations, extending the method to higher-dimensional systems.
  • If the same distillation approach transfers to other flow architectures, it could relax the need for exact-trace-friendly designs across generative modeling.
  • Lower training variance might reduce the number of random seeds required for reliable hyperparameter search in related few-step models.

Load-bearing premise

The alternative Hutchinson-free likelihood distillation objective accurately captures the true model likelihood and does not introduce new biases or failure modes that would offset the reported gains in stability and speed.

What would settle it

A side-by-side numerical comparison, on a low-dimensional toy distribution where exact likelihoods can be computed analytically, between the likelihood values reported by a trained SCALLOP model and the ground-truth densities would show systematic over- or under-estimation if the objective fails to match true likelihoods.
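
The kind of check described above can be sketched on a one-dimensional Gaussian, where the ground-truth density is available in closed form. This is a hedged illustration only: the affine flow stands in for a trained model, and `flow_logpdf`, `gaussian_logpdf`, `mean_signed_error`, and the constant offset in `biased` are hypothetical names and values chosen for the example, not anything taken from SCALLOP.

```python
import numpy as np


def standard_normal_logpdf(z):
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))


def flow_logpdf(x, mu, sigma):
    """Log-density of x = mu + sigma * z, z ~ N(0, 1), by change of variables."""
    z = (x - mu) / sigma                              # invert the affine map
    return standard_normal_logpdf(z) - np.log(sigma)  # subtract log|dx/dz|


def gaussian_logpdf(x, mu, sigma):
    """Analytic log-density of N(mu, sigma^2), used as ground truth."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def mean_signed_error(model_logp, true_logp):
    """Positive values mean the model over-estimates the log-density on average."""
    return float(np.mean(model_logp - true_logp))


rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
x = mu + sigma * rng.normal(size=10_000)

truth = gaussian_logpdf(x, mu, sigma)
faithful = flow_logpdf(x, mu, sigma)
biased = faithful + 0.1  # hypothetical likelihood head with a constant offset

print("faithful model bias:", mean_signed_error(faithful, truth))
print("biased model bias:  ", mean_signed_error(biased, truth))
```

A faithful likelihood gives a mean signed error at the level of floating-point noise, while a systematic offset shows up directly in that statistic. In practice the `faithful` and `biased` arrays would be replaced by log-densities reported by the trained model.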

Original abstract

Recent progress in flow-based generative modeling has led to models that output high-quality samples while using only a small number of function evaluations. However, at present, there is a lack of similar advances in estimating the model likelihood. In particular, most existing methods either rely on restrictive architectures that enable exact calculations, or use stochastic approximations such as Hutchinson's trace estimator that introduce substantial variance. In this work, we introduce SCAlable LikeLihood distillation of flOw maPs (SCALLOP). SCALLOP builds on the recently proposed F2D2, a likelihood flow map model that can generate samples and their densities in a small number of function evaluations. While F2D2 uses Hutchinson's estimator during training, we introduce an alternative and more scalable likelihood distillation objective that is Hutchinson-free and admits a vectorized formulation. Empirically, we demonstrate the effectiveness of SCALLOP as a Boltzmann generator in molecular science, and further validate its benefit on image datasets. SCALLOP significantly reduces both training variance and training time while consistently improving performance compared to F2D2, and is competitive with the state-of-the-art while achieving up to 10x inference speedup over the fastest baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SCALLOP, an extension of the F2D2 likelihood flow map model for few-step Boltzmann generators. It replaces Hutchinson's trace estimator with a new Hutchinson-free, vectorized likelihood distillation objective during training, and reports that this yields lower training variance, shorter training time, improved performance over F2D2, competitiveness with state-of-the-art methods, and up to 10x inference speedup on molecular and image datasets.

Significance. If the new objective is a faithful, unbiased replacement for the true model likelihood, the work would offer a practical advance in scalable training of flow-based models that simultaneously support fast sampling and density estimation, with direct relevance to molecular simulation where both capabilities are required.

major comments (2)
  1. [§3 (methods) and abstract] The central performance claims (reduced variance, improved accuracy, 10x speedup) rest on the new likelihood distillation objective being an unbiased equivalent to the true model likelihood. The manuscript must supply an explicit derivation or proof (likely in §3 or the methods section) showing that the vectorized formulation does not introduce systematic bias from implicit assumptions on flow-map invertibility or the target density in high-dimensional Boltzmann settings; absent this, the reported gains cannot be taken as evidence that the objective is a faithful replacement.
  2. [Experiments section / results tables] The tables or figures reporting the main experimental results (molecular Boltzmann generation and image validation) must include error bars, ablation studies on the distillation objective, and full experimental protocol details; without them the claims of consistent improvement over F2D2 and competitiveness with SOTA cannot be evaluated for statistical reliability.
minor comments (1)
  1. Notation for the flow map and the distillation loss should be introduced with explicit definitions before first use to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and will incorporate revisions to improve the manuscript's rigor and clarity.

Point-by-point responses
  1. Referee: [§3 (methods) and abstract] The central performance claims (reduced variance, improved accuracy, 10x speedup) rest on the new likelihood distillation objective being an unbiased equivalent to the true model likelihood. The manuscript must supply an explicit derivation or proof (likely in §3 or the methods section) showing that the vectorized formulation does not introduce systematic bias from implicit assumptions on flow-map invertibility or the target density in high-dimensional Boltzmann settings; absent this, the reported gains cannot be taken as evidence that the objective is a faithful replacement.

    Authors: We agree that an explicit derivation strengthens the claims. In the revised manuscript we will add a self-contained derivation in §3 establishing that the vectorized likelihood distillation objective is an unbiased estimator of the model log-likelihood. The derivation relies only on the change-of-variables formula for the flow map and the definition of the distillation target; it does not invoke additional invertibility assumptions beyond those already stated for F2D2 nor any special properties of the target density. revision: yes

  2. Referee: [Experiments section / results tables] The tables or figures reporting the main experimental results (molecular Boltzmann generation and image validation) must include error bars, ablation studies on the distillation objective, and full experimental protocol details; without them the claims of consistent improvement over F2D2 and competitiveness with SOTA cannot be evaluated for statistical reliability.

    Authors: We acknowledge that the current presentation lacks these elements. The revised manuscript will include error bars (computed over multiple random seeds) on all quantitative results, dedicated ablation studies isolating the effect of the distillation objective, and a complete experimental protocol (hyperparameters, data splits, hardware, and evaluation metrics) placed in the main text or a clearly referenced appendix. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to F2D2 with no load-bearing reduction in central claims

full rationale

The paper introduces SCALLOP as a new Hutchinson-free likelihood distillation objective that builds on F2D2 but is presented as an alternative to its training objective. No equations or steps in the abstract reduce the new objective or the performance claims to quantities fitted inside the same model by construction. The central claims rest on empirical validation rather than on self-referential definitions or predictions. A single self-citation to prior F2D2 work is present, but it does not make the reported gains equivalent to the method's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the central claim rests on the assumption that the new distillation objective is both correct and practically superior, but no free parameters, domain axioms, or invented entities are enumerated in the provided text.

axioms (1)
  • domain assumption: Flow-based models admit invertible mappings whose Jacobian determinant can be used for exact likelihoods under suitable architectural constraints.
    Implicit background assumption of all flow-based generative modeling referenced in the abstract.
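
For reference, the standard identities behind this assumption can be written out as below. The notation is an assumption made here for illustration and is not taken from the paper: f is an invertible map, p_0 is the base density, v_t is the velocity field, and ε is a random probe vector.

```latex
% Discrete invertible map x_1 = f(x_0) with base density p_0:
\log p_1(x_1) = \log p_0(x_0) - \log\left|\det \frac{\partial f}{\partial x_0}(x_0)\right|

% Continuous-time flow \dot{x}_t = v_t(x_t) (instantaneous change of variables):
\frac{\mathrm{d}}{\mathrm{d}t} \log p_t(x_t) = -\nabla \cdot v_t(x_t)

% Hutchinson's estimator of the divergence, valid when E[\epsilon \epsilon^\top] = I:
\nabla \cdot v_t(x_t) = \mathbb{E}_{\epsilon}\left[\epsilon^\top \frac{\partial v_t}{\partial x_t}(x_t)\, \epsilon\right]
```

The first identity needs the exact log-determinant, which is why some methods restrict the architecture. The third replaces the exact divergence in the second with a stochastic average, which is the source of the variance discussed in the abstract.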

pith-pipeline@v0.9.1-grok · 5781 in / 1195 out tokens · 50974 ms · 2026-06-30T09:14:57.273120+00:00 · methodology

