pith. sign in

arxiv: 2606.08375 · v2 · pith:SGK4QWJ6new · submitted 2026-06-07 · 💻 cs.LG

Few-step Cofolding with All-Atom Flow Maps

Pith reviewed 2026-06-27 18:37 UTC · model grok-4.3

classification 💻 cs.LG
keywords all-atom cofoldingflow mapsdiffusion distillationfew-step samplingprotein-ligand complexesSE(3) alignmentEDM noise schedulereward-guided search
0
0 comments X

The pith

DeCAF distills all-atom cofolding diffusion models into flow maps that match teacher accuracy with 5x fewer inference steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeCAF to convert expensive iterative diffusion rollouts for all-atom protein and protein-ligand structures into fast flow-map samplers. It uses a denoiser-based flow formulation with endpoint losses that support SE(3) rigid alignment and a change of variables to match EDM noise schedules, allowing direct distillation from pretrained teachers. Experiments show DeCAF-Pearl matches its Pearl teacher on success rate at one-fifth the NFEs, while DeCAF-Boltz improves RMSD and physical validity over Boltz-1x at strict low-NFE budgets on Runs N' Poses and PoseBusters. The work also adds a reward-guided search layer that further boosts sampling quality under the flow-map lookahead. The central goal is to make high-fidelity all-atom cofolding practical for both deployment and inference-time search without sacrificing sample quality.

Core claim

DeCAF-Boltz statistically improves over Boltz-1x in RMSD and physical validity at strict NFE budgets, and DeCAF-Pearl matches its teacher on success rate while using 5x fewer NFEs.

What carries the argument

Denoiser-based flow map with endpoint losses that support SE(3) rigid alignment, plus a change of variables to operate in sigma-space noise schedules of EDM-style architectures.

If this is right

  • All-atom cofolding models can be deployed at inference budgets previously limited to coarse-grained or single-structure predictors.
  • Reward-guided search becomes feasible inside the flow-map lookahead without the cost of full diffusion trajectories.
  • Direct distillation from any EDM-style pretrained cofolding checkpoint is possible without retraining the teacher.
  • Pareto frontiers for accuracy versus compute shift leftward across the full range of NFE budgets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same denoiser-flow recipe could be applied to other diffusion-based structure generators such as small-molecule conformer models or RNA folding.
  • If SE(3) alignment proves critical, future flow-map work on rigid bodies may adopt the same endpoint-loss pattern by default.
  • Low-NFE sampling opens the door to coupling cofolding with online experimental feedback loops that were previously too slow.

Load-bearing premise

Endpoint losses plus SE(3) rigid alignment during distillation are enough to keep the distilled flow map as accurate as the original diffusion teacher.

What would settle it

A head-to-head comparison at 1-4 NFEs on PoseBusters or Runs N' Poses where DeCAF-Pearl success rate falls below the Pearl teacher or DeCAF-Boltz RMSD exceeds Boltz-1x.

Figures

Figures reproduced from arXiv: 2606.08375 by Avishek Joey Bose, Gianluca Scarpellini, Juno Nam, Maruan Al-Shedivat, Nicholas Matthew Boffi, Peter Holderrieth, Pranav Murugan, Rafael G\'omez-Bombarelli, Ron Shprints, Tommi Jaakkola.

Figure 1
Figure 1. Figure 1: DECAF accelerates all-atom biomolecular structure prediction with a few-step flow-map lookahead across EDM noise levels. Atom-resolution guidance enables candidate search toward high-reward configurations. works [Boitreaud et al., 2024, Wohlwend et al., 2025, Passaro et al., 2025, Team et al., 2026, Gen￾esis Research, 2025]. This perspective is well-suited to learning the global geometry of the target dis￾… view at source ↗
Figure 2
Figure 2. Figure 2: Success rate vs. training-set simi￾larity on the RnP benchmark at 40 NFE [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Mean RMSD per structure for DECAF-SEARCH (150 NFEs) (x) vs. Boltz-1x (800 NFEs) (y) on RnP. (Q1.) Low NFE regime. Does DECAF outperform Boltz-1 with a limited inference budget (§4.1)? (Q2.) Analysis of compute-optimal frontier. What is the Pareto frontier of DECAF-SEARCH against Boltz-1x for inference time scaling across any inference compute budget (§4.2)? (Q3.) Performance analysis. What are the relative… view at source ↗
Figure 4
Figure 4. Figure 4: DECAF-Boltz is cost-effective as we increase compute. A comparison of DECAF-SEARCH and Boltz-1x as a function of NFE budget. The solid lines are the per-NFE compute optimal frontier for each method. Boltz-1x across every NFE budget on all metrics. We further find that at different inference budgets, the exact Pareto-optimal recipe for DECAF-SEARCH varies. Specifically, at low NFEs (≤ 30), particle-based SM… view at source ↗
Figure 5
Figure 5. Figure 5: Each panel overlays the ground-truth crystal ligand against DECAF-SEARCH and Boltz-1x (first row) and Pearl (second row) samples at the specified NFE. Row 1 compares DECAF-SEARCH-Boltz against Boltz-1x and Row 2 compares DECAF-SEARCH-Pearl against Pearl. The protein pocket is shown as a light-gray cartoon, and predicted ligand atoms that clash with the protein in red. At the bottom of each panel, we report… view at source ↗
Figure 6
Figure 6. Figure 6: RMSD< 2 Å best@5 on PoseBusters. DECAF (FlowMap sampler Alg. 1, γ=0) vs Boltz-1 (ODE) at matched sampling steps; dashed line marks Boltz-1 at 200 steps [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Example failure modes due to PoseBusters checks DECAF predictions blue, Boltz-1 gray. Pocket cartoon (gray) is the residues within 6 Å of the crystal ligand (green). 7FZS | drug-like 0.45 ˚A · PB-valid DeCAF FK (NFE = 20) 0.34 ˚A · PB-valid DeCAF FK (NFE = 40) 0.38 ˚A · PB-valid DeCAF MCTS (NFE = 800) 0.56 ˚A · PB-valid Boltz-1x (NFE = 40) 0.47 ˚A · PB-valid Boltz-1x (NFE = 800) 8CNO | ligand-at-PPI 1.69 ˚… view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative grid for chemically-relevant subsets of Runs N’ Poses. DECAF predictions blue, Boltz-1 gray. Pocket cartoon (gray) is the residues within 6 Å of the crystal ligand (green). 20 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Multi-method qualitative grid for PoseBusters complexes. DECAF predictions blue, Boltz-1 gray. Pocket cartoon (gray) is the residues within 6 Å of the crystal ligand (green). 21 [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
read the original abstract

All-atom generative modeling of 3D biomolecular complexes has emerged as the dominant paradigm for predicting the structure of proteins and protein-ligand systems. Generating structures at the atomic level of fidelity, however, typically requires expensive iterative diffusion rollouts, making both conventional deployment and inference-time search techniques computationally costly. In this paper, we introduce the Denoiser Cofolding All-Atom Flowmap (DeCAF) framework for distilling state-of-the-art all-atom cofolding models into all-atom flow maps that produce high-quality samples in only a few inference steps. We build DeCAF on a denoiser-based formulation of flow maps with endpoint losses that naturally support SE(3) rigid alignment, which we show is critical for training accurate models. We further derive a simple change of variables that lets DeCAF operate in the {\sigma}-space noise schedule of EDM-style architectures, enabling direct distillation from pretrained cofolding diffusion models. Equipped with DeCAF's flowmap lookahead, we introduce a purpose-built inference-time framework that improves sampling through reward-guided search. Empirically, DeCAF-Boltz statistically improves over Boltz-1x in both accuracy (RMSD) and physical validity scores of protein-ligand poses at strict NFE budgets on the challenging Runs N' Poses, while also showing a more optimal Pareto frontier across all inference compute budgets on PoseBusters. Distilling the state-of-the-art Pearl cofolding model, DeCAF-Pearl outperforms diffusion-based cofolding models and matches its teacher on success rate while using 5x fewer NFEs. We release our code at https://github.com/genesistherapeutics/decaf.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces the DeCAF framework for distilling pretrained all-atom cofolding diffusion models (e.g., Pearl, Boltz-1x) into few-step flow-map generators. It uses a denoiser-based flow-map formulation with endpoint losses that support SE(3) rigid alignment, derives a change-of-variables to EDM-style σ-space for direct distillation, and adds a reward-guided inference-time search procedure. Central empirical claims are that DeCAF-Pearl matches its teacher's success rate at 5× fewer NFEs while DeCAF-Boltz improves RMSD and physical validity over Boltz-1x at strict NFE budgets on Runs N' Poses and shows a better Pareto frontier on PoseBusters; code is released at https://github.com/genesistherapeutics/decaf.

Significance. If the distillation preserves the teacher's distribution without material degradation, the work would meaningfully reduce the inference cost of all-atom generative cofolding, enabling faster deployment and search. The explicit release of code is a positive contribution to reproducibility. The technical adaptation of flow maps to SE(3)-equivariant all-atom settings is a natural extension of existing flow-matching literature, but the significance hinges on whether the reported gains are robust.

major comments (3)
  1. [Abstract, §4] Abstract and §4 (Experiments): the headline claims of statistical improvement in RMSD/validity and exact matching of teacher success rate are presented without reported dataset splits, number of independent runs, p-values, or ablation controls on the distillation procedure; this information is load-bearing for assessing whether the gains are reliable or subject to post-hoc selection.
  2. [§3.1–3.2] §3.1–3.2 (Denoiser-based flow map and endpoint losses): the claim that endpoint losses plus SE(3) rigid alignment suffice to transfer the teacher's marginals is central to the matching-performance result, yet the change-of-variables is only a reparameterization and endpoint losses do not explicitly constrain intermediate trajectory marginals; no verification (e.g., KL divergence or marginal matching plots) is shown that the learned vector field reproduces the diffusion teacher's distribution at non-endpoint times.
  3. [§4.3] §4.3 (Reward-guided search): the inference-time framework is presented as improving sampling, but the manuscript does not report an ablation isolating the contribution of the flow-map lookahead versus the reward model itself, which is required to substantiate that the few-step generator enables the reported search gains.
minor comments (3)
  1. [§3.1] Notation for the σ-space change of variables could be clarified with an explicit equation relating the flow-map velocity to the EDM denoiser output.
  2. [Figure 3] Figure captions for the Pareto curves should state the exact NFE values used and whether error bars reflect standard error over multiple seeds.
  3. [§2] The manuscript cites prior flow-matching and cofolding works but omits discussion of recent few-step diffusion distillation methods outside biomolecular modeling; a brief comparison paragraph would help situate the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our results. We address each major point below and indicate revisions to be incorporated in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract, §4] Abstract and §4 (Experiments): the headline claims of statistical improvement in RMSD/validity and exact matching of teacher success rate are presented without reported dataset splits, number of independent runs, p-values, or ablation controls on the distillation procedure; this information is load-bearing for assessing whether the gains are reliable or subject to post-hoc selection.

    Authors: We agree that these details are important for assessing reliability. In the revised manuscript we will explicitly state the dataset splits, report means and standard deviations over multiple independent runs, include p-values for the reported comparisons, and add ablation controls on key distillation hyperparameters. revision: yes

  2. Referee: [§3.1–3.2] §3.1–3.2 (Denoiser-based flow map and endpoint losses): the claim that endpoint losses plus SE(3) rigid alignment suffice to transfer the teacher's marginals is central to the matching-performance result, yet the change-of-variables is only a reparameterization and endpoint losses do not explicitly constrain intermediate trajectory marginals; no verification (e.g., KL divergence or marginal matching plots) is shown that the learned vector field reproduces the diffusion teacher's distribution at non-endpoint times.

    Authors: Endpoint losses with SE(3) alignment are intended to match the teacher's terminal marginals, while the flow-matching objective enforces trajectory consistency by construction. The empirical success-rate parity with the teacher at 5× fewer steps provides indirect evidence of distribution preservation. We will add a clarifying discussion of this point and, where space allows, supplementary marginal-consistency diagnostics at intermediate times. revision: partial

  3. Referee: [§4.3] §4.3 (Reward-guided search): the inference-time framework is presented as improving sampling, but the manuscript does not report an ablation isolating the contribution of the flow-map lookahead versus the reward model itself, which is required to substantiate that the few-step generator enables the reported search gains.

    Authors: We agree that isolating the lookahead contribution would strengthen the claim. In the revised manuscript we will include an ablation that compares the full reward-guided search against (i) the reward model paired with standard few-step sampling and (ii) the flow map without the lookahead component. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical distillation validated against external teachers

full rationale

The paper frames DeCAF as a distillation procedure from pretrained external cofolding diffusion models (Pearl, Boltz) into flow maps, with performance measured by direct comparison on held-out benchmarks (Runs N' Poses, PoseBusters) at fixed NFE budgets. The change-of-variables to EDM σ-space and endpoint-loss + SE(3) alignment are presented as enabling reparameterization and training choices; success is shown empirically rather than derived by construction. No equations reduce reported RMSD/success-rate metrics to quantities defined inside the paper, and no load-bearing self-citations or uniqueness theorems are invoked. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No full text available; cannot enumerate free parameters, axioms, or invented entities. Abstract implies standard assumptions of diffusion model distillation but provides no explicit ledger.

pith-pipeline@v0.9.1-grok · 5875 in / 1054 out tokens · 15707 ms · 2026-06-27T18:37:52.582346+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 3 canonical work pages

  1. [1]

    doi: 10.1038/nchem.1243

    ISSN 1755-4349. doi: 10.1038/nchem.1243. URLhttp://dx.doi.org/10.1038/nchem.1243. N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models.Transactions on Machine Learning Research, 2025a. ISSN 2835-8856. N. M. Boffi, M. S. Albergo, and E. Vanden-Eijnden. How to buil...

  2. [2]

    URL https://www.biorxiv.org/content/10.1101/2024.10.10.615955v1

    doi: 10.1101/2024.10.10.615955. URL https://www.biorxiv.org/content/10.1101/2024.10.10.615955v1. A. J. Bose, T. Akhound-Sadegh, G. Huguet, K. Fatras, J. Rector-Brooks, C.-H. Liu, A. C. Nica, M. Korablyov, M. Bronstein, and A. Tong. Se (3)-stochastic flow matching for protein backbone generation.arXiv preprint arXiv:2310.02391,

  3. [3]

    doi: 10.1039/D3SC04185A. L. Cao, I. Goreshnik, B. Coventry, J. B. Case, L. Miller, L. Kozodoy, R. E. Chen, L. Carter, A. C. Walls, Y .-J. Park, E.-M. Strauch, L. Stewart, M. S. Diamond, D. Veesler, and D. Baker. De novo design of picomolar sars-cov-2 miniprotein inhibitors.Science, 370(6515):426–431,

  4. [4]

    K. Didi, Z. Zhang, G. Zhou, D. Reidenbach, Z. Cao, S. Cha, T. Geffner, C. Dallago, J. Tang, M. M. Bronstein, et al. Scaling atomistic protein binder design with generative pretraining and test-time compute.arXiv preprint arXiv:2603.27950,

  5. [5]

    Geffner, K

    T. Geffner, K. Didi, Z. Cao, D. Reidenbach, Z. Zhang, C. Dallago, E. Kucukbenli, K. Kreis, and A. Vahdat. La-proteina: Atomistic protein generation via partially latent flow matching.arXiv preprint arXiv:2507.09466, 2025a. T. Geffner, K. Didi, Z. Zhang, D. Reidenbach, Z. Cao, J. Yim, M. Geiger, C. Dallago, E. Kucukbenli, A. Vahdat, et al. Proteina: Scalin...

  6. [6]

    Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He. Mean flows for one-step generative modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2025a. Z. Geng, Y . Lu, Z. Wu, E. Shechtman, J. Z. Kolter, and K. He. Improved mean flows: On the challenges of fastforward generative models.arXiv preprint arXiv:2512.02012, 2025b. J. Ho and T. Sa...

  7. [7]

    Holderrieth, D

    P. Holderrieth, D. Chen, L. Eyring, I. Shah, G. Anantharaman, Y . He, Z. Akata, T. Jaakkola, N. M. Boffi, and M. Simchowitz. Diamond maps: Efficient reward alignment via stochastic flow maps. arXiv preprint arXiv:2602.05993, 2026a. P. Holderrieth, U. Singer, T. Jaakkola, R. T. Q. Chen, Y . Lipman, and B. Karrer. GLASS flows: Effi- cient inference for rewa...

  8. [8]

    V . Jain, K. Sareen, M. Pedramfar, and S. Ravanbakhsh. Diffusion tree sampling: Scalable inference- time alignment of diffusion models.arXiv preprint arXiv:2506.20701,

  9. [9]

    Kim, C.-H

    D. Kim, C.-H. Lai, W.-H. Liao, N. Murata, Y . Takida, T. Uesaka, Y . He, Y . Mitsufuji, and S. Ermon. Consistency trajectory models: Learning probability flow ode trajectory of diffusion.arXiv preprint arXiv:2310.02279,

  10. [10]

    C. Lee, J. Yoo, M. Agarwal, S. Shah, J. Huang, A. Raghunathan, S. Hong, N. M. Boffi, and J. Kim. Flow map language models: One-step language modeling via continuous denoising.arXiv preprint arXiv:2602.16813,

  11. [11]

    Y . Lu, S. Lu, Q. Sun, H. Zhao, Z. Jiang, X. Wang, T. Li, Z. Geng, and K. He. One-step latent-free image generation with pixel mean flows.arXiv preprint arXiv:2601.22158,

  12. [12]

    Potaptchik, A

    P. Potaptchik, A. Saravanan, A. Mammadov, A. Prat, M. S. Albergo, and Y . W. Teh. Meta flow maps enable scalable reward alignment.arXiv preprint arXiv:2601.14430, 2026a. 13 P. Potaptchik, J. Yim, A. Saravanan, P. Holderrieth, E. Vanden-Eijnden, and M. S. Albergo. Discrete flow maps.arXiv preprint arXiv:2604.09784, 2026b. J. Rector-Brooks, T. Lambert, M. S...

  13. [13]

    D. Roos, O. Davis, F. Eijkelboom, M. Bronstein, M. Welling,˙I. ˙I. Ceylan, L. Ambrogioni, and J.-W. van de Meent. Categorical flow maps.arXiv preprint arXiv:2602.12233,

  14. [14]

    Sabour, M

    A. Sabour, M. S. Albergo, C. Domingo-Enrich, N. M. Boffi, S. Fidler, K. Kreis, and E. Vanden- Eijnden. Test-time scaling of diffusions with flow maps.arXiv preprint arXiv:2511.22688, 2025a. A. Sabour, S. Fidler, and K. Kreis. Align your flow: Scaling continuous-time flow map distillation, 2025b. URLhttps://arxiv.org/abs/2506.14603. D.-A. Silva, S. Yu, U. ...

  15. [15]

    Singhal, Z

    R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general frame- work for inference-time scaling and steering of diffusion models.arXiv preprint arXiv:2501.06848,

  16. [16]

    Škrinjar, J

    P. Škrinjar, J. Eberhardt, J. Durairaj, and T. Schwede. Have protein-ligand co-folding methods moved beyond memorisation?BioRxiv, pages 2025–02,

  17. [17]

    P. Team, Y . Zhang, C. Gong, H. Zhang, W. Ma, Z. Liu, X. Chen, J. Guan, L. Wang, Y . Yang, et al. Protenix-v1: Toward high-accuracy open-source biomolecular structure prediction.bioRxiv, pages 2026–02,

  18. [18]

    Uehara, Y

    M. Uehara, Y . Zhao, C. Wang, X. Li, A. Regev, S. Levine, and T. Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review.arXiv preprint arXiv:2501.09685,

  19. [19]

    Wohlwend, G

    J. Wohlwend, G. Corso, S. Passaro, N. Getz, M. Reveiz, K. Leidal, W. Swiderski, L. Atkinson, T. Portnoi, I. Chinn, et al. Boltz-1 democratizing biomolecular interaction modeling.BioRxiv, pages 2024–11,

  20. [20]

    J. Yim, A. Campbell, A. Y . Foong, M. Gastegger, J. Jiménez-Luna, S. Lewis, V . G. Satorras, B. S. Veeling, R. Barzilay, T. Jaakkola, et al. Fast protein backbone generation with se (3) flow matching. arXiv preprint arXiv:2310.05297,

  21. [21]

    As a slight extension of algorithm 2, we realized that the gradient estimate in∇ xσ0 R(xσ0)can be a noisy estimate of the optimal guidance direction

    P p=1 14: Compute particle scoresR (p) ˆx(p) 0 ,∀p 15: ifSEARCH=MCTSthen 16: X ← X ∪ T 17: Update weights of all(x, m)∈ T▷backup 18: Draw(x (p) σi , i)∼ Xaccording to UCT,∀p 19: else ifSEARCH=FKthen▷ K= 1, soTholds one entry per particle 20: (x(p) σi , i)←the entry ofTfor particlep,∀p 21: ifsmodL= 0then 22: Resample{x (p) σi }P p=1 with weights∝R (p) 23: ...

  22. [22]

    Boltz-1x (default)

    γRMSD<2↑PB Valid↑Success Rate↑ 0.375.3 89.2 65.2 0.575.690.365.9 0.772.8 90.3 63.8 1.070.390.761.3 5 10 20 Sampling steps 60 65 70 75 80Success Rate (%) 74.4 75.2 76.2 72.2 74.8 75.2 Boltz-1 @ 200 steps (76.2%) DeCAF (Ours) Boltz-1 Figure 6: RMSD <2 Å best@5 on PoseBusters. DECAF (FlowMap sampler Alg. 1, γ=0) vs Boltz-1 (ODE) at matched sampling steps; da...