A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning
Pith reviewed 2026-05-25 07:37 UTC · model grok-4.3
The pith
Representations from protein-ligand co-folding transfer to standalone small-molecule tasks and match or outperform models trained only on molecular data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. They are complementary to representations from 3D conformers, bioassay labels, and quantum-chemical properties. Extending representation alignment to reinforcement learning shows that dense representation-level supervision can complement scalar rewards in molecular discovery.
What carries the argument
Atom-level ligand representations extracted from the co-folding model and transferred to standalone small-molecule tasks through systematic probing and distillation.
If this is right
- Protein-ligand co-folding supplies a workable pretraining signal for small-molecule representation learning.
- The co-folding model can be used directly as an off-the-shelf source of molecular features.
- Representation-level supervision inside reinforcement learning improves sample efficiency beyond scalar reward signals alone.
- Co-folding features add information not captured by 3D conformers, bioassay labels, or quantum-chemical properties.
Where Pith is reading between the lines
- Large collections of protein-ligand structures could reduce reliance on purely molecular pretraining corpora.
- The same transfer approach might be tested on other paired biological data such as protein-protein or ligand-ligand relations.
- Evaluating the representations on tasks farther from the original co-folding distribution would clarify the breadth of transfer.
Load-bearing premise
The atom-level features learned from protein-ligand pairs remain informative when the protein is removed and the features are applied to isolated molecules.
What would settle it
A result in which the co-folding representations, after probing or distillation, fall below the performance of multiple established standalone molecular models on the majority of ADMET endpoints.
Original abstract
Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates atom-level ligand representations extracted from the Boltz2 protein-ligand co-folding model on standalone small-molecule tasks. Through probing, distillation, and complementarity experiments, it reports that these representations match or outperform existing models on the ADMET benchmark, accelerate generative modeling, improve sample efficiency in structure-guided ligand optimization, and complement representations from 3D conformers, bioassays, and quantum-chemical properties. The work also extends representation alignment to reinforcement learning, showing benefits from dense supervision alongside scalar rewards.
Significance. If the empirical transfer results hold under rigorous controls, the findings would establish protein-ligand co-folding as a viable pretraining source for small-molecule representations, offering an off-the-shelf foundation model (Boltz2) that does not require protein input at test time. This could meaningfully shift pretraining paradigms in molecular ML by leveraging relational supervision that is currently underused for ligand-only tasks.
Minor comments (3)
- The abstract and introduction would benefit from explicit statements of the exact layer(s) and pooling strategy used to extract ligand representations from Boltz2, as this choice directly affects reproducibility of the reported transfer performance.
- Figure captions and method descriptions should clarify whether error bars reflect multiple random seeds, multiple train/test splits, or both, particularly for the ADMET and RL experiments.
- A short table summarizing the number of parameters, pretraining data scale, and inference cost for Boltz2 versus the baseline models would help readers assess the practical trade-offs of the proposed representations.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work and for recommending minor revision. The assessment accurately captures the manuscript's contributions on transferring Boltz2 co-folding representations to small-molecule tasks, including ADMET performance, generative modeling acceleration, sample efficiency gains, and complementarity with 3D, bioassay, and quantum-chemical signals, as well as the extension to representation alignment in reinforcement learning.
Circularity Check
No significant circularity
The manuscript is a purely empirical study that transfers atom-level representations from the existing Boltz2 co-folding model to protein-free small-molecule benchmarks (ADMET, generative modeling, RL). No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the reported methods or results. All claims rest on direct experimental measurements of transfer performance, complementarity, and sample-efficiency gains; these measurements are independent of any author-defined inputs and do not reduce to self-definitional or fitted-input constructions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- [2] Gao, Z., Ji, X., Zhao, G., Wang, H., Zheng, H., Ke, G., and Zhang, L. Uni-QSAR: an auto-ML tool for molecular property prediction. arXiv preprint arXiv:2304.12239.
- [3] Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.
- [4] Huh, M., Cheung, B., Wang, T., and Isola, P. The Platonic representation hypothesis. arXiv preprint arXiv:2405.07987.
- [5] Jang, H., Jang, Y., Kim, M., Park, J., and Ahn, S. Pessimistic backward policy for GFlowNets. In Advances in Neural Information Processing Systems, 2024a.
  Jang, H., Kim, M., and Ahn, S. Learning energy decompositions for partial inference in GFlowNets. In International Conference on Learning Representations, 2024b.
  Jang, Y., Kim, D., and Ahn, S. Graph g...
- [6] Kim, J., Chang, W., Ji, H., and Joung, I. Quantum-informed molecular representation learning enhancing ADMET property prediction. Journal of Chemical Information and Modeling, 64(13):5028–5040, 2024a.
  Boltz is a Strong Baseline for Atom-level Representation Learning
  Kim, M., Yun, T., Bengio, E., Zhang, D., Bengio, Y., Ahn, S., and Park, J. Local searc...
- [7] Okabe, K., Koshinaka, T., and Shinoda, K. Attentive statistics pooling for deep speaker embedding. arXiv preprint arXiv:1803.10963.
- [8] Seo, H., Kim, T., Yu, S., and Ahn, S. Learning flexible forward trajectories for masked molecular diffusion. arXiv preprint arXiv:2505.16790.
- [9] Zheng, B., Ma, N., Tong, S., and Xie, S. Diffusion transformers with representation autoencoders. arXiv preprint arXiv:2510.11690.
- [10] A.2. Implementation Details. Pooling. We concatenate pair representations extracted from the 16th, 32nd, 48th, and 64th Pairformer layers, which correspond to the quarter, half, three-quarter, and final depths of the Pairformer stack, to obtain a 512-dimensional pair representation. Unlike existing small-molecule foundation models, Boltz does not train a poo...
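The layer concatenation described in the pooling excerpt can be sketched as follows. This is a minimal NumPy illustration, not Boltz2's API: the function name `concat_pair_layers`, the dictionary of per-layer activations, and the random placeholder tensors are all hypothetical, and the per-layer width of 128 channels is an assumption inferred from the stated 512-dimensional result over four layers.

```python
import numpy as np

# Layers named in the excerpt (1-indexed): quarter, half, three-quarter, final depth.
LAYERS = (16, 32, 48, 64)

def concat_pair_layers(pair_by_layer, layers=LAYERS):
    """Concatenate per-layer pair tensors of shape (N, N, C) along the channel axis."""
    selected = [pair_by_layer[layer] for layer in layers]
    return np.concatenate(selected, axis=-1)

# Placeholder activations standing in for real Pairformer outputs (assumption).
rng = np.random.default_rng(0)
n_atoms, channels = 5, 128
pair_by_layer = {
    layer: rng.normal(size=(n_atoms, n_atoms, channels)) for layer in range(1, 65)
}

pair_repr = concat_pair_layers(pair_by_layer)
print(pair_repr.shape)  # (5, 5, 512)
```

The first 128 channels of the result come from layer 16 and the last 128 from layer 64, so a downstream probe can weight shallow and deep features independently.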
- [11] Although the nominal dimensionality is high, strong correlations between features reduce the effective dimensionality. Probing Network and Training Configuration. Our detailed probing and evaluation setups follow prior work (Klaser et al., 2024). We adopt a lightweight probing setup, where a task-specific MLP head is trained on top of frozen molecular repr...
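A lightweight probe of this kind can be sketched as below. This is only an illustration under stated assumptions: the hidden width, learning rate, epoch count, full-batch gradient descent, and mean-squared-error objective are hypothetical choices, not the configuration from the paper or from Klaser et al. (2024), and the synthetic `frozen` features stand in for precomputed embeddings.

```python
import numpy as np

def train_probe(features, targets, hidden=32, lr=0.02, epochs=500, seed=0):
    """Fit a one-hidden-layer MLP regressor on fixed (frozen) features."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w1 = rng.normal(scale=d ** -0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=hidden ** -0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = targets.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        pre = features @ w1 + b1
        act = np.maximum(pre, 0.0)                # ReLU
        err = act @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE; only the head's parameters are updated,
        # the input features are never modified.
        g_out = 2.0 * err / n
        g_pre = (g_out @ w2.T) * (pre > 0)
        w2 -= lr * (act.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (features.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)

    def predict(x):
        return (np.maximum(x @ w1 + b1, 0.0) @ w2 + b2).ravel()

    return predict, losses

# Synthetic stand-in for precomputed molecule embeddings and one property label.
rng = np.random.default_rng(1)
frozen = rng.normal(size=(200, 16))
labels = np.tanh(frozen[:, 0]) + 0.5 * frozen[:, 1]
predict, losses = train_probe(frozen, labels)
```

Because the encoder is never updated, differences in probe accuracy across encoders can be attributed to the representations rather than to fine-tuning capacity.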
- [12] ... to a denoising diffusion-based molecular generative model, specifically GruM (Jo et al., 2024). In this experiment, we use the official codebase of GruM. B.1. Measuring CKNNA. We measure representation alignment using Centered Kernel Nearest-Neighbor Alignment (CKNNA) (Huh et al., 2024), which evaluates local alignment between two representation spaces bas...
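For intuition, a simplified CKNNA computation might look like the sketch below. It follows the general recipe of restricting a centered-kernel alignment score to nearest-neighbor pairs, but it is an assumption-laden simplification: the linear kernel, row-mean centering, neighbor count `k`, and the helper names are illustrative choices, and the official implementation from Huh et al. (2024) may differ in estimator and normalization details.

```python
import numpy as np

def _knn_mask(kernel, k):
    """mask[i, j] is True if j is among the k most similar points to i (self excluded)."""
    sim = kernel.copy()
    np.fill_diagonal(sim, -np.inf)
    idx = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(kernel.shape, dtype=bool)
    rows = np.arange(kernel.shape[0])[:, None]
    mask[rows, idx] = True
    return mask

def cknna(feats_a, feats_b, k=5):
    """Simplified CKNNA between two feature matrices whose rows are the same samples."""
    kern_a = feats_a @ feats_a.T                  # linear kernels (assumption)
    kern_b = feats_b @ feats_b.T
    mask_a, mask_b = _knn_mask(kern_a, k), _knn_mask(kern_b, k)
    cen_a = kern_a - kern_a.mean(axis=1, keepdims=True)
    cen_b = kern_b - kern_b.mean(axis=1, keepdims=True)
    # Cross term uses pairs that are neighbors in BOTH spaces.
    num = np.sum((mask_a & mask_b) * cen_a * cen_b)
    den = np.sqrt(np.sum(mask_a * cen_a ** 2) * np.sum(mask_b * cen_b ** 2))
    return float(num / den)

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 8))
y = rng.normal(size=(30, 6))
score_same = cknna(x, x)          # identical spaces
score_scaled = cknna(x, 2.0 * x)  # rescaling leaves neighbors and the score unchanged
score_random = cknna(x, y)        # unrelated spaces
```

In this sketch the score is 1 for identical spaces and cannot exceed 1, so lower values indicate that the two representations disagree about local neighborhoods.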
- [13] Before being passed into the Pairformer, pair representations are initialized by transforming edge features, concatenations of atom-pair features, and a time conditioning signal with separate two-layer MLPs and combining the resulting embeddings additively. Single representations are initialized from atom features and the time conditioning signal via two-...
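The additive pair initialization described in this excerpt can be illustrated with a short NumPy sketch. All dimensions, the bias-free ReLU MLPs, and the names `make_mlp` and `init_pair_repr` are hypothetical; only the structure (three separate two-layer MLPs whose outputs are summed) comes from the excerpt, and the single-representation branch is omitted because its description is truncated.

```python
import numpy as np

def make_mlp(rng, d_in, d_hidden, d_out):
    """Return a two-layer MLP (linear -> ReLU -> linear) with random weights, no biases."""
    w1 = rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden))
    w2 = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out))

    def mlp(x):
        return np.maximum(x @ w1, 0.0) @ w2

    return mlp

def init_pair_repr(edge_feats, atom_feats, time_emb, mlps):
    """Sum three separately embedded inputs into an (N, N, d_pair) tensor."""
    n = atom_feats.shape[0]
    # Concatenate the features of atoms i and j for every ordered pair (i, j).
    a_i = np.repeat(atom_feats[:, None, :], n, axis=1)
    a_j = np.repeat(atom_feats[None, :, :], n, axis=0)
    pair_feats = np.concatenate([a_i, a_j], axis=-1)
    edge_mlp, pair_mlp, time_mlp = mlps
    # The time embedding is shared by all pairs, so it is broadcast over (N, N).
    return edge_mlp(edge_feats) + pair_mlp(pair_feats) + time_mlp(time_emb)[None, None, :]

rng = np.random.default_rng(0)
n, d_atom, d_edge, d_time, d_pair = 4, 3, 2, 5, 8   # illustrative sizes
atoms = rng.normal(size=(n, d_atom))
edges = rng.normal(size=(n, n, d_edge))
t_emb = rng.normal(size=(d_time,))
mlps = (
    make_mlp(rng, d_edge, 16, d_pair),
    make_mlp(rng, 2 * d_atom, 16, d_pair),
    make_mlp(rng, d_time, 16, d_pair),
)
pair0 = init_pair_repr(edges, atoms, t_emb, mlps)
```

Summing the embeddings keeps the pair width fixed regardless of how many input sources are combined.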
- [14] The distillation network maps generative model representations to the Boltz2 representation space. Boltz2 representations f(m) are precomputed for all molecules. We also flatten the single and pair representations when applying representation alignment, while masking out-of-range indices. ...
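One way to picture the flatten-and-mask step is the sketch below. It assumes padded tensors of a fixed maximum size, a cosine-similarity alignment objective averaged over valid entries, and a weight `lam`; these choices, like the function names, are illustrative assumptions rather than details confirmed by the excerpt. The `pred_*` arrays stand for outputs of a distillation head that has already mapped generative-model features into the target space.

```python
import numpy as np

def _cos(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (M, d) arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def masked_alignment_loss(pred_single, pred_pair, tgt_single, tgt_pair, n_atoms, lam=1.0):
    """Negative mean cosine similarity over in-range single and pair entries."""
    n_max = pred_single.shape[0]
    valid = np.arange(n_max) < n_atoms                       # atoms inside the molecule
    pair_valid = (valid[:, None] & valid[None, :]).reshape(-1)
    d_pair = pred_pair.shape[-1]
    # Flatten the (N, N, d) pair tensors into rows so singles and pairs share one mask.
    cos_single = _cos(pred_single, tgt_single)
    cos_pair = _cos(pred_pair.reshape(-1, d_pair), tgt_pair.reshape(-1, d_pair))
    cos_all = np.concatenate([cos_single, cos_pair])
    mask = np.concatenate([valid, pair_valid])
    return -lam * float(cos_all[mask].mean())

rng = np.random.default_rng(0)
n_max, n_atoms = 5, 3
tgt_s = rng.normal(size=(n_max, 4))
tgt_p = rng.normal(size=(n_max, n_max, 6))
pred_s, pred_p = tgt_s.copy(), tgt_p.copy()
# Corrupt only padded positions; a correct mask must ignore them.
pred_s[n_atoms:] = rng.normal(size=(n_max - n_atoms, 4))
pred_p[n_atoms:, :] = 0.0
pred_p[:, n_atoms:] = 0.0
loss = masked_alignment_loss(pred_s, pred_p, tgt_s, tgt_p, n_atoms)
```

Here the loss stays at approximately -1 even though the padded rows are garbage, which is the behavior masking is meant to guarantee.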
- [15] We also study intermediate-state distillation as an ablation in Appendix D.1. This is defined as follows:
  $$\mathcal{L}_{\text{inter. align.}}\big(\tau = (s_0, \dots, s_T),\, f(m)\big) = -\lambda \cdot \sum_{t=0}^{T} \cos\big(h_\theta(s_t),\, f(m)_{I(s_t)}\big).$$
  Here, $I(s_t)$ denotes the index set of atoms in the terminal molecule $s_T$ that correspond to the molecular substructure present at the intermediate state $s_t$, and $f(m)_{I(s_t)}$...
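The intermediate-state loss can be evaluated numerically as in the sketch below. It assumes that each intermediate prediction has one row per atom in the index set and that the cosine is taken between the flattened prediction and the flattened target slice; the excerpt does not specify how the cosine is reduced over atoms, so that reduction, along with the function names and toy data, is a labeled assumption.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two arrays after flattening (assumed reduction)."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def intermediate_alignment_loss(state_preds, index_sets, target, lam=1.0):
    """Compute -lam * sum_t cos(h(s_t), f(m)[I(s_t)]) over a trajectory."""
    total = 0.0
    for pred, idx in zip(state_preds, index_sets):
        # target[idx] selects the rows of f(m) for atoms present at state s_t.
        total += cosine(pred, target[idx])
    return -lam * total

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 3))            # f(m): one row per atom of the final molecule
index_sets = [[0], [0, 1], [0, 1, 2, 3]]    # atoms present at s_0, s_1, s_2
perfect_preds = [target[idx] for idx in index_sets]
loss_value = intermediate_alignment_loss(perfect_preds, index_sets, target, lam=0.5)
```

With perfectly aligned predictions each cosine term is 1, so the three-state trajectory above yields a loss of about -1.5 for `lam=0.5`.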