A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

Honghui Kim; Hyosoon Jang; Hyunjin Seo; Seonghyun Park; Sungsoo Ahn; Taewon Kim; Yunhui Jang

arXiv: 2602.13249 · v2 · pith:V7LEI7O3new · submitted 2026-02-02 · 🧬 q-bio.BM · cs.AI · cs.LG

Hyosoon Jang, Hyunjin Seo, Honghui Kim, Seonghyun Park, Taewon Kim, Yunhui Jang, Sungsoo Ahn

Pith reviewed 2026-05-25 07:37 UTC · model grok-4.3

classification 🧬 q-bio.BM · cs.AI · cs.LG

keywords co-folding · small-molecule representations · ADMET benchmark · molecular generative modeling · ligand optimization · protein-ligand interactions · representation transfer

The pith

Representations from protein-ligand co-folding transfer to standalone small-molecule tasks and match or outperform models trained only on molecular data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether a model trained on protein-ligand pairs can supply useful atom-level features for molecules when the protein is no longer present. It extracts those features from one co-folding model and tests them on property prediction, generative modeling, and structure-guided optimization without any retraining of the original model. The transferred features reach or exceed the performance of existing small-molecule models on the ADMET benchmark and combine productively with features from 3D conformers, bioassay data, and quantum calculations. The work also shows that dense representation-level signals can supplement scalar rewards inside reinforcement learning loops for molecular design. These outcomes suggest that relational supervision from co-folding offers a viable alternative pretraining route for small-molecule foundation models.

Core claim

Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. They are complementary to representations from 3D conformers, bioassay labels, and quantum-chemical properties. Extending representation alignment to reinforcement learning shows that dense representation-level supervision can complement scalar rewards in molecular discovery.

What carries the argument

Atom-level ligand representations extracted from the co-folding model and transferred to standalone small-molecule tasks through systematic probing and distillation.

If this is right

Protein-ligand co-folding supplies a workable pretraining signal for small-molecule representation learning.
The co-folding model can be used directly as an off-the-shelf source of molecular features.
Representation-level supervision inside reinforcement learning improves sample efficiency beyond scalar reward signals alone.
Co-folding features add information not captured by 3D conformers, bioassay labels, or quantum-chemical properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Large collections of protein-ligand structures could reduce reliance on purely molecular pretraining corpora.
The same transfer approach might be tested on other paired biological data such as protein-protein or ligand-ligand relations.
Evaluating the representations on tasks farther from the original co-folding distribution would clarify the breadth of transfer.

Load-bearing premise

The atom-level features learned from protein-ligand pairs remain informative when the protein is removed and the features are applied to isolated molecules.

What would settle it

A result in which the co-folding representations, after probing or distillation, fall below the performance of multiple established standalone molecular models on the majority of ADMET endpoints.

Figures

Figures reproduced from arXiv: 2602.13249 by Honghui Kim, Hyosoon Jang, Hyunjin Seo, Seonghyun Park, Sungsoo Ahn, Taewon Kim, Yunhui Jang.

**Figure 1.** Boltz as atom-level small molecular foundation models. We repurpose Boltz, originally trained for protein-ligand co-folding, as a small-molecule representation model by leveraging atom-level ligand representations. Adjacent text: Representation-guided molecular optimization. We extend representation alignment to online reinforcement learning for molecular discovery, showing that dense representation-level auxiliary s…

**Figure 2.** Boltz2 vs. existing foundation models on ADMET benchmarks. As illustrated, Boltz2 shows competitive performance compared to existing foundation models specialized for small molecules on four out of five domains. [Plot residue, likely from the panels of Figure 3: axes CKNNA and Validity; panel labels Boltz2 (R² = 0.424), MolE (R² = 0.047), Mini (R² = 0.104).]

**Figure 3.** Representation alignment with foundation models vs. generation quality. Stronger alignment with Boltz2 representations correlates with higher molecular generation quality. Adjacent text: 4. Molecular Generation. We next evaluate the quality of Boltz2 representations on small-molecule generation tasks. This experiment is motivated by recent work showing that representations from high-quality foundation models can improve…

**Figure 4.** Training acceleration using Boltz2. Representation alignment with Boltz2 accelerates training of generative models.

**Figure 5.** Results on structure-guided ligand discovery. The results are averaged over three random seeds. Representation alignment with Boltz2 improves the sample efficiency for discovering high-score molecules that bind to target structures. Adjacent text: In our experiments, we extend this setup by incorporating representation alignment-based distillation into the policy. Specifically, we maximize the cosine similarity between t…

**Figure 6.** Representation alignment between Boltz2 vs. existing molecular foundation models. Boltz2 exhibits relatively weak CKNNA with existing molecular foundation models.

**Figure 7.** Online SynFlowNet-Boltz ligand discovery pipeline. Adjacent text: C. Experimental Setup of Structure-guided Ligand Discovery. We use the SynFlowNet-Boltz ligand discovery pipeline introduced in prior work (Passaro et al., 2025; Cretu et al., 2025). The task is formulated as an online reinforcement learning problem, where a molecular policy is iteratively updated based on binding affinity rewards computed by Boltz2. In this ex…

**Figure 8.** Representation alignment on intermediate molecules. Adjacent text: D. Additional Results. D.1. Intermediate State Distillation. We further conduct an ablation study that extends representation alignment-based distillation from generated molecules to intermediate molecules produced by the policy before generation, for example, the intermediate molecular representations shown in …
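Figures 3 and 6 report alignment between representation spaces using CKNNA. As a rough illustration of the neighbourhood idea behind such metrics, the sketch below computes a plain mutual k-nearest-neighbour overlap between two embeddings of the same molecules. This is a simpler relative of CKNNA, not the paper's metric; the function names, the choice of cosine similarity, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours (by cosine similarity) of each row, excluding itself."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a point as its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(a: np.ndarray, b: np.ndarray, k: int = 5) -> float:
    """Mean fraction of shared k-nearest neighbours between two embeddings of the same items."""
    nn_a, nn_b = knn_indices(a, k), knn_indices(b, k)
    overlap = [len(set(ra) & set(rb)) / k for ra, rb in zip(nn_a, nn_b)]
    return float(np.mean(overlap))

# Toy data only: hypothetical embeddings, not Boltz2 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
same_geometry = x @ rotation          # orthogonal transform: neighbourhoods are preserved
unrelated = rng.normal(size=(50, 8))  # independent embedding of the "same" 50 items

score_same = mutual_knn_alignment(x, same_geometry)
score_unrelated = mutual_knn_alignment(x, unrelated)
```

Two spaces with the same local geometry score close to 1, while unrelated spaces score near the chance level of roughly k / (n - 1). A low score between two models, as Figure 6 reports for Boltz2 and existing molecular models under CKNNA, is consistent with the models encoding different neighbourhood structure.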

Original abstract

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Boltz2 representations from co-folding transfer to standalone small-molecule tasks and add value on top of standard pretraining. The paper's core contribution is the systematic evaluation showing that ligand atom features extracted from Boltz2 match or beat current models on ADMET, help generative modeling, and improve efficiency in RL-based optimization. The authors also demonstrate complementarity with 3D conformers, bioassays, and quantum properties.

This is new because prior work on molecular pretraining stayed within standalone data, while this pulls in relational supervision from protein-ligand pairs without needing the protein at inference. The authors do a good job laying out the probing, distillation, and alignment steps, plus the extension to dense representation supervision in RL. The results position co-folding as a viable pretraining route and Boltz2 as a ready-to-use model for these tasks.

The main soft spot is that the strength of the evidence depends on details not visible in the abstract, such as how the transfer is handled without retraining and which baselines are used for the generative and RL parts. If the experiments lack sufficient controls for data leakage or distribution shift, the complementarity claims could weaken. Still, the stress-test note indicates that the key assumption is tested directly in the experiments.

This work is for researchers in molecular machine learning who are looking for new pretraining signals. Anyone tuning foundation models for ADMET or ligand design will find the complementarity experiments useful. It deserves a serious referee because the empirical setup directly addresses a timely question in the field and reports concrete gains. Recommendation: send it out for peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript evaluates atom-level ligand representations extracted from the Boltz2 protein-ligand co-folding model on standalone small-molecule tasks. Through probing, distillation, and complementarity experiments, it reports that these representations match or outperform existing models on the ADMET benchmark, accelerate generative modeling, improve sample efficiency in structure-guided ligand optimization, and complement representations from 3D conformers, bioassays, and quantum-chemical properties. The work also extends representation alignment to reinforcement learning, showing benefits from dense supervision alongside scalar rewards.

Significance. If the empirical transfer results hold under rigorous controls, the findings would establish protein-ligand co-folding as a viable pretraining source for small-molecule representations, offering an off-the-shelf foundation model (Boltz2) that does not require protein input at test time. This could meaningfully shift pretraining paradigms in molecular ML by leveraging relational supervision that is currently underused for ligand-only tasks.

minor comments (3)

The abstract and introduction would benefit from explicit statements of the exact layer(s) and pooling strategy used to extract ligand representations from Boltz2, as this choice directly affects reproducibility of the reported transfer performance.
Figure captions and method descriptions should clarify whether error bars reflect multiple random seeds, multiple train/test splits, or both, particularly for the ADMET and RL experiments.
A short table summarizing the number of parameters, pretraining data scale, and inference cost for Boltz2 versus the baseline models would help readers assess the practical trade-offs of the proposed representations.
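On the first comment, the appendix excerpt quoted in the reference graph says that pair representations from the 16th, 32nd, 48th, and 64th Pairformer layers are concatenated into a 512-dimensional pair representation. A minimal sketch of that concatenation follows. The per-layer width of 128 is inferred from 512 / 4, and the masked mean pooling to a molecule-level vector is an assumption, since the excerpt is truncated before it describes pooling. The arrays are random stand-ins, not Boltz2 outputs, and the function names are hypothetical.

```python
import numpy as np

def concat_pair_layers(layer_outputs: list[np.ndarray]) -> np.ndarray:
    """Concatenate per-layer pair tensors of shape (N, N, C) along the channel axis."""
    return np.concatenate(layer_outputs, axis=-1)

def mean_pool_pairs(pair_repr: np.ndarray, atom_mask: np.ndarray) -> np.ndarray:
    """Average pair features over valid (i, j) atom pairs to get one vector per molecule."""
    pair_mask = atom_mask[:, None] & atom_mask[None, :]  # True where both atoms are real
    return pair_repr[pair_mask].mean(axis=0)

# Assumed sizes: 128 channels per layer (512 / 4), six atom slots with two padded.
n_atoms, channels = 6, 128
rng = np.random.default_rng(0)
layers = [rng.normal(size=(n_atoms, n_atoms, channels)) for _ in range(4)]
mask = np.array([True, True, True, True, False, False])

pair = concat_pair_layers(layers)      # shape (6, 6, 512)
mol_vec = mean_pool_pairs(pair, mask)  # shape (512,)
```

Stating the layer indices and the pooling rule at this level of detail is what makes the extraction step reproducible, which is the point of the referee's request.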

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The assessment accurately captures the manuscript's contributions on transferring Boltz2 co-folding representations to small-molecule tasks, including ADMET performance, generative modeling acceleration, sample efficiency gains, and complementarity with 3D, bioassay, and quantum-chemical signals, as well as the extension to representation alignment in reinforcement learning.

Circularity Check

0 steps flagged

No significant circularity

The manuscript is a purely empirical study that transfers atom-level representations from the existing Boltz2 co-folding model to protein-free small-molecule benchmarks (ADMET, generative modeling, RL). No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the reported methods or results. All claims rest on direct experimental measurements of transfer performance, complementarity, and sample-efficiency gains; these measurements are independent of any author-defined inputs and do not reduce to self-definitional or fitted-input constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities for the central claim; the work is an empirical transfer study rather than a derivation.

pith-pipeline@v0.9.0 · 5759 in / 1194 out tokens · 37488 ms · 2026-05-25T07:37:31.160606+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 5 internal anchors

[1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774,

[2]

Uni-QSAR: an auto-ML tool for molecular property prediction

Gao, Z., Ji, X., Zhao, G., Wang, H., Zheng, H., Ke, G., and Zhang, L. Uni-QSAR: an auto-ML tool for molecular property prediction. arXiv preprint arXiv:2304.12239,

[3]

The Llama 3 Herd of Models

Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783,

[4]

The Platonic Representation Hypothesis

Huh, M., Cheung, B., Wang, T., and Isola, P. The platonic representation hypothesis. arXiv preprint arXiv:2405.07987,

[5]

Pessimistic backward policy for gflownets

Jang, H., Jang, Y., Kim, M., Park, J., and Ahn, S. Pessimistic backward policy for gflownets. In Advances in Neural Information Processing Systems, 2024a. Jang, H., Kim, M., and Ahn, S. Learning energy decompositions for partial inference in gflownets. In International Conference on Learning Representations, 2024b. Jang, Y., Kim, D., and Ahn, S. Graph g...

[6]

Quantum-informed molecular representation learning enhancing ADMET property prediction

Kim, J., Chang, W., Ji, H., and Joung, I. Quantum-informed molecular representation learning enhancing ADMET property prediction. Journal of Chemical Information and Modeling, 64(13):5028–5040, 2024a. [running header: Boltz is a Strong Baseline for Atom-level Representation Learning] Kim, M., Yun, T., Bengio, E., Zhang, D., Bengio, Y., Ahn, S., and Park, J. Local searc...

[7]

Attentive Statistics Pooling for Deep Speaker Embedding

Okabe, K., Koshinaka, T., and Shinoda, K. Attentive statistics pooling for deep speaker embedding. arXiv preprint arXiv:1803.10963,

[8]

Learning flexible forward trajectories for masked molecular diffusion

Seo, H., Kim, T., Yu, S., and Ahn, S. Learning flexible forward trajectories for masked molecular diffusion. arXiv preprint arXiv:2505.16790,

[9]

Diffusion Transformers with Representation Autoencoders

Zheng, B., Ma, N., Tong, S., and Xie, S. Diffusion transformers with representation autoencoders. arXiv preprint arXiv:2510.11690,

[10]

A.2. Implementation Details. Pooling. We concatenate pair representations extracted from the 16th, 32nd, 48th, and 64th Pairformer layers, which correspond to the quarter, half, three-quarter, and final depths of the Pairformer stack, to obtain a 512-dimensional pair representation. Unlike existing small-molecule foundation models, Boltz does not train a poo...

[11]

Probing Network and Training Configuration. Our detailed probing and evaluation setups follow prior work (Klaser et al., 2024)

Although the nominal dimensionality is high, strong correlations between features reduce the effective dimensionality. Probing Network and Training Configuration. Our detailed probing and evaluation setups follow prior work (Klaser et al., 2024). We adopt a lightweight probing setup, where a task-specific MLP head is trained on top of frozen molecular repr...

[12]

In this experiment, we use the official codebase of GruM

to a denoising diffusion–based molecular generative model, specifically GruM (Jo et al., 2024). In this experiment, we use the official codebase of GruM. B.1. Measuring CKNNA We measure representation alignment using Centered Kernel Nearest-Neighbor Alignment (CKNNA) (Huh et al., 2024), which evaluates local alignment between two representation spaces bas...

[13]

Single representations are initialized from atom features and the time conditioning signal via two-layer MLPs

Before being passed into the Pairformer, pair representations are initialized by transforming edge features, concatenations of atom-pair features, and a time conditioning signal with separate two-layer MLPs and combining the resulting embeddings additively. Single representations are initialized from atom features and the time conditioning signal via two-...

[14]

Boltz2 representations f(m) are precomputed for all molecules

The distillation network maps generative model representations to the Boltz2 representation space. Boltz2 representations f(m) are precomputed for all molecules. We also flatten the single and pair representations when applying representation alignment, while masking out-of-range indices. ...

[15]

This is defined as follows: $\mathcal{L}_{\text{inter. align.}}$

We also study intermediate-state distillation as an ablation in Appendix D.1. This is defined as follows: $\mathcal{L}_{\text{inter. align.}}\big(\tau = (s_0, \dots, s_T),\, f(m)\big) = -\lambda \cdot \sum_{t=0}^{T} \cos\big(h_\theta(s_t),\, f(m)_{I(s_t)}\big)$. Here, $I(s_t)$ denotes the index set of atoms in the terminal molecule $s_T$ that correspond to the molecular substructure present at the intermediate state $s_t$, and $f(m)_{I(s_t)}$...

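The intermediate-state alignment loss quoted in the last excerpt sums a cosine similarity over the states of one trajectory and scales it by -λ. The sketch below evaluates that expression on toy arrays. It assumes that the cosine is taken over flattened atom-feature arrays and that every state has a non-empty index set; the excerpt is truncated before either detail is confirmed. The names `cosine` and `intermediate_alignment_loss` and all data are hypothetical.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity between two arrays after flattening (flattening is an assumption)."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def intermediate_alignment_loss(state_preds, target, index_sets, lam=1.0):
    """-lam * sum_t cos(h(s_t), f(m)[I(s_t)]) over the states of one trajectory."""
    total = 0.0
    for pred, idx in zip(state_preds, index_sets):
        total += cosine(pred, target[idx])  # compare against the matching atoms only
    return -lam * total

rng = np.random.default_rng(0)
target = rng.normal(size=(5, 8))                      # stand-in for f(m): 5 atoms, 8 features
index_sets = [[0, 1], [0, 1, 2], [0, 1, 2, 3, 4]]     # atoms present at each intermediate state

perfect_preds = [target[idx] for idx in index_sets]   # predictions equal to their targets
random_preds = [rng.normal(size=(len(idx), 8)) for idx in index_sets]

perfect = intermediate_alignment_loss(perfect_preds, target, index_sets)
noisy = intermediate_alignment_loss(random_preds, target, index_sets)
```

With three states and λ = 1, predictions that match their targets give a loss close to -3, the minimum for this trajectory, and any mismatch raises the loss. Minimizing it therefore pushes each intermediate representation toward the precomputed features of the atoms already present.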
