arxiv: 2605.10722 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints

Charlotte M. Deane, Garrett M. Morris, Markus Dablander, Sam Money-Kyrle, Stephane Werner, Thierry Hanser

Authors on Pith no claims yet

Pith reviewed 2026-05-12 05:04 UTC · model grok-4.3

classification 💻 cs.LG

keywords graph neural networksQSARpre-trainingextended-connectivity fingerprintsmolecular property predictiondrug discoveryout-of-distribution generalization

0 comments

The pith

Pre-training graph neural networks to predict extended-connectivity fingerprints improves performance on most QSAR benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a pre-training strategy for graph neural networks that uses extended-connectivity fingerprints as the target to enhance results on quantitative structure-activity relationship tasks in drug discovery. It tests the approach with statistical validation on out-of-distribution splits across Biogen industry datasets and compares against standard baselines. A reader would care because GNNs have shown inconsistent advantages over simpler molecular representations, and this method provides a general way to boost them without architecture changes. The work also identifies limits, including underperformance on heterogeneous data and complex endpoints like binding affinity, along with checks for substructure leakage effects.

Core claim

Pre-training GNNs to predict ECFPs produces statistically significant gains in standard performance metrics over all evaluated baselines on five out of six Biogen benchmarks under challenging out-of-distribution splits, while showing reduced effectiveness for more heterogeneous datasets and complex endpoints such as binding affinity prediction.

What carries the argument

The pre-training objective of predicting bits in the Extended-Connectivity Fingerprint (ECFP), which encodes substructure presence and acts as a self-supervised signal to build molecular representations before downstream QSAR fine-tuning.

If this is right

GNNs become more competitive with classical fingerprint methods on standard QSAR tasks.
Pre-training supports better out-of-distribution generalization across many practically relevant molecular datasets.
Effectiveness varies with dataset heterogeneity and endpoint complexity, requiring case-by-case evaluation.
Substructure leakage during pre-training can diminish benefits in specific scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

GNNs may learn substructure patterns more reliably through this explicit pre-training step than through end-to-end supervised training alone.
The same pre-training idea could be tested with alternative molecular fingerprints to check whether gains hold across different representations.
Integration into virtual screening workflows might improve early-stage hit identification rates by leveraging the boosted initial predictions.

Load-bearing premise

The chosen out-of-distribution splits and substructure-leakage checks are sufficient to ensure that performance gains reflect genuine generalization rather than residual data overlap or dataset artifacts.

What would settle it

Re-running the benchmarks with stricter splits that fully eliminate substructure overlap between pre-training and fine-tuning sets and finding that the statistically significant improvements vanish or reverse.

Figures

Figures reproduced from arXiv: 2605.10722 by Charlotte M. Deane, Garrett M. Morris, Markus Dablander, Sam Money-Kyrle, Stephane Werner, Thierry Hanser.

**Figure 1.** Figure 1: Performance metrics on ESOL aqueous solubility with all molecules (ESOL) and only molecules with a drug-like solubility [PITH_FULL_IMAGE:figures/full_fig_p037_1.png] view at source ↗

**Figure 2.** Figure 2: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Pearson correlation (↑ ρ) between methods on Biogen datasets. Tasks are (a) Human Plasma Protein Binding (PPB), (b) Rat PPB, (c) Human Intrinsic Clearance (CLint), (d) Rat CLint, (e) Efflux, (f) Solubility. Featurisation methods shown are GINs trained from scratch (Scratch GIN), ECFP… view at source ↗

**Figure 3.** Figure 3: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the mean absolute percentage error (↓ MAPE(%)) between methods on Biogen datasets. Tasks are (a) Human Plasma Protein Binding (PPB), (b) Rat PPB, (c) Human Intrinsic Clearance (CLint), (d) Rat CLint, (e) Efflux, (f) Solubility. Featurisation methods shown are GINs trained from scratch (S… view at source ↗

**Figure 4.** Figure 4: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Pearson correlation (↑ ρ) of PT-GIN models pre-trained on Tanimoto filtered QMugs subsets. Tasks are (a) Human Plasma Protein Binding (PPB), (b) Rat PPB, (c) Human Intrinsic Clearance (CLint), (d) Rat CLint, (e) Efflux, (f) Solubility. Featurisation methods shown are GINs pre-trained… view at source ↗

**Figure 5.** Figure 5: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the mean absolute percentage error (↓ MAPE(%)) of PT-GIN models pre-trained on Tanimoto filtered QMugs subsets. Tasks are (a) Human Plasma Protein Binding (PPB), (b) Rat PPB, (c) Human Intrinsic Clearance (CLint), (d) Rat CLint, (e) Efflux, (f) Solubility. Featurisation methods shown are… view at source ↗

**Figure 6.** Figure 6: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the coefficient of determination (↑ R 2 ) between methods on ChEMBL datasets. Tasks are (a) DRD2, (b) Factor XA binding affinity prediction. Featurisation methods shown are ECFP-pretrained GINs (PT-GIN), hashed Extended-Connectivity Fingerprints (ECFPhashed), hashed Functional-Connectivi… view at source ↗

**Figure 7.** Figure 7: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Pearson correlation (↑ ρ) between methods on ChEMBL datasets. Tasks are (a) DRD2, (b) Factor XA binding affinity prediction. Featurisation methods shown are ECFP-pretrained GINs (PT-GIN), hashed Extended-Connectivity Fingerprints (ECFPhashed), hashed Functional-Connectivity Fingerpri… view at source ↗

**Figure 8.** Figure 8: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the mean absolute percentage error (↓ MAPE(%)) between methods on ChEMBL datasets. Tasks are (a) DRD2, (b) Factor XA binding affinity prediction. Featurisation methods shown are ECFP-pretrained GINs (PT-GIN), hashed Extended-Connectivity Fingerprints (ECFPhashed), hashed Functional-Conne… view at source ↗

**Figure 9.** Figure 9: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Pearson correlation (↑ ρ) between methods on MoleculeNet regression datasets. Datasets shown are (a) ESOL, (b) FreeSolv, and (c) Lipophilicity. Featurisation methods shown are ECFP-pretrained GINs (PT-GIN), hashed Extended-Connectivity Fingerprints (ECFPhashed), hashed Functional-Con… view at source ↗

**Figure 10.** Figure 10: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the mean absolute percentage error (↓ MAPE(%)) between methods on MoleculeNet regression datasets. Datasets shown are (a) ESOL, (b) FreeSolv, and (c) Lipophilicity. Featurisation methods shown are ECFP-pretrained GINs (PT-GIN), hashed Extended-Connectivity Fingerprints (ECFPhashed), has… view at source ↗

**Figure 11.** Figure 11: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Area Under the Receiver Operating Characteristic (↑ AUROC) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846… view at source ↗

**Figure 12.** Figure 12: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Area Under the Precision-Recall Curve (↑ AUCPR) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV85… view at source ↗

**Figure 13.** Figure 13: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Matthews correlation coefficient (↑ MCC) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV852, (p) … view at source ↗

**Figure 14.** Figure 14: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Enrichment Factor at 10% (↑ EF10%) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV852, (p) MUV858… view at source ↗

**Figure 15.** Figure 15: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Enrichment Factor at 5% (↑ EF5%) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV852, (p) MUV858, … view at source ↗

**Figure 16.** Figure 16: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Enrichment Factor at 1% (↑ EF1%) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV852, (p) MUV858, … view at source ↗

**Figure 17.** Figure 17: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Enrichment Factor at 0.5% (↑ EF0.5%) between methods on MUV virtual screening tasks. Tasks are (a) MUV466, (b) MUV548, (c) MUV600, (d) MUV644, (e) MUV652, (f) MUV689, (g) MUV692, (h) MUV712, (i) MUV713, (j) MUV733, (k) MUV737, (l) MUV810, (m) MUV832, (n) MUV846, (o) MUV852, (p) MUV8… view at source ↗

**Figure 18.** Figure 18: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Area Under the Receiver Operating Characteristic (↑ AUROC) between methods on Tox21 toxicity tasks. For task labels, see [PITH_FULL_IMAGE:figures/full_fig_p054_18.png] view at source ↗

**Figure 19.** Figure 19: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Area Under the Precision-Recall Curve (↑ AUCPR) between methods on Tox21 toxicity tasks. For task labels, see [PITH_FULL_IMAGE:figures/full_fig_p055_19.png] view at source ↗

**Figure 20.** Figure 20: Rank-biserial coefficient (rrb) effect sizes (upper triangle) and Wilcoxon signed-rank test p-values (lower triangle) of the Matthews correlation coefficient (↑ MCC) between methods on Tox21 toxicity screening tasks. For task labels, see [PITH_FULL_IMAGE:figures/full_fig_p056_20.png] view at source ↗

**Figure 21.** Figure 21: Substructure importances on the FreeSolv dataset. (a) Permutation feature importances for ECFP [PITH_FULL_IMAGE:figures/full_fig_p058_21.png] view at source ↗

**Figure 22.** Figure 22: Distribution of ∆Gsolv in the FreeSolv dataset for molecules with and without R-OH. 24 [PITH_FULL_IMAGE:figures/full_fig_p059_22.png] view at source ↗

**Figure 23.** Figure 23: Substructure importances on the Human CLint dataset [ [PITH_FULL_IMAGE:figures/full_fig_p060_23.png] view at source ↗

**Figure 24.** Figure 24: Substructure importances on the Biogen Solubility task [ [PITH_FULL_IMAGE:figures/full_fig_p061_24.png] view at source ↗

**Figure 25.** Figure 25: Substructure importances on the MUV858 task [ [PITH_FULL_IMAGE:figures/full_fig_p062_25.png] view at source ↗

**Figure 26.** Figure 26: Substructure importances on the Tox21 SR ARE task [ [PITH_FULL_IMAGE:figures/full_fig_p063_26.png] view at source ↗

**Figure 27.** Figure 27: Metric distributions for PT-GIN on Efflux. Left column contains histograms and boxplots. Right column contains normal [PITH_FULL_IMAGE:figures/full_fig_p064_27.png] view at source ↗

**Figure 28.** Figure 28: Metric distributions for PT-GIN on MUV466. Left column contains histograms and boxplots. Right column contains [PITH_FULL_IMAGE:figures/full_fig_p065_28.png] view at source ↗

**Figure 29.** Figure 29: Enrichment Factor distributions for PT-GIN on MUV466. Left column contains histograms and boxplots. Right column [PITH_FULL_IMAGE:figures/full_fig_p066_29.png] view at source ↗

**Figure 30.** Figure 30: Tanimoto similarity filtering of pre-training dataset. [PITH_FULL_IMAGE:figures/full_fig_p072_30.png] view at source ↗

read the original abstract

Molecular Graph Neural Networks (GNNs) are increasingly common in drug discovery, particularly for Quantitative Structure-Activity Relationship (QSAR) studies; yet, their superiority compared to classical molecular featurisation approaches is disputed. We report a general strategy for improving GNNs for QSAR by pre-training to predict Extended-Connectivity Fingerprints (ECFP). We validate our approach with statistical tests and challenging out-of-distribution (OOD) splits. Across five out of six Biogen benchmarks, we observed a statistically significant improvement in standard performance metrics over all evaluated baselines when using ECFP pre-trained GNNs. However, for more heterogeneous datasets and more complex endpoints, such as binding affinity prediction, pre-trained GNNs underperformed in OOD settings. Importantly, we investigated the impact of substructure-level data leakage during pre-training on downstream performance. While we identified scenarios where pre-training on ECFPs was less effective, our findings show that ECFP-based pre-training can enhance downstream OOD performance on a diverse set of practically relevant QSAR tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes pre-training GNNs to predict Extended-Connectivity Fingerprints (ECFP) as a general strategy to improve performance on QSAR tasks. It validates the approach on six Biogen benchmarks using OOD splits and statistical tests, claiming statistically significant gains over baselines on five of six tasks, while noting underperformance on heterogeneous datasets and complex endpoints such as binding affinity. The authors also examine substructure-level data leakage during pre-training and its effect on downstream OOD results.

Significance. If the empirical claims hold after addressing the controls, the work supplies a simple, fingerprint-based pre-training recipe that can narrow the gap between GNNs and classical molecular descriptors in practical drug-discovery settings. The explicit use of OOD splits and leakage checks, together with the candid reporting of failure cases on complex tasks, adds value beyond purely positive results.

major comments (2)

[Abstract / Experimental Validation] Abstract and experimental section: the headline claim of statistically significant improvement on five of six benchmarks rests on OOD splits and substructure-leakage controls, yet the manuscript supplies no quantitative details on split construction (similarity thresholds, scaffold/temporal vs. random), the precise leakage metric employed, the fraction of affected molecules, or the number of independent runs and exact statistical test underlying the significance statements. These omissions are load-bearing because residual molecular overlap could explain the reported gains.
[Results] Results section: full performance tables (means, standard deviations, p-values) are absent from the provided description, and hyper-parameter choices for both the GNN and the pre-training objective are not reported. Without these, it is impossible to reproduce or assess the robustness of the cross-baseline comparisons.

minor comments (2)

[Abstract] The abstract would be clearer if it named the specific GNN architectures and the exact performance metrics (e.g., RMSE, AUC) used for each endpoint.
[Introduction] A short related-work paragraph contrasting ECFP pre-training with existing self-supervised GNN objectives (e.g., graph contrastive or property-prediction pre-training) would help situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which has helped us identify areas where additional detail will improve the reproducibility and transparency of the work. We address each major comment below and have revised the manuscript accordingly.

read point-by-point responses

Referee: [Abstract / Experimental Validation] Abstract and experimental section: the headline claim of statistically significant improvement on five of six benchmarks rests on OOD splits and substructure-leakage controls, yet the manuscript supplies no quantitative details on split construction (similarity thresholds, scaffold/temporal vs. random), the precise leakage metric employed, the fraction of affected molecules, or the number of independent runs and exact statistical test underlying the significance statements. These omissions are load-bearing because residual molecular overlap could explain the reported gains.

Authors: We agree that these specifics are required to substantiate the OOD claims. In the revised manuscript we have added a new subsection in Methods that (i) describes the OOD split protocol, including scaffold-based splitting with a Tanimoto similarity threshold of 0.35 and temporal splits for the time-stamped datasets; (ii) defines the substructure leakage metric as the fraction of test-set molecules that share at least one ECFP bit with any pre-training molecule and reports the per-benchmark fractions (ranging 4–18 %); (iii) states that all experiments were repeated over 10 independent random seeds; and (iv) specifies the use of paired Wilcoxon signed-rank tests with Holm–Bonferroni correction for the significance statements (p < 0.05). The abstract has been updated to reference these controls explicitly. revision: yes
Referee: [Results] Results section: full performance tables (means, standard deviations, p-values) are absent from the provided description, and hyper-parameter choices for both the GNN and the pre-training objective are not reported. Without these, it is impossible to reproduce or assess the robustness of the cross-baseline comparisons.

Authors: We acknowledge the omission. The revised manuscript now contains complete performance tables (main text and supplementary material) that report mean ± standard deviation across the 10 runs together with the exact p-values for every baseline comparison. A new appendix lists all hyperparameters: GNN architecture (3 message-passing layers, hidden dimension 128, dropout 0.1), pre-training objective (learning rate 1e-3, 50 epochs, batch size 256), and the same settings used for the fine-tuning stage. These additions allow full reproduction of the reported results. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical pre-training evaluation

full rationale

The paper introduces an empirical pre-training procedure in which GNNs are trained to predict ECFP vectors before fine-tuning on QSAR endpoints. All performance claims rest on direct experimental comparisons against baselines on six Biogen datasets, using OOD splits and explicit substructure-leakage checks. No equations, uniqueness theorems, or first-principles derivations appear; the reported gains are statistical observations from held-out test sets rather than quantities defined by the same fitted parameters. Self-citations, if present, are not invoked to justify the core method or to forbid alternatives. The work is therefore self-contained as standard empirical machine-learning validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard machine-learning assumptions about transfer from fingerprint prediction to activity prediction and on the representativeness of the chosen OOD splits; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption Out-of-distribution splits constructed from the Biogen datasets adequately test generalization without residual substructure leakage
The paper's validation strategy depends on these splits being leakage-free and representative of real deployment conditions.

pith-pipeline@v0.9.0 · 5503 in / 1352 out tokens · 63548 ms · 2026-05-12T05:04:47.792624+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
We report a general strategy for improving GNNs for QSAR by pre-training to predict Extended-Connectivity Fingerprints (ECFP).
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear
We opt for the latter approach to generate many evenly-sized OOD splits.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[1]

Nature Machine Intelligence , volume =

Molecular Contrastive Learning of Representations via Graph Neural Networks , author =. Nature Machine Intelligence , volume =

work page
[2]

Multiple Comparisons Using Rank Sums , volume=

Dunn, Olive Jean , year=. Multiple Comparisons Using Rank Sums , volume=. Technometrics , publisher=. doi:10.2307/1266041 , url =

work page doi:10.2307/1266041
[3]

Proceedings of the 31st International Conference on Neural Information Processing Systems , publisher =

Attention Is All You Need , author =. Proceedings of the 31st International Conference on Neural Information Processing Systems , publisher =

work page
[4]

Individual

Wilcoxon, Frank , year = 1945, journal =. Individual. doi:10.2307/3001968 , url =. 3001968 , eprinttype =

work page doi:10.2307/3001968 1945
[5]

Patrick , year = 2024, url =

Walters, W. Patrick , year = 2024, url =. Useful

work page 2024
[6]

Should We Really Use Post-Hoc Tests Based on Mean-Ranks? , author =. J. Mach. Learn. Res. , volume =

work page
[7]

Scipy.Stats.Tukey\_hsd --- Tukey's Honestly Significant Difference Test ---

work page
[8]

Psychometrika , volume =

Rank-Biserial Correlation , author =. Psychometrika , volume =. doi:10.1007/BF02289138 , url =

work page doi:10.1007/bf02289138
[9]

, year = 2014, journal =

Kerby, Dave S. , year = 2014, journal =. The

work page 2014
[10]

Descriptor-Based

Burns, Jackson and Zalte, Akshat and Green, William , year =. Descriptor-Based. doi:10.48550/arXiv.2506.15792 , archiveprefix =. 2506.15792 , primaryclass =

work page doi:10.48550/arxiv.2506.15792
[11]

, year =

Armstrong, Richard A. , year =. When to Use the. Ophthalmic and Physiological Optics , volume =. doi:10.1111/opo.12131 , url =

work page doi:10.1111/opo.12131
[12]

Proceedings of the 35th International Conference on Neural Information Processing Systems , volume=

Do transformers really perform badly for graph representation? , author=. Proceedings of the 35th International Conference on Neural Information Processing Systems , volume=. 2021 , url=

work page 2021
[13]

Aitken, Murray and Roland, Alex and Kleinrock, Michael , year =. Global

work page
[14]

Optuna: A Next-generation Hyperparameter Optimization Framework

Optuna: A Next-generation Hyperparameter Optimization Framework , author=. doi:10.1145/3292500.3330701 , year=

work page doi:10.1145/3292500.3330701
[15]

and Singh, Kanwar P

Alexander, John H. and Singh, Kanwar P. , year =. Inhibition of. American Journal of Cardiovascular Drugs , volume =. doi:10.2165/00129784-200505050-00001 , langid =

work page doi:10.2165/00129784-200505050-00001
[16]

and Wognum, Cas and

Ash, Jeremy R. and Wognum, Cas and. Practically Significant Method Comparison Protocols for Machine Learning in Small Molecule Drug Discovery. , journal =. 2025 , doi =

work page 2025
[17]

Breiman, Leo , year =. Random. Machine Learning , volume =. doi:10.1023/A:1010933404324 , url =

work page doi:10.1023/a:1010933404324
[18]

Unsupervised

Butina, Darko , year =. Unsupervised. Journal of Chemical Information and Computer Sciences , volume =. doi:10.1021/ci9803381 , url =

work page doi:10.1021/ci9803381
[19]

Biggin, Philip , year =

Crusius, Daniel and Cipcigan, Flaviu and C. Biggin, Philip , year =. Are We Fitting Data or Noise?. Faraday Discussions , volume =. doi:10.1039/D4FD00091A , url =

work page doi:10.1039/d4fd00091a
[20]

, year =

Dablander, Markus and Hanser, Thierry and Lambiotte, Renaud and Morris, Garrett M. , year =. Exploring. Journal of Cheminformatics , volume =. doi:10.1186/s13321-023-00708-w , url =

work page doi:10.1186/s13321-023-00708-w
[21]

, year =

Dablander, Markus and Hanser, Thierry and Lambiotte, Renaud and Morris, Garrett M. , year =. Sort &. Journal of Cheminformatics , volume =. doi:10.1186/s13321-024-00932-y , url =

work page doi:10.1186/s13321-024-00932-y
[22]

Prospective

Fang, Cheng and Wang, Ye and Grater, Richard and Kapadnis, Sudarshan and Black, Cheryl and Trapa, Patrick and Sciabola, Simone , year =. Prospective. Journal of Chemical Information and Modeling , volume =. doi:10.1021/acs.jcim.3c00160 , url =

work page doi:10.1021/acs.jcim.3c00160
[23]

and Riley, Patrick F

Gilmer, Justin and Schoenholz, Samuel S. and Riley, Patrick F. and Vinyals, Oriol and Dahl, George E. , year=. Neural message passing for Quantum chemistry , abstractNote=. Proceedings of the 34th International Conference on Machine Learning , publisher=

work page
[24]

Proceedings of the 28th

Hou, Zhenyu and Liu, Xiao and Cen, Yukuo and Dong, Yuxiao and Yang, Hongxia and Wang, Chunjie and Tang, Jie , year =. Proceedings of the 28th. doi:10.1145/3534678.3539321 , isbn =

work page doi:10.1145/3534678.3539321
[25]

Strategies for pre-training graph neural networks.arXiv preprint arXiv:1905.12265, 2019

Hu, Weihua and Liu, Bowen and Gomes, Joseph and Zitnik, Marinka and Liang, Percy and Pande, Vijay and Leskovec, Jure , year =. Strategies for. doi:10.48550/arXiv.1905.12265 , number =. 1905.12265 , archiveprefix =

work page doi:10.48550/arxiv.1905.12265 1905
[26]

2022 , journal =

Isert, Clemens and Atz, Kenneth and. 2022 , journal =. doi:10.1038/s41597-022-01390-7 , url =

work page doi:10.1038/s41597-022-01390-7 2022
[27]

Mol2vec:

Jaeger, Sabrina and Fulle, Simone and Turk, Samo , year =. Mol2vec:. Journal of Chemical Information and Modeling , volume =. doi:10.1021/acs.jcim.7b00616 , url =

work page doi:10.1021/acs.jcim.7b00616
[28]

Molecular Graph Convolutions: Moving beyond Fingerprints , shorttitle =

Kearnes, Steven and McCloskey, Kevin and Berndl, Marc and Pande, Vijay and Riley, Patrick , year =. Molecular Graph Convolutions: Moving beyond Fingerprints , shorttitle =. Journal of Computer-Aided Molecular Design , volume =. doi:10.1007/s10822-016-9938-8 , pmcid =

work page doi:10.1007/s10822-016-9938-8
[29]

2017 , pages =

Proceedings of the 31st International Conference on Neural Information Processing Systems , author =. 2017 , pages =

work page 2017
[30]

2018 , journal =

The Rise and Fall of Machine Learning Methods in Biomedical Research , author =. 2018 , journal =. doi:10.12688/f1000research.13016.2 , url =

work page doi:10.12688/f1000research.13016.2 2018
[31]

2025 , journal =

Coverage Bias in Small Molecule Machine Learning , author =. 2025 , journal =. doi:10.1038/s41467-024-55462-w , url =

work page doi:10.1038/s41467-024-55462-w 2025
[32]

Landrum, Greg , year =

work page
[33]

From Intuition to

McGibbon, Miles and Shave, Steven and Dong, Jie and Gao, Yumiao and Houston, Douglas R and Xie, Jiancong and Yang, Yuedong and Schwaller, Philippe and Blay, Vincent , year =. From Intuition to. Briefings in Bioinformatics , volume =. doi:10.1093/bib/bbad422 , url =

work page doi:10.1093/bib/bbad422
[34]

doi:10.1038/s41467-024-53751-y , url =

2024 , journal =. doi:10.1038/s41467-024-53751-y , url =

work page doi:10.1038/s41467-024-53751-y 2024
[35]

and Wen, Shi Wu and Wu, Xinyin and Acheampong, Kwabena and Liu, Aizhong , year =

Pan, Xiongfeng and Kaminga, Atipatsa C. and Wen, Shi Wu and Wu, Xinyin and Acheampong, Kwabena and Liu, Aizhong , year =. Dopamine and. Frontiers in Aging Neuroscience , volume =. doi:10.3389/fnagi.2019.00175 , url =

work page doi:10.3389/fnagi.2019.00175 2019
[36]

Scikit-Learn:

Pedregosa, Fabian and Varoquaux, Ga. Scikit-Learn:. 2011 , journal =

work page 2011
[37]

and Deane, Charlotte and Morris, Garrett M

Raja, Arun and Zhao, Hongtao and Tyrchan, Christian and Nittinger, Eva and Bronstein, Michael M. and Deane, Charlotte and Morris, Garrett M. , year =. On the

work page
[38]

Extended-connectivity fingerprints

Extended-Connectivity Fingerprints , author =. 2010 , journal =. doi:10.1021/ci100050t , url =

work page doi:10.1021/ci100050t 2010
[39]

Wallach, Izhar and Heifets, Abraham , year =. Most. Journal of Chemical Information and Modeling , volume =. doi:10.1021/acs.jcim.7b00403 , url =

work page doi:10.1021/acs.jcim.7b00403
[40]

Patrick , year =

Walters, W. Patrick , year =. We

work page
[41]

2024 , journal =

A Call for an Industry-Led Initiative to Critically Assess Machine Learning for Real-World Drug Discovery , author =. 2024 , journal =. doi:10.1038/s42256-024-00911-w , url =

work page doi:10.1038/s42256-024-00911-w 2024
[42]

doi: 10.1039/c7sc02664a

Wu, Zhenqin and Ramsundar, Bharath and N. Feinberg, Evan and Gomes, Joseph and Geniesse, Caleb and S. Pappu, Aneesh and Leswing, Karl and Pande, Vijay , year =. Chemical Science , volume =. doi:10.1039/C7SC02664A , url =

work page doi:10.1039/c7sc02664a
[43]

Analyzing

Yang, Kevin and Swanson, Kyle and Jin, Wengong and Coley, Connor and Eiden, Philipp and Gao, Hua and. Analyzing. 2019 , journal =. doi:10.1021/acs.jcim.9b00237 , url =

work page doi:10.1021/acs.jcim.9b00237 2019
[44]

Pre-Training via

Zaidi, Sheheryar and Schaarschmidt, Michael and Martens, James and Kim, Hyunjik and Teh, Yee Whye and. Pre-Training via. 2206.00133 , archiveprefix =

work page arXiv
[45]

Zdrazil, Barbara and Felix, Eloy and Hunter, Fiona and Manners, Emma J and Blackshaw, James and Corbett, Sybilla and. The. 2024 , journal =. doi:10.1093/nar/gkad1004 , url =

work page doi:10.1093/nar/gkad1004 2024
[46]

2013 , journal =

Expanding Medicinal Chemistry Space , author =. 2013 , journal =. doi:10.1016/j.drudis.2012.10.008 , url =

work page doi:10.1016/j.drudis.2012.10.008 2013
[47]

Klarner, Leo and Rudner, Tim G. J. and Reutlinger, Michael and Schindler, Torsten and Morris, Garrett M. and Deane, Charlotte and Teh, Yee Whye , date =. Drug. Proceedings of the 40th. 2023 , pages =

work page 2023
[48]

Scaffold

Guo, Qianrong and. Scaffold. Artificial. 2024 , pages =. doi:10.1007/978-3-031-72359-9_5 , abstract =

work page doi:10.1007/978-3-031-72359-9_5 2024
[49]

2025 , pages=

Journal of Cheminformatics , author=. 2025 , pages=. doi:10.1186/s13321-025-01039-8 , url =

work page doi:10.1186/s13321-025-01039-8 2025
[50]

Fey, Matthias and Lenssen, Jan Eric , year =. Fast. doi:10.48550/arXiv.1903.02428 , archiveprefix =. 1903.02428 , primaryclass =

work page internal anchor Pith review doi:10.48550/arxiv.1903.02428 1903
[51]

2019 , volume =

Proceedings of the 33rd International Conference on Neural Information Processing Systems , author =. 2019 , volume =

work page 2019
[52]

Xu, Keyulu and Hu, Weihua and Leskovec, Jure and Jegelka, Stefanie , year =. How. doi:10.48550/arXiv.1810.00826 , archiveprefix =. 1810.00826 , primaryclass =

work page internal anchor Pith review doi:10.48550/arxiv.1810.00826
[53]

Sypetkowski, Maciej and Wenkel, Frederik and Poursafaei, Farimah and Dickson, Nia and Suri, Karush and Fradkin, Philip and Beaini, Dominique , year =. On the. arXiv , langid =:2404.11568 , primaryclass =

work page arXiv
[54]

Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery?

Jiang, Dejun and Wu, Zhenxing and Hsieh, Chang-Yu and Chen, Guangyong and Liao, Ben and Wang, Zhe and Shen, Chao and Cao, Dongsheng and Wu, Jian and Hou, Tingjun , year =. Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery?. Journal of Cheminformatics , volume =. doi:10.1186/s13321-020-00479-8 , url =

work page doi:10.1186/s13321-020-00479-8
[55]

, year =

Delaney, John S. , year =. Journal of Chemical Information and Computer Sciences , volume =. doi:10.1021/ci034243x , url =

work page doi:10.1021/ci034243x
[56]

and Guthrie, J

Mobley, David L. and Guthrie, J. Peter , year =. Journal of Computer-Aided Molecular Design , volume =. doi:10.1007/s10822-014-9747-x , url =

work page doi:10.1007/s10822-014-9747-x
[57]

and Huang, Ruili and Waidyanatha, Suramya and Shinn, Paul and Collins, Bradley J

Richard, Ann M. and Huang, Ruili and Waidyanatha, Suramya and Shinn, Paul and Collins, Bradley J. and Thillainadarajah, Inthirany and Grulke, Christopher M. and Williams, Antony J. and Lougee, Ryan R. and Judson, Richard S. and Houck, Keith A. and Shobair, Mahmoud and Yang, Chihae and Rathman, James F. and Yasgar, Adam and Fitzpatrick, Suzanne C. and Sime...

work page doi:10.1021/acs.chemrestox.0c00264
[58]

2005 , journal =

Aqueous and Cosolvent Solubility Data for Drug-like Organic Compounds , author =. 2005 , journal =. doi:10.1208/aapsj070110 , url =

work page doi:10.1208/aapsj070110 2005
[59]

and Baumann, Knut , year =

Rohrer, Sebastian G. and Baumann, Knut , year =. Maximum Unbiased Validation (. Journal of Chemical Information and Modeling , volume =. doi:10.1021/ci8002649 , url =

work page doi:10.1021/ci8002649
[60]

and Huang, Xi-Ping and C

Axen, Seth D. and Huang, Xi-Ping and C. A. 2017 , month = sep, journal =. doi:10.1021/acs.jmedchem.7b00696 , url =

work page doi:10.1021/acs.jmedchem.7b00696 2017
[61]

2024 , journal =

Guiding Questions to Avoid Data Leakage in Biological Machine Learning Applications , author =. 2024 , journal =. doi:10.1038/s41592-024-02362-y , url =

work page doi:10.1038/s41592-024-02362-y 2024
[62]

Proceedings of the 37th International Conference on Neural Information Processing Systems , author =

Understanding the. Proceedings of the 37th International Conference on Neural Information Processing Systems , author =. 2023 , url =

work page 2023
[63]

Nature Machine Intelligence , pages =

Molecular Deep Learning at the Edge of Chemical Space , author =. Nature Machine Intelligence , pages =. doi:10.1038/s42256-026-01216-w , urldate =

work page doi:10.1038/s42256-026-01216-w
[64]

and Riniker, Sereina , year = 2024, journal =

Landrum, Gregory A. and Riniker, Sereina , year = 2024, journal =. Combining. doi:10.1021/acs.jcim.4c00049 , urldate =

work page doi:10.1021/acs.jcim.4c00049 2024