Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
Pith reviewed 2026-05-07 11:31 UTC · model grok-4.3
The pith
Compact classical and graph models outperform larger pretrained ones on most molecular prediction tasks in a large benchmark.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 167,056 held-out evaluations on 22 molecular property and activity endpoints, classical ML models such as random forests on ECFP4 fingerprints and ExtraTrees on RDKit descriptors win ten primary-metric tasks, GNNs such as GIN and Ligandformer win nine, and pretrained molecular sequence models such as MoLFormer and ChemBERTa2 win three. Rule-based SAR reasoning with large language models does not win under the prespecified primary metrics, although it shows some gains when supplied with training-fold knowledge. These outcomes indicate that compact, specialized models remain highly effective, performance differences among model classes are often modest and endpoint-dependent, and larger or more general models do not provide a universal predictive advantage.
What carries the argument
Structure-similarity-separated five-fold cross-validation benchmark on 22 endpoints comparing classical ML, GNNs, pretrained sequence models, and rule-based LLM baselines.
If this is right
- Performance gains from model class are modest and vary strongly with the biological endpoint.
- Compact specialized models can match or exceed larger general models for standard predictive tasks.
- Pretrained molecular models do not deliver a universal advantage in supervised property prediction.
- Large models may still be useful for zero-shot reasoning and SAR interpretation rather than direct prediction.
- Model selection in drug discovery should prioritize alignment of representation, inductive bias, and data regime over scale.
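The load-bearing machinery here is the structure-similarity-separated split: whole clusters of similar molecules are assigned to the same fold so that near-duplicates never straddle a train/test boundary. A minimal sketch of that idea, using toy substructure-key sets in place of real ECFP4 fingerprints; the threshold and fingerprints below are illustrative, not the paper's:

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprint key sets: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_clusters(fps, threshold=0.4):
    """Greedy single-linkage clustering: a molecule joins the first cluster
    containing any member more similar than the threshold."""
    clusters = []
    for i, fp in enumerate(fps):
        for cluster in clusters:
            if any(tanimoto(fp, fps[j]) > threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def cluster_kfold(fps, k=5, threshold=0.4):
    """Assign whole clusters to folds (largest first, into the smallest fold)
    so similar structures never split across train and test."""
    folds = [[] for _ in range(k)]
    for cluster in sorted(similarity_clusters(fps, threshold),
                          key=len, reverse=True):
        min(folds, key=len).extend(cluster)
    return folds

# Toy fingerprints: each set stands in for hashed ECFP4 substructure keys.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 10}, {20, 21}, {30, 31}]
folds = cluster_kfold(fps, k=3)
```

With these toy inputs the two similar pairs land in separate folds from each other and from the singletons, which is exactly the leakage-avoidance property the benchmark's protocol relies on.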
Where Pith is reading between the lines
- Drug-discovery teams could reduce compute costs by preferring compact models for routine property prediction while reserving larger models for hypothesis generation.
- Future benchmarks should include more internal pharmaceutical data and real-world deployment metrics to test whether the observed pattern holds outside public datasets.
- The modest performance gaps suggest that hybrid approaches combining classical fingerprints with lightweight graph layers may be sufficient for many endpoints.
Load-bearing premise
The chosen endpoints, similarity-separated validation protocol, and selected model implementations fairly represent typical molecular prediction problems without systematically favoring any model class.
What would settle it
A follow-up study on the same or expanded endpoints that uses a different validation split or includes more diverse external test sets and finds larger pretrained models winning a clear majority of tasks under the same primary metrics.
read the original abstract
The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and activity endpoints, including public ADMET and Tox21 benchmarks and two internal anti-infective activity datasets. Across 167,056 held-out task–molecule evaluations under structure-similarity-separated five-fold cross-validation (37,756 ADMET, 77,946 Tox21, 49,266 anti-TB and 2,088 antimalaria), classical machine-learning (ML) models such as RF(ECFP4) and ExtraTrees(RDKit descriptors) win ten primary-metric tasks, GNNs such as GIN and Ligandformer win nine, and pretrained molecular sequence models such as MoLFormer and ChemBERTa2 win three. Rule-based SAR reasoning baselines, represented by GPT5.5-SAR and Opus4.7-SAR, do not win under the prespecified primary metrics, although train-fold-derived SAR knowledge provides measurable but uneven gains for SAR reasoning and interpretation. These results indicate that compact, specialized models remain highly effective for molecular property and activity prediction. The performance differences among classical ML, GNN and pretrained sequence models are often modest and endpoint-dependent, whereas larger or more general models do not provide a universal predictive advantage. Large models may still add value for zero-shot reasoning, SAR interpretation and hypothesis generation, but the results suggest that predictive performance depends on the alignment among molecular representation, inductive bias, data regime, endpoint biology and validation protocol.
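The headline comparison reduces to counting which model family takes the prespecified primary metric on each endpoint. A toy tally in that shape; the endpoint names and scores below are invented for illustration and are not the paper's numbers:

```python
# Hypothetical primary-metric scores (higher is better) per endpoint.
# All names and values here are illustrative placeholders.
scores = {
    "herg":       {"classical": 0.81, "gnn": 0.79, "pretrained": 0.77},
    "solubility": {"classical": 0.74, "gnn": 0.78, "pretrained": 0.73},
    "tox21_ar":   {"classical": 0.70, "gnn": 0.69, "pretrained": 0.72},
}

def win_counts(scores):
    """Count, per model family, how many endpoints it wins on the primary metric."""
    counts = {}
    for endpoint, by_family in scores.items():
        winner = max(by_family, key=by_family.get)
        counts[winner] = counts.get(winner, 0) + 1
    return counts

print(win_counts(scores))  # in this toy table, each family wins one endpoint
```

The paper's 10/9/3 split is this tally computed over 22 real endpoints; the toy version just makes explicit that "wins" compresses away the (often small) margins the abstract flags as modest and endpoint-dependent.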
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript benchmarks classical ML (RF(ECFP4), ExtraTrees(RDKit)), GNNs (GIN, Ligandformer), and pretrained sequence models (MoLFormer, ChemBERTa2) plus rule-based SAR baselines on 22 molecular property/activity endpoints (public ADMET, Tox21, and two internal anti-infective sets). Using structure-similarity-separated 5-fold CV across 167,056 held-out evaluations, it reports classical ML winning 10 primary-metric tasks, GNNs winning 9, and sequence models winning 3, concluding that larger or more general models do not confer a universal predictive advantage and that compact specialized models remain highly effective, with differences being modest and endpoint-dependent.
Significance. If the results are robust, the work supplies a large-scale empirical counterpoint to scale-centric assumptions in AI-driven drug discovery. The volume of evaluations (37k ADMET + 78k Tox21 + 51k internal) and inclusion of both public and proprietary endpoints provide concrete data on when representation-inductive-bias alignment matters more than model size. The explicit separation of predictive performance from zero-shot/SAR-interpretation uses of large models is a useful distinction for practitioners.
major comments (2)
- [Abstract] Abstract and validation protocol: the central claim that classical ML wins 10 tasks versus 3 for pretrained sequence models rests on the structure-similarity-separated 5-fold CV. Because the split metric is likely computed from fingerprints or descriptors that directly overlap with the input representations of RF(ECFP4) and ExtraTrees(RDKit), the held-out folds may be systematically easier for classical models than for SMILES-sequence models whose pretraining does not optimize for the same local substructure similarity. This coupling risks confounding the reported win distribution with validation design rather than intrinsic scaling behavior.
- [Methods] Methods (CV and model details): the manuscript should specify the exact similarity metric and threshold used for fold separation, report per-endpoint performance tables with confidence intervals or standard deviations across folds, and include at least one control experiment (e.g., random splits or similarity-agnostic splits) to test whether the observed advantage for fingerprint-based models persists.
minor comments (1)
- [Abstract] Abstract: the phrase 'train-fold-derived SAR knowledge provides measurable but uneven gains' would benefit from a quantitative summary or supplementary table showing the magnitude of those gains across endpoints.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our benchmark study. We address the concerns about the validation protocol and methods below, with revisions to improve transparency and reporting.
read point-by-point responses
Referee: [Abstract] Abstract and validation protocol: the central claim that classical ML wins 10 tasks versus 3 for pretrained sequence models rests on the structure-similarity-separated 5-fold CV. Because the split metric is likely computed from fingerprints or descriptors that directly overlap with the input representations of RF(ECFP4) and ExtraTrees(RDKit), the held-out folds may be systematically easier for classical models than for SMILES-sequence models whose pretraining does not optimize for the same local substructure similarity. This coupling risks confounding the reported win distribution with validation design rather than intrinsic scaling behavior.
Authors: The structure-similarity split follows standard practice in molecular machine learning to simulate realistic generalization to novel chemical matter and avoid leakage from near-duplicates. We have revised the Methods to explicitly state that fold separation uses Tanimoto similarity on ECFP4 fingerprints (threshold now specified). While this representation overlaps with one classical baseline, the same partitions are used for all models, and pretrained sequence models are expected to capture substructure patterns from their large-scale pretraining. The endpoint-dependent results and modest effect sizes indicate that the win distribution reflects inductive bias alignment rather than an artifact of the split. We added a clarifying sentence to the abstract and a short discussion paragraph on validation design. revision: partial
Referee: [Methods] Methods (CV and model details): the manuscript should specify the exact similarity metric and threshold used for fold separation, report per-endpoint performance tables with confidence intervals or standard deviations across folds, and include at least one control experiment (e.g., random splits or similarity-agnostic splits) to test whether the observed advantage for fingerprint-based models persists.
Authors: We agree on the need for greater detail. The revised Methods now specifies the exact metric (Tanimoto similarity on ECFP4 fingerprints) and the threshold used for fold separation. We have added supplementary tables reporting mean performance and standard deviation across the five folds for every endpoint, model, and primary/secondary metric. For the control, we performed an additional analysis using random splits; under this protocol all models improve but the relative ordering (classical ML and GNNs competitive with or ahead of sequence models on most tasks) is preserved. These results are reported in a new subsection and support that the main findings are not driven solely by the similarity-based split. revision: yes
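The supplementary reporting the authors promise — mean and standard deviation across the five folds for each endpoint and model — is straightforward with the standard library. A sketch with invented placeholder fold scores (the endpoint/model keys and values are illustrative only):

```python
from statistics import mean, stdev

# Hypothetical per-fold primary-metric scores for one endpoint and two models.
fold_scores = {
    ("herg", "RF(ECFP4)"): [0.82, 0.79, 0.81, 0.80, 0.78],
    ("herg", "MoLFormer"): [0.77, 0.80, 0.76, 0.79, 0.75],
}

def summarize(fold_scores):
    """Mean and sample standard deviation across CV folds, rounded for tables."""
    return {key: (round(mean(v), 3), round(stdev(v), 3))
            for key, v in fold_scores.items()}

for (endpoint, model), (m, s) in summarize(fold_scores).items():
    print(f"{endpoint:6s} {model:12s} {m:.3f} ± {s:.3f}")
```

Reporting the fold spread alongside the mean is what lets readers judge whether a "win" on an endpoint exceeds fold-to-fold noise, which is the crux of the referee's request.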
Circularity Check
No circularity: purely empirical benchmark with direct data-driven comparisons
full rationale
The paper reports model performance rankings across 22 endpoints under a fixed structure-similarity-separated 5-fold CV protocol. No derivations, equations, or fitted parameters are present that could reduce to self-definition or prediction-by-construction. Central claims rest on tabulated win counts (RF/ECFP4 wins 10, GNNs win 9, sequence models win 3) obtained from explicit held-out evaluations, not from any ansatz, uniqueness theorem, or self-citation chain. The validation design is stated explicitly and applied uniformly; any potential representation-split alignment is an external methodological concern, not a reduction of the reported results to their own inputs.
Reference graph
Works this paper leans on
- [1] Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: A benchmark for molecular machine learning. Chemical Science, 9:513–530, 2018. doi:10.1039/C7SC02664A
- [2] Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. Therapeutics Data Commons: Machine learning datasets and tasks for drug discovery and development. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021. URL https://arxiv.org/a...
- [3] Megan Stanley, John F. Bronskill, Krzysztof Maziarz, Henryk Misztela, Julien Lanini, Marwin Segler, Nadine Schneider, and Marc Brockschmidt. FS-Mol: A few-shot learning dataset of molecules. In NeurIPS Datasets and Benchmarks Track, 2021. URL https://openreview.net/forum?id=701FtuyLlAd
- [4] Ana Laura Dias, Latimah Bustillo, and Tiago Rodrigues. Limitations of representation learning in small molecule property prediction. Nature Communications, 14:6394, 2023. doi:10.1038/s41467-023-41967-3. URL https://www.nature.com/articles/s41467-023-41967-3
- [5]
- [6] Gintautas Kamuntavicius, Tanya Paquet, Orestis Bastas, Dainius Salkauskas, Alvaro Prat, Hisham Abdel Aty, Aurimas Pabrinkis, Povilas Norvaisas, and Roy Tal. Benchmarking machine learning in ADMET predictions: The practical impact of feature representations in ligand-based models. Journal of Cheminformatics, 17:108, 2025. doi:10.1186/s13321-025-01041-0
- [7] Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction, 2020. URL https://arxiv.org/abs/2010.09885
- [8] Walid Ahmad, Eric Simon, Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa-2: Towards chemical foundation models, 2022. URL https://arxiv.org/abs/2209.01712
- [9] Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence, 2022. URL https://www.nature.com/articles/s42256-022-00580-7
- [10] Raymond R. Tice, Christopher P. Austin, Robert J. Kavlock, and John R. Bucher. Tox21 challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science.
- [11] URL https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2015.00085/full
- [12] Andreas Mayr, Günter Klambauer, Thomas Unterthiner, and Sepp Hochreiter. QSAR modeling of Tox21 challenge stress response and nuclear receptor signaling toxicity assays. Frontiers in Environmental Science, 2016. URL https://www.frontiersin.org/articles/10.3389/fenvs.2016.00003/full
- [13] Poonam Chitale, Alexander D. Lemenze, Emily C. Fogarty, Avi Shah, Courtney Grady, Aubrey R. Odom-Mabey, W. Evan Johnson, Jason H. Yang, A. Murat Eren, Roland Brosch, Pradeep Kumar, and David Alland. A comprehensive update to the Mycobacterium tuberculosis H37Rv reference genome. Nature Communications, 13:7068, 2022. doi:10.1038/s41467-022-34853-x
- [14] Francisco Martínez-Jiménez, George Papadatos, Li Yang, Iain M. Wallace, Vineet Kumar, Ursula Pieper, Andrej Sali, Jeremy R. Brown, John P. Overington, and Marc A. Marti-Renom. Target prediction for an open access set of compounds active against Mycobacterium tuberculosis. PLoS Computational Biology, 9(10):e1003253, 2013. doi:10.1371/journal.pcbi.1003253
- [15] Thomas Lane, Daniel P. Russo, Kimberley M. Zorn, Alex M. Clark, Alexandru Korotcov, Valery Tkachenko, Robert C. Reynolds, Alexander L. Perryman, Joel S. Freundlich, and Sean Ekins. Comparing and validating machine learning models for Mycobacterium tuberculosis drug discovery. Molecular Pharmaceutics, 15(10):4346–4360, 2018. doi:10.1021/acs.molpharmaceu...
- [16] Shiroh Iwanaga, Rie Kubota, Tsubasa Nishi, Sumalee Kamchonwongpaisan, Somdet Srichairatanakool, Naoaki Shinzawa, Din Syafruddin, Masao Yuda, and Chairat Uthaipibull. Genome-wide functional screening of drug-resistance genes in Plasmodium falciparum. Nature Communications, 13:6163, 2022. doi:10.1038/s41467-022-33804-w
- [17] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. doi:10.1023/A:1010933404324
- [18] Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006. doi:10.1007/s10994-006-6226-1
- [19] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001. doi:10.1214/aos/1013203451
- [20] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995. doi:10.1007/BF00994018
- [21] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016. doi:10.1145/2939672.2939785
- [22] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010. doi:10.1021/ci100050t
- [23] Sereina Riniker and Gregory A. Landrum. Similarity maps – a visualization strategy for molecular fingerprints and machine-learning methods. Journal of Cheminformatics, 5:43, 2013. doi:10.1186/1758-2946-5-43
- [24] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1263–1272, 2017. URL https://proceedings.mlr.press/v70/gilmer17a.html
- [25] Xiaomin Fang, Lihang Liu, et al. Geometric deep learning for molecular property prediction: A review. Nature Machine Intelligence, 2023
- [26] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://arxiv.org/abs/1609.02907
- [27] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://arxiv.org/abs/1710.10903
- [28] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://arxiv.org/abs/1810.00826
- [29] Jinjiang Guo, Qi Liu, Han Guo, and Xi Lu. Ligandformer: A graph neural network for predicting compound property with robust interpretation, 2022. URL https://arxiv.org/abs/2202.10873
- [30] Taicheng Guo, Kehan Guo, Bozhao Nan, Zixing Liang, Zhichun Guo, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. What can large language models do in chemistry? A comprehensive benchmark on eight tasks. arXiv preprint arXiv:2305.18365, 2023. doi:10.48550/arXiv.2305.18365
- [31] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988. doi:10.1021/ci00057a005
- [32] Guy W. Bemis and Mark A. Murcko. The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 39(15):2887–2893, 1996. doi:10.1021/jm9602928
- [33] Alexander Tropsha. Best practices for QSAR model development, validation, and exploitation. Molecular Informatics, 29(6–7):476–488, 2010. doi:10.1002/minf.201000061
- [34] José Jiménez-Luna, Francesca Grisoni, and Gisbert Schneider. Drug discovery with explainable artificial intelligence. Nature Machine Intelligence, 2:573–584, 2020. doi:10.1038/s42256-020-00236-4