Recognition: no theorem link
The limits of bio-molecular modeling with large language models: a cross-scale evaluation
Pith reviewed 2026-05-13 20:06 UTC · model grok-4.3
The pith
A 26-task benchmark reveals that large language models remain weak on bio-molecular regression despite strengths in classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BioMol-LLM-Bench evaluation of thirteen LLMs demonstrates systematic gaps between model outputs and mechanistic understanding of multi-scale bio-molecular systems. The gaps show up in four ways: limited or negative effects of chain-of-thought data; advantages of hybrid mamba-attention architectures on long sequences; a specialization–generalization trade-off after supervised fine-tuning; and reliable classification paired with persistent weakness on regression tasks.
What carries the argument
BioMol-LLM-Bench, the proposed cross-scale benchmark framework consisting of 26 downstream tasks at four difficulty levels with tool augmentation.
If this is right
- Chain-of-thought data should be used sparingly or omitted for biological tasks to avoid performance losses.
- Hybrid mamba-attention models merit priority when processing extended bio-molecular sequences.
- Supervised fine-tuning requires safeguards to retain generalization across molecular scales.
- Current LLMs suit classification work on bio-molecular properties but require further advances for accurate regression.
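The classification–regression split in the last point is easy to make concrete. A minimal sketch of the two metric families a reader would use to check the claim; all task names and numbers below are hypothetical toy data, not drawn from the benchmark:

```python
import math

def accuracy(preds, labels):
    """Fraction of exact matches -- the regime where current LLMs do well."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def rmse(preds, targets):
    """Root-mean-square error -- the regression regime where LLMs lag."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

# Hypothetical outputs on a toy classification task (e.g. soluble vs. insoluble)
cls_preds  = ["soluble", "insoluble", "soluble", "soluble"]
cls_labels = ["soluble", "insoluble", "insoluble", "soluble"]

# Hypothetical outputs on a toy regression task (e.g. log-solubility values)
reg_preds   = [-2.1, -3.5, -0.9]
reg_targets = [-2.4, -3.1, -1.0]

print(accuracy(cls_preds, cls_labels))          # 0.75
print(round(rmse(reg_preds, reg_targets), 3))   # 0.294
```

The point of the sketch is only that the two metrics reward different behavior: accuracy forgives any answer that lands in the right bucket, while RMSE punishes every numeric deviation, which is where the benchmark reports persistent weakness.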
Where Pith is reading between the lines
- Architectures that embed explicit physical constraints could close the regression gap left by pure language modeling.
- Expanding the benchmark with direct molecular-dynamics trajectories would test whether the observed limits hold under more mechanistic conditions.
- Training mixtures that interleave experimental measurements with simulation data might reduce the specialization-generalization trade-off.
Load-bearing premise
The twenty-six chosen tasks sufficiently represent the mechanistic challenges of real multi-scale bio-molecular modeling.
What would settle it
An LLM that matches or exceeds baseline accuracy on held-out regression tasks such as quantitative prediction of binding free energies or reaction rates within the same benchmark setup would directly challenge the reported weakness.
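A minimal sketch of how such a settling test could be scored, assuming hypothetical experimental binding free energies and predictions (none of these numbers come from the benchmark): the challenge is met only if the LLM's regression error matches or beats the baseline's on the held-out set.

```python
import math

def rmse(preds, targets):
    """Root-mean-square error of numeric predictions against reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

# Hypothetical held-out binding free energies in kcal/mol (illustrative only)
experimental  = [-7.2, -9.1, -5.4, -8.0]
baseline_pred = [-6.8, -9.5, -5.0, -8.3]   # e.g. a physics-based baseline
llm_pred      = [-5.0, -7.0, -6.5, -6.0]   # an LLM's direct numeric answers

# The reported weakness stands until this comparison flips
challenge_met = rmse(llm_pred, experimental) <= rmse(baseline_pred, experimental)
print(challenge_met)
```

With these toy numbers the comparison prints `False`, mirroring the paper's finding; an LLM that flipped it on genuinely held-out targets would directly challenge the claim.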
read the original abstract
The modeling of bio-molecular system across molecular scales remains a central challenge in scientific research. Large language models (LLMs) are increasingly applied to bio-molecular discovery, yet systematic evaluation across multi-scale biological problems and rigorous assessment of their tool-augmented capabilities remain limited. We reveal a systematic gap between LLM performance and mechanistic understanding through the proposed cross-scale bio-molecular benchmark: BioMol-LLM-Bench, a unified framework comprising 26 downstream tasks that covers 4 distinct difficulty levels, and computational tools are integrated for a more comprehensive evaluation. Evaluation on 13 representative models reveals 4 main findings: chain-of-thought data provides limited benefit and may even reduce performance on biological tasks; hybrid mamba-attention architectures are more effective for long bio-molecular sequences; supervised fine-tuning improves specialization at the cost of generalization; and current LLMs perform well on classification tasks but remain weak on challenging regression tasks. Together, these findings provide practical guidance for future LLM-based modeling of molecular systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BioMol-LLM-Bench, a unified benchmark with 26 downstream tasks spanning 4 difficulty levels for cross-scale bio-molecular modeling. It evaluates 13 representative LLMs and reports four findings: chain-of-thought data offers limited or negative benefit on biological tasks; hybrid mamba-attention architectures outperform on long sequences; supervised fine-tuning boosts specialization at the expense of generalization; and LLMs excel at classification but struggle with challenging regression tasks. The authors conclude this reveals a systematic gap between LLM performance and mechanistic understanding.
Significance. If the benchmark tasks genuinely probe multi-scale biophysical mechanisms rather than surface statistics, the empirical results across diverse models would provide actionable guidance for LLM architectures and training strategies in molecular biology and drug discovery. The broad model coverage is a positive aspect of the evaluation.
major comments (2)
- [Benchmark construction] The central claim of a 'systematic gap between LLM performance and mechanistic understanding' depends on the 26 tasks in BioMol-LLM-Bench requiring capture of physical cross-scale phenomena. The paper groups tasks into four difficulty levels but provides no explicit mapping demonstrating that higher levels enforce biophysical constraints such as energy conservation, force-field consistency, or long-range allostery (see abstract and benchmark construction description).
- [Abstract] The description of the benchmark and findings omits the task selection criteria, the statistical tests used to support the four conclusions, and error bars on reported performance metrics, which limits assessment of the robustness of the observed gaps.
minor comments (1)
- [Results] The four findings are listed clearly in the abstract but would be strengthened by explicit quantitative comparisons (e.g., performance deltas) in the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. The feedback highlights important opportunities to strengthen the connection between benchmark tasks and biophysical principles as well as to improve clarity in the abstract. We address each major comment below and will incorporate revisions accordingly.
read point-by-point responses
-
Referee: [Benchmark construction] The central claim of a 'systematic gap between LLM performance and mechanistic understanding' depends on the 26 tasks in BioMol-LLM-Bench requiring capture of physical cross-scale phenomena. The paper groups tasks into four difficulty levels but provides no explicit mapping demonstrating that higher levels enforce biophysical constraints such as energy conservation, force-field consistency, or long-range allostery (see abstract and benchmark construction description).
Authors: We appreciate this observation. The difficulty levels were designed to progressively incorporate tasks that demand modeling of cross-scale interactions (e.g., level 3–4 tasks include multi-domain proteins and allosteric effects), which in practice require capturing biophysical consistency beyond surface statistics. However, we acknowledge that an explicit mapping table linking each level to specific constraints such as energy conservation or force-field consistency was not included. We will add a dedicated subsection (and accompanying table) in the revised benchmark construction section that explicitly maps task levels to the biophysical principles they probe, with concrete examples drawn from the 26 tasks. This will directly support the central claim. revision: yes
-
Referee: [Abstract] The description of the benchmark and findings omits the task selection criteria, the statistical tests used to support the four conclusions, and error bars on reported performance metrics, which limits assessment of the robustness of the observed gaps.
Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will expand the abstract to briefly note: (i) task selection criteria (coverage across molecular scales from sequence to structure-function with four graded difficulty levels), (ii) the statistical tests employed (paired t-tests and Wilcoxon rank-sum tests for model comparisons, with p-values reported in the main text), and (iii) that all performance metrics include error bars (standard deviation across three random seeds, shown in Figures 2–5). These details are already present in the methods and results sections; the abstract revision will make them visible at a glance without exceeding length limits. revision: yes
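The paired comparison named in (ii) can be sketched in a few lines. The per-seed accuracies below are invented for illustration, and the Wilcoxon variant is omitted for brevity; a paired t statistic, assuming both models are scored on the same seeds and tasks:

```python
import math

def paired_t_statistic(a, b):
    """Paired t statistic for per-seed scores of two models on the same tasks.
    A p-value would come from the t distribution with n-1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-seed accuracies (not taken from the paper)
model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b = [0.74, 0.76, 0.73, 0.77, 0.75]

t_stat = paired_t_statistic(model_a, model_b)
print(round(t_stat, 3))  # a large |t| means the gap is unlikely to be seed noise
```

Pairing by seed matters here: it removes the between-seed variance that an unpaired comparison would wrongly count against the model difference.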
Circularity Check
No circularity: purely empirical benchmark evaluation
full rationale
The paper introduces BioMol-LLM-Bench as a new collection of 26 tasks across difficulty levels and reports performance of 13 external LLMs on them. All four main findings are direct observations from these runs (e.g., CoT benefit, architecture comparisons, SFT effects, classification vs. regression gaps). No equations, fitted parameters, or predictions are defined in terms of the target results; the benchmark tasks and metrics are external to any model output. Self-citations, if present, are not load-bearing for any derivation. The evaluation is therefore self-contained against external models and tasks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the 26 tasks in BioMol-LLM-Bench adequately represent multi-scale bio-molecular modeling problems.