Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale
Pith reviewed 2026-05-11 01:23 UTC · model grok-4.3
The pith
PubMed can be autonomously turned into structured biomedical datasets that are larger, more nuanced, and more accurate than the curated databases they replace.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A pipeline of ontology-grounded LLM tagging on the entire PubMed corpus, combined with entity-filtered retrieval and a multi-agent system called Starling, autonomously produces structured records for biomedical properties that are larger in scale, richer in nuance, and lower in measured error than the manually curated alternatives they can replace.
What carries the argument
Starling, a multi-agent deep research system that, from a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records together with supporting passages.
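The entity-filtered hybrid retrieval that feeds Starling can be sketched as follows. This is a minimal toy under stated assumptions, not the paper's implementation: the `Passage` class, the tag-category names, the term-overlap stand-in for a sparse BM25 score, and the 0.5 weighting are all invented for illustration.

```python
# Hypothetical sketch: passages carry ontology-derived entity tags; a query
# filters on required tag categories first (precision), then ranks the
# survivors by a weighted mix of a sparse term-overlap score (a toy stand-in
# for BM25) and dense cosine similarity (recall within the filtered set).
import math
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    tags: set      # e.g. {"CHEMICAL", "PROPERTY"}
    dense: list    # toy dense embedding

def sparse_score(query_terms, text):
    # Crude length-normalized term-overlap score.
    words = text.lower().split()
    return sum(words.count(t) for t in query_terms) / (1 + len(words))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(passages, query_terms, query_dense, required_tags, alpha=0.5, k=3):
    # Hard entity filter, then hybrid ranking of the survivors.
    survivors = [p for p in passages if required_tags <= p.tags]
    ranked = sorted(survivors,
                    key=lambda p: -(alpha * sparse_score(query_terms, p.text)
                                    + (1 - alpha) * cosine(query_dense, p.dense)))
    return ranked[:k]
```

The design point the sketch illustrates is the ordering: the hard tag filter bounds false positives before any scoring happens, so the semantic ranker only ever sees entity-relevant passages.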
Load-bearing premise
Frontier-model rejection rates on the extracted records serve as a reliable proxy for actual correctness, and the multi-agent process does not introduce systematic biases missed by those checks.
What would settle it
Hand verification of a random sample of generated records against the original papers, with direct comparison of error rates to those measured on existing curated sets such as BBB_Martins.
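A back-of-envelope calculation shows such a hand-verification study is feasible. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions (not anything from the paper) with the abstract's worst-case Starling rejection rate (7.7%) against the measured BBB_Martins error rate (16.5%):

```python
# Sample size per group to distinguish two error rates at 5% two-sided
# significance and 80% power, via the usual normal-approximation formula.
# The rates 7.7% and 16.5% are taken from the abstract; everything else is
# a standard statistical convention, not a claim of the paper.
import math

def sample_size_two_proportions(p1, p2):
    z_a = 1.959964  # two-sided alpha = 0.05
    z_b = 0.841621  # power = 0.80
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

n = sample_size_two_proportions(0.077, 0.165)  # roughly 215 records per group
```

A few hundred hand-checked records per dataset would therefore suffice to confirm or refute the headline accuracy comparison.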
Original abstract
Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard -- e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that PubMed can be autonomously converted into structured, nuanced biomedical datasets at scale using an LLM-based entity-tagging pipeline over 22.5M papers, hybrid retrieval, and a multi-agent system (Starling) that designs filters and extracts records for six tasks (BBB permeability, oral bioavailability, LD50 toxicity, gene-disease associations, protein localization, chemical reactions). It reports producing ~6.3M records, several of them the largest public datasets for their properties, with frontier-model rejection rates of 0.6-7.7% versus higher measured rates on existing curated databases (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma), plus retention of experimental nuance in supporting passages. Code and datasets are released.
Significance. If the accuracy and bias claims hold, the work would provide a scalable alternative to manual curation, yielding larger, more current, and context-rich datasets that could accelerate therapeutic design and reduce maintenance costs for repositories. The public release of code and datasets is a clear strength, supporting reproducibility and community extension.
Major comments (1)
- [Abstract] Abstract and results on accuracy: the central claim that Starling extractions are more accurate than curated databases rests on frontier-model rejection rates (0.6-7.7%) being lower than measured rates on BBB_Martins (16.5%) and Bioavailability_Ma (7.3%). This comparison presupposes uniform detection of errors by the judge model and absence of systematic biases (e.g., consistent misinterpretation of context or ontology grounding) in the multi-agent pipeline; however, the manuscript provides no human-annotated ground truth, inter-annotator agreement statistics, or external benchmark overlap to calibrate the proxy, leaving the accuracy superiority unverified.
Minor comments (2)
- [Methods] The description of the nine-ontology entity-tagging pipeline and the hybrid sparse-dense retrieval could include pseudocode or explicit parameter settings for the retrieval filters to improve reproducibility.
- [Results] Table or figure reporting per-task record counts and rejection rates would benefit from explicit column definitions and confidence intervals on the rejection percentages.
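The requested confidence intervals are cheap to report. As an illustration (my formula choice, not the manuscript's), the Wilson score interval avoids the degenerate behavior of the naive Wald interval at low rejection counts:

```python
# 95% Wilson score interval for a binomial proportion k/n, e.g. a per-task
# rejection rate. The n = 1000 sample below is hypothetical; the 16.5% rate
# is the BBB_Martins error rate reported in the abstract.
import math

def wilson_interval(k, n, z=1.959964):
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(165, 1000)  # roughly (0.143, 0.189)
```

Reporting intervals like these per task would make clear whether the 0.6-7.7% band and the curated-database error rates are separated beyond sampling noise.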
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment on the accuracy evaluation point by point below and outline our planned revisions.
Point-by-point responses
Referee: [Abstract] Abstract and results on accuracy: the central claim that Starling extractions are more accurate than curated databases rests on frontier-model rejection rates (0.6-7.7%) being lower than measured rates on BBB_Martins (16.5%) and Bioavailability_Ma (7.3%). This comparison presupposes uniform detection of errors by the judge model and absence of systematic biases (e.g., consistent misinterpretation of context or ontology grounding) in the multi-agent pipeline; however, the manuscript provides no human-annotated ground truth, inter-annotator agreement statistics, or external benchmark overlap to calibrate the proxy, leaving the accuracy superiority unverified.
Authors: We appreciate the referee highlighting this key methodological point. The reported comparison is explicitly relative: the identical frontier-model judge and rejection criteria are applied to both our Starling extractions and the records drawn from the curated databases (BBB_Martins and Bioavailability_Ma). This design isolates the difference in rejection rates (0.6-7.7% vs. 16.5% and 7.3%) under matched evaluation conditions. We agree that the proxy does not constitute absolute accuracy verification and that systematic biases in the judge (e.g., context misinterpretation) could affect results. The manuscript does not contain human-annotated ground truth or IAA statistics because large-scale manual annotation of millions of records is infeasible. We will make a partial revision by (1) rephrasing the abstract and results sections to describe the comparison as a consistent proxy evaluation rather than direct accuracy superiority, and (2) adding a limitations subsection that discusses potential judge biases, the absence of human calibration data, and the release of rejected examples for external review.
Revision: partial.
- Not provided: large-scale human-annotated ground truth or inter-annotator agreement statistics for the full 6.3M records.
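Even without full-scale annotation, the judge proxy could be calibrated on a small human-labeled subsample. A minimal sketch of that calibration, assuming binary accept/reject labels from both the judge and a human annotator (the function and data are illustrative, not from the manuscript):

```python
# Cohen's kappa: chance-corrected agreement between a frontier-model judge's
# binary reject labels and human labels on the same sampled records.
def cohens_kappa(judge, human):
    assert len(judge) == len(human) and judge
    n = len(judge)
    po = sum(j == h for j, h in zip(judge, human)) / n  # observed agreement
    pj = sum(judge) / n
    ph = sum(human) / n
    pe = pj * ph + (1 - pj) * (1 - ph)  # agreement expected by chance
    return (po - pe) / (1 - pe) if pe != 1 else 1.0
```

A high kappa on even a few hundred records would substantially strengthen the rejection-rate proxy; a low one would vindicate the referee's concern.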
Circularity Check
No circularity: empirical extraction pipeline with external model checks is self-contained
Full rationale
The paper presents an LLM-based entity tagging pipeline, hybrid retrieval, and multi-agent system (Starling) that processes a PubMed corpus to emit structured records across six tasks. Claims rest on direct empirical outputs (6.3M records, rejection rates of 0.6-7.7%) compared to measured error rates on existing curated databases. No equations, fitted parameters renamed as predictions, self-definitional quantities, or load-bearing self-citations of uniqueness theorems appear. The derivation chain consists of system description followed by measurement; results are not equivalent to inputs by construction and do not rely on internal redefinitions or ansatzes smuggled via prior work.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: frontier LLMs can perform accurate biomedical entity tagging and structured extraction when grounded in existing ontologies.