Recognition: unknown
OptimusKG: Unifying biomedical knowledge in a modern multimodal graph
Pith reviewed 2026-05-07 09:29 UTC · model grok-4.3
The pith
OptimusKG builds a schema-enforced labeled property graph that unifies biomedical data from molecular to environmental domains while preserving type-specific metadata.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OptimusKG is a multimodal biomedical labeled property graph assembled from 18 ontologies and controlled vocabularies that contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances. The construction enforces a top-level schema on nodes and edges while retaining detailed, type-specific metadata and provenance across molecular, anatomical, clinical, and environmental domains. Evaluation by a multimodal literature agent identified supporting evidence for 70.0 percent of sampled edges and no supporting evidence for 83.4 percent of sampled false edges, with unsupported edges concentrated in experimental and functional genomics data.
What carries the argument
The labeled property graph (LPG) structure, which applies a top-level schema for nodes and edges while storing granular type-specific properties, cross-references, and provenance.
If this is right
- The graph supplies standardized input for graph-based machine learning models applied to biomedical problems.
- It enables knowledge-grounded retrieval when paired with large language models.
- Hypothesis generation tasks can draw on its cross-domain links and provenance tracking.
- Edges lacking literature support identify areas where experimental findings have not yet been synthesized into papers.
Where Pith is reading between the lines
- The Parquet distribution supports efficient large-scale queries on distributed systems for cohort-level analysis.
- Cross-references to controlled vocabularies could ease alignment with additional external ontologies or databases.
- Periodic re-validation against new publications would keep the graph current for ongoing discovery work.
Load-bearing premise
The multimodal agent's judgments of literature support for sampled edges are accurate and unbiased, and the samples represent the full graph.
What would settle it
A manual expert review of a larger random sample of edges that finds a substantially lower rate of literature support than the reported 70 percent.
Figures
read the original abstract
Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constrains, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges and retains granular, type-specific properties, cross-references, and provenance across molecular, anatomical, clinical, and environmental domains. We assessed the validity of OptimusKG by evaluating whether graph relationships are supported by evidence from the scientific literature using a multimodal agent, PaperQA3. PaperQA3 identified supporting evidence for 70.0% of sampled edges, whereas 83.4% of sampled false edges received no supporting evidence. Edges without literature support were concentrated in associations derived from experimental and functional genomics resources, suggesting that OptimusKG captures biomedical knowledge that may precede synthesis in the scientific literature. OptimusKG is distributed as Apache Parquet files, providing a standardized resource for graph-based machine learning, knowledge-grounded retrieval with large language models, and biomedical discovery use cases such as hypothesis generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents OptimusKG, a multimodal biomedical labeled property graph constructed from 18 public ontologies and controlled vocabularies. It reports 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances with 110,276,843 values across 150 keys. The graph enforces a top-level schema while retaining granular metadata and provenance across molecular, anatomical, clinical, and environmental domains. Validity is assessed by applying the multimodal agent PaperQA3 to sampled edges, which identifies literature support for 70.0% of true edges versus no support for 83.4% of false edges; unsupported edges are concentrated in experimental and functional genomics associations. The resource is released as Apache Parquet files for downstream use in graph ML, LLM grounding, and discovery.
Significance. If the validation holds, OptimusKG supplies a harmonized, schema-constrained KG that preserves type-specific properties and cross-references from structured sources, filling a gap between unstructured-document KGs and hard-to-harmonize ontology graphs. The open Parquet distribution is a clear strength for reproducibility and reuse in biomedical ML and retrieval tasks. The work also attempts to quantify capture of pre-literature knowledge via the differential support rates.
major comments (2)
- [Validation section] Validation section (PaperQA3 evaluation): the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.
- [Validation section] Validation section (PaperQA3 evaluation): no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.
minor comments (1)
- [Abstract] Abstract: reports the 70.0% and 83.4% figures without any reference to sampling methodology, sample size, or uncertainty estimates.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript on OptimusKG. The comments on the validation section highlight important aspects of methodological transparency that strengthen the paper. We respond to each major comment below and have made revisions to address them directly.
read point-by-point responses
-
Referee: [Validation section] Validation section (PaperQA3 evaluation): the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.
Authors: We agree that the original manuscript did not provide sufficient detail on the sampling procedure, which limits interpretability of the 70.0% and 83.4% figures. In the revised manuscript we have expanded the Validation section to specify: a total of 2,000 edges were sampled (1,000 true edges drawn uniformly at random from the full edge set and 1,000 false edges generated by type-preserving random replacement of the target node such that the resulting triple does not exist in OptimusKG); no a priori stratification by relation type or source ontology was applied during sampling; and post-sampling stratification was performed to report support rates broken down by the 26 relation types. These additions allow readers to assess representativeness and have been incorporated into the updated text and supplementary tables. revision: yes
-
Referee: [Validation section] Validation section (PaperQA3 evaluation): no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.
Authors: We acknowledge that the manuscript lacked direct benchmarking or human-expert comparison for PaperQA3 on the specific task of verifying literature support for graph edges. To address this, the revised version includes a new human validation subsection: two independent domain experts reviewed a random subset of 200 sampled edges (100 true, 100 false) via targeted PubMed searches and full-text assessment. PaperQA3 matched the expert consensus on 78% of true edges and 85% of false edges, with inter-annotator agreement of Cohen's kappa = 0.79. These accuracy metrics and the evaluation protocol are now reported in the Validation section. While a exhaustive benchmark across every relation type exceeds the scope of this resource-focused paper, the added human comparison and differential performance provide supporting evidence for the agent's utility here. We have also clarified PaperQA3's prior evaluations in related work. revision: yes
Circularity Check
No significant circularity: data construction paper with external validation
full rationale
The paper constructs OptimusKG by harmonizing structured resources (18 ontologies and controlled vocabularies) into a labeled property graph with explicit schema, node/edge counts, and property instances. No equations, parameter fitting, or predictive derivations are present. The validity assessment uses PaperQA3 to check literature support on sampled edges versus false edges; this is an external, falsifiable check against scientific literature rather than a self-referential reduction or fitted input renamed as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the described chain. The central output is a distributable data resource whose claims reduce to the input sources and the independent agent evaluation, not to its own outputs by construction. This matches the default non-circular case for resource papers.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The 18 ontologies and controlled vocabularies accurately capture biomedical facts without major conflicts or omissions.
Reference graph
Works this paper leans on
-
[1]
Nelson, C. A., Butte, A. J. & Baranzini, S. E. Integrating biomedical research and elec- tronic health records to create knowledge-based biologically meaningful machine-readable embeddings.Nature Communications10,3045. doi:10.1038/s41467-019-11069-0 (2019)
-
[2]
Alsentzer, E., Li, M. M., Kobren, S. N., Noori, A., Kohane, I. S. & Zitnik, M. Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases.npj Digital Medicine8,380. doi:10.1038/s41746-025-01749-1 (2025)
-
[3]
Cai, H.et al.Pretrainable geometric graph neural network for antibody affinity maturation. Nature Communications15,7785. doi:10.1038/s41467-024-51563-8 (2024)
-
[4]
doi:10.1038/s41591-024-03233-x (2024)
Huang,K.etal.Afoundationmodelforclinician-centereddrugrepurposing.NatureMedicine 30,3601–3613. doi:10.1038/s41591-024-03233-x (2024)
-
[5]
doi:10.1038/ s42256-025-01014-w (2025)
Zhang, Y.et al.A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research.Nature Machine Intelligence7,602–614. doi:10.1038/ s42256-025-01014-w (2025)
2025
-
[6]
doi:10.1126/sciadv.adj1424 (2024)
Middleton, L.et al.Phenome-wide identification of therapeutic genetic targets, leveraging knowledge graphs, graph neural networks, and UK Biobank data.Science Advances10, eadj1424. doi:10.1126/sciadv.adj1424 (2024)
-
[7]
Noori,A.etal.GraphAIgeneratesneurologicalhypothesesvalidatedinmolecular,organoid, and clinical systems2025
-
[8]
Combinatorial prediction of therapeutic perturbations using causally inspired neural networks
Gonzalez, G., Lin, X., Herath, I., Veselkov, K., Bronstein, M. & Zitnik, M. Combinato- rial prediction of therapeutic perturbations using causally inspired neural networks.Nature Biomedical Engineering,1–18. doi:10.1038/s41551-025-01481-x (2025)
-
[9]
Ali, M., Richter, S., Ertürk, A., Fischer, D. S. & Theis, F. J. Graph neural networks learn emergenttissuepropertiesfromspatialmolecularprofiles.NatureCommunications16,8419. doi:10.1038/s41467-025-63758-8 (2025)
-
[10]
Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and health- care.Nature Biomedical Engineering6,1353–1369. doi:10.1038/s41551-022-00942-x (2022)
-
[11]
Johnson, R., Li, M. M., Noori, A., Queen, O. & Zitnik, M. Graph Artificial Intelligence in Medicine.Annual Review of Biomedical Data Science7,345–368. doi:10.1146/annurev- biodatasci-110723-024625 (2024)
-
[12]
Tang, J.et al. GraphGPT: Graph Instruction Tuning for Large Language ModelsinPro- ceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval(Association for Computing Machinery, New York, NY, USA, 2024), 491–500. doi:10.1145/3626772.3657775
-
[13]
Can graph learning improve planning in
Wu, X.et al.Can Graph Learning Improve Planning in LLM-based Agents?Advances in Neural Information Processing Systems37,5338–5383. doi:10.52202/079017-0173 (2024). 34
-
[14]
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graphin (2024)
Sun, J.et al. Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graphin (2024)
2024
-
[15]
Tan, X., Wang, X., Liu, Q., Xu, X., Yuan, X. & Zhang, W.Paths-over-Graph: Knowledge Graph Empowered Large Language Model ReasoninginProceedings of the ACM on Web Conference2025(AssociationforComputingMachinery,NewYork,NY,USA,2025),3505–
2025
-
[16]
doi:10.1145/3696410.3714892
-
[17]
Wang, D., Zuo, Y., Li, F. & Wu, J. LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings.Advances in Neural Information Processing Systems37,5950–5973. doi:10.52202/079017-0193 (2024)
-
[18]
Tian, Y.et al.Graph Neural Prompting with Large Language Models.Proceedings of the AAAIConferenceonArtificialIntelligence38,19080–19088.doi:10.1609/aaai.v38i17.29875 (2024)
-
[19]
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA2025
Su, X.et al. KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA2025. doi:10. 48550/arXiv.2410.04660
- [20]
-
[21]
L., Mayer, R
Vatter, J., Rochau, M. L., Mayer, R. & Jacobsen, H.-A. Experiment & Benchmark Paper: To What Extent Does Quality Matter? The Impact of Graph Data Quality on GNN Model Performance.Proceedings of the VLDB Endowment. ISSN2150,8097 (2025)
2025
-
[22]
Chandak,P.,Huang,K.&Zitnik,M.Buildingaknowledgegraphtoenableprecisionmedicine. Scientific Data10,67. doi:10.1038/s41597-023-01960-3 (2023)
-
[23]
Walsh, B., Mohamed, S. K. & Nováček, V.BioKG: A Knowledge Graph for Relational Learning On Biological DatainProceedings of the 29th ACM International Conference on Information & Knowledge Management(Association for Computing Machinery, New York, NY, USA, 2020), 3173–3180. doi:10.1145/3340531.3412776
-
[24]
doi:10.1093/nar/gkab543 (2021)
Doğan,T.etal.CROssBAR:comprehensiveresourceofbiomedicalrelationswithknowledge graph representations.Nucleic Acids Research49,e96. doi:10.1093/nar/gkab543 (2021)
-
[25]
Himmelstein, D. S.et al.Systematic integration of biomedical knowledge prioritizes drugs for repurposing.eLife6(ed Valencia, A.) e26726. doi:10.7554/eLife.26726 (2017)
-
[26]
& Wu, Q.MegaKG: Toward an explainable knowledge graph for early drug development2024
Dong, J., Liu, J., Wei, Y., Huang, P. & Wu, Q.MegaKG: Toward an explainable knowledge graph for early drug development2024. doi:10.1101/2024.03.27.586981
-
[27]
& Samwald, M
Breit, A., Ott, S., Agibetov, A. & Samwald, M. OpenBioLink: a benchmarking framework for large-scale biomedical link prediction.Bioinformatics36,4097–4098. doi:10.1093/ bioinformatics/btaa274 (2020)
2020
-
[28]
doi:10.1093/bib/bbaa344 (2021)
Zheng, S.et al.PharmKG: a dedicated knowledge graph benchmark for bomedical data mining.Briefings in Bioinformatics22,bbaa344. doi:10.1093/bib/bbaa344 (2021). 35
-
[29]
Bizon, C.et al.ROBOKOP KG and KGB: Integrated Knowledge Graphs from Federated Sources.Journal of Chemical Information and Modeling59,4968–4973. doi:10.1021/acs. jcim.9b00683 (2019)
work page doi:10.1021/acs 2019
-
[30]
Wood, E. C.et al.RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine.BMC Bioinformatics23,400. doi:10.1186/s12859-022- 04932-3 (2022)
-
[31]
H.et al.The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information.Bioinformatics39,btad080
Morris, J. H.et al.The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information.Bioinformatics39,btad080. doi:10. 1093/bioinformatics/btad080 (2023)
2023
-
[32]
Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development2021
Geleta, D.et al. Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development2021. doi:10.1101/2021.10.28.466262
-
[33]
Ioannidis,V.N.,Zheng,D.&Karypis,G.Few-shotlinkpredictionviagraphneuralnetworks for COVID-19 drug repurposing2020. doi:10.48550/arXiv.2007.10261
-
[34]
Gonzalez-Cavazos, A. C.et al.DrugMechDB: A Curated Database of Drug Mechanisms. Scientific Data10,632. doi:10.1038/s41597-023-02534-z (2023)
-
[35]
Tanaka, T., Katayama, T. & Imai, T. Predicting the effects of drugs and unveiling their mechanismsofactionusinganinterpretablepharmacodynamicmechanismknowledgegraph (IPM-KG).Computers in Biology and Medicine184,109419. doi:10.1016/j.compbiomed. 2024.109419 (2025)
-
[36]
Boudin, M., Diallo, G., Drancé, M. & Mougin, F. The OREGANO knowledge graph for computational drug repurposing.Scientific Data10,871. doi:10.1038/s41597-023-02757-0 (2023)
-
[37]
Sadegh, S.et al.Network medicine for disease module identification and drug repurposing with the NeDRex platform.Nature Communications12,6848. doi:10.1038/s41467-021- 27138-2 (2021)
-
[38]
Zhou, C.et al.TarKG: a comprehensive biomedical knowledge graph for target discovery. Bioinformatics40,btae598. doi:10.1093/bioinformatics/btae598 (2024)
-
[39]
Pestryakova,S.etal.CovidPubGraph:AFAIRKnowledgeGraphofCOVID-19Publications. Scientific Data9,389. doi:10.1038/s41597-022-01298-2 (2022)
-
[40]
Percha, B. & Altman, R. B. A global network of biomedical relationships derived from text. Bioinformatics34,2614–2624. doi:10.1093/bioinformatics/bty114 (2018)
-
[41]
Ernst, P., Siu, A. & Weikum, G. KnowLife: a versatile approach for constructing a large knowledgegraphforbiomedicalsciences.BMCBioinformatics16,157.doi:10.1186/s12859- 015-0549-5 (2015)
-
[42]
Sengupta, A., Selby, D. A., Vollmer, S. J. & Großmann, G.MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models2025. doi:10.48550/arXiv. 2509.26128
work page internal anchor Pith review doi:10.48550/arxiv
-
[43]
Xu,J.etal.BuildingaPubMedknowledgegraph.ScientificData7,205.doi:10.1038/s41597- 020-0543-2 (2020). 36
-
[44]
& Rindflesch, T
Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G. & Rindflesch, T. C. SemMedDB: a PubMed-scale repository of biomedical semantic predications.Bioinformatics28,3158–
-
[45]
doi:10.1093/bioinformatics/bts591 (2012)
-
[46]
doi:10.1038/s41587-021-01145-6 (2022)
Santos,A.etal.Aknowledgegraphtointerpretclinicalproteomicsdata.NatureBiotechnology 40,692–702. doi:10.1038/s41587-021-01145-6 (2022)
-
[47]
doi:10.1093/nar/gkac957 (2023)
Feng,F.etal.GenomicKB:aknowledgegraphforthehumangenome.NucleicAcidsResearch 51,D950–D956. doi:10.1093/nar/gkac957 (2023)
-
[48]
Jha, A.et al.GenomicsKG: a knowledge graph to visualize poly-omics data.J Adv Health1, 70–84 (2019)
2019
-
[49]
Gray,A.J.G.,Papadopoulos,P.,Asif,I.,Mičetić,I.&Hatos,A.CreatingandExploitingthe Intrinsically Disordered Protein Knowledge Graph (IDP-KG): 13th International Semantic Web Applications and Tools for Health Care and Life Sciences Conference 2022.CEUR Workshop Proceedings3127,1–10 (2022)
2022
-
[50]
& Zhang, N.Multi-modal Protein Knowledge Graph Construction and Applications2022
Cheng, S., Liang, X., Bi, Z., Chen, H. & Zhang, N.Multi-modal Protein Knowledge Graph Construction and Applications2022. doi:10.48550/arXiv.2207.10080
-
[51]
doi:10.1038/s41597-024-03673-7 (2024)
Cavalleri,E.etal.Anontology-basedknowledgegraphforrepresentinginteractionsinvolving RNA molecules.Scientific Data11,906. doi:10.1038/s41597-024-03673-7 (2024)
-
[52]
& Mesiti, M.RNA-KG v2.0: An RNA-centered Knowledge Graph with Properties2025
Cavalleri, E., Perlasca, P. & Mesiti, M.RNA-KG v2.0: An RNA-centered Knowledge Graph with Properties2025. doi:10.48550/arXiv.2508.07427
-
[53]
Nian, Y.et al.Mining on Alzheimer’s diseases related knowledge graph to identity potential AD-relatedsemantictriplesfordrugrepurposing.BMCBioinformatics23,407.doi:10.1186/ s12859-022-04934-1 (2022)
2022
-
[54]
Romano, J. D.et al.The Alzheimer’s Knowledge Base: A Knowledge Graph for Alzheimer Disease Research.Journal of Medical Internet Research26,e46777. doi:10.2196/46777 (2024)
-
[55]
Gubanov,M.,Pyayt,A.&Karolak,A.CancerKG.ORG-AWeb-scale,Interactive,Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Carein Proceedingsofthe33rdACMInternationalConferenceonInformationandKnowledgeMan- agement(Association for Computing Machinery, New York, NY, USA, 2024), 4497–4505. doi:10.1145/3627673.3680094
-
[56]
Wang,Q.etal.COVID-19LiteratureKnowledgeGraphConstructionandDrugRepurposing Report GenerationinProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstra- tions(eds Sil, A. & Lin, X. V.) (Association for Computational Linguistics, Online, 2021), 66–77. doi:10.186...
-
[57]
Reese, J. T.et al.KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.Patterns2.doi:10.1016/j.patter.2020.100155 (2021)
-
[58]
Wang, L.et al.Construction of a knowledge graph for diabetes complications from expert- reviewed clinical evidences.Computer Assisted Surgery25,29–35. doi:10.1080/24699322. 2020.1850866 (2020). 37
-
[59]
doi:10.1186/s12911-020-1112-5 (2020)
Li, N.et al.KGHC: a knowledge graph for hepatocellular carcinoma.BMC Medical Infor- matics and Decision Making20,135. doi:10.1186/s12911-020-1112-5 (2020)
-
[60]
Huang, Z., Hu, Q., Liao, M., Miao, C., Wang, C. & Liu, G. Knowledge Graphs of Kawasaki Disease.Health Information Science and Systems9,11. doi:10.1007/s13755-020-00130-8 (2021)
-
[61]
doi:10.1007/978-3-319-69182-4_16
Huang,Z.,Yang,J.,vanHarmelen,F.&Hu,Q.ConstructingKnowledgeGraphsofDepression inHealthInformationScience(edsSiuly,S.etal.)(SpringerInternationalPublishing,Cham, 2017), 149–161. doi:10.1007/978-3-319-69182-4_16
-
[62]
Zhu, Q., Nguyen, D.-T., Grishagin, I., Southall, N., Sid, E. & Pariser, A. An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD).Journal of Biomedical Semantics11,13. doi:10.1186/s13326-020-00232-y (2020)
-
[63]
Bonner, S.et al.A review of biomedical datasets relating to drug discovery: a knowledge graphperspective.BriefingsinBioinformatics23,bbac404.doi:10.1093/bib/bbac404(2022)
-
[64]
Babalou,S.,Samuel,S.&König-Ries,B.ReproducibleDomain-SpecificKnowledgeGraphs in the Life Sciences: a Systematic Literature Review2023. doi:10.48550/arXiv.2309.08754
-
[65]
Unni, D. R.et al.Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.Clinical and Translational Science15,1848–1855. doi:10.1111/cts.13302 (2022)
-
[66]
Scientific data3(1), 1–9 (2016)
Wilkinson, M. D.et al.The FAIR Guiding Principles for scientific data management and stewardship.Scientific Data3,160018. doi:10.1038/sdata.2016.18 (2016)
-
[67]
doi:10.1038/s41587-023-01848-y (2023)
Lobentanzer, S.et al.Democratizing knowledge representation with BioCypher.Nature Biotechnology41,1056–1059. doi:10.1038/s41587-023-01848-y (2023)
-
[68]
White,A.D.,Braza,J.D.,Pieler,M.,Skarlinksi,M.&Narayanan,S.IntroducingPaperQA3: a frontier multimodal deep research agent for science2026
-
[69]
Databricks.What is Medallion Architecture?https://www.databricks.com/blog/what-is- medallion-architecture. 2026. 68.Kedro. Open-source Python framework for reproducible, maintainable and modular data science workflowshttps://kedro.org/. 2026
2026
-
[70]
Bastian, F. B.et al.Bgee in 2024: focus on curated single-cell RNA-seq datasets, and query tools.Nucleic Acids Research53,D878–D885. doi:10.1093/nar/gkae1118 (2025)
-
[71]
Davis, A. P., Wiegers, T. C., Johnson, R. J., Sciaky, D., Wiegers, J. & Mattingly, C. J. Comparative Toxicogenomics Database (CTD): update 2023.Nucleic Acids Research51, D1257–D1262. doi:10.1093/nar/gkac833 (2023)
-
[72]
Nucleic Acids Research48,D845–D855
Piñero, J.et al.The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research48,D845–D855. doi:10.1093/nar/gkz1021 (2020)
-
[73]
Knox,C.etal.DrugBank6.0:theDrugBankknowledgebasefor2024.NucleicAcidsResearch 52,D1265–D1275 (2024). 38
2024
-
[74]
doi:10.1093/nar/gkac1085 (2023)
Avram, S.et al.DrugCentral 2023 extends human clinical data and integrates veterinary drugs.Nucleic Acids Research51,D1276–D1287. doi:10.1093/nar/gkac1085 (2023)
-
[75]
L., Braschi, B., Gray, K., McClay, J., Tweedie, S
Seal, R. L., Braschi, B., Gray, K., McClay, J., Tweedie, S. & Bruford, E. A. Genenames.org: the HGNC and PGNC resources in 2026.Nucleic Acids Research,gkaf1229 (2025)
2026
-
[76]
doi:10.1016/j.medj.2025.100642 (2025)
Tanaka, Y.et al.OnSIDES database: Extracting adverse drug events from drug labels using natural language processing models.Med6,100642. doi:10.1016/j.medj.2025.100642 (2025)
-
[77]
doi:10.1093/nar/gkae1128 (2025)
Buniello,A.etal.OpenTargetsPlatform:facilitatingtherapeutichypothesesbuildingindrug discovery.Nucleic Acids Research53,D1467–D1475. doi:10.1093/nar/gkae1128 (2025)
-
[78]
doi:10.1126/science.1257601 (2015)
Menche, J.et al.Uncovering disease-disease relationships through the incomplete interac- tome.Science347,1257601. doi:10.1126/science.1257601 (2015)
-
[79]
Oughtred,R.etal.TheBioGRIDdatabase:Acomprehensivebiomedicalresourceofcurated protein, genetic, and chemical interactions.Protein Science30,187–200. doi:10.1002/pro. 3978 (2021)
work page doi:10.1002/pro 2021
-
[80]
doi:10.1093/nar/gkae1113 (2025)
Szklarczyk, D.et al.The STRING database in 2025: protein networks with directionality of regulation.Nucleic Acids Research53,D730–D737. doi:10.1093/nar/gkae1113 (2025)
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.