pith. machine review for the scientific record.

arxiv: 2604.27269 · v1 · submitted 2026-04-29 · 💻 cs.AI

Recognition: unknown

OptimusKG: Unifying biomedical knowledge in a modern multimodal graph


Pith reviewed 2026-05-07 09:29 UTC · model grok-4.3

classification 💻 cs.AI
keywords biomedical knowledge graph · labeled property graph · multimodal data integration · schema enforcement · literature validation · graph machine learning · hypothesis generation

The pith

OptimusKG builds a schema-enforced labeled property graph that unifies biomedical data from molecular to environmental domains while preserving type-specific metadata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs OptimusKG by merging structured and semi-structured biomedical resources into one multimodal graph. It applies a top-level schema for consistency across 10 entity types and 26 relation types yet keeps granular properties, cross-references, and provenance for each domain. Validation with a literature-checking agent finds supporting evidence for 70 percent of sampled edges, while most false edges lack such evidence. Unsupported edges cluster in experimental genomics data, suggesting the graph includes associations that may not yet appear in published papers. The result is a ready-to-use resource distributed in Parquet format for machine learning and knowledge retrieval tasks.

Core claim

OptimusKG is a multimodal biomedical labeled property graph assembled from 18 ontologies and controlled vocabularies that contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances. The construction enforces a top-level schema on nodes and edges while retaining detailed, type-specific metadata and provenance across molecular, anatomical, clinical, and environmental domains. Evaluation by a multimodal literature agent identified supporting evidence for 70.0 percent of sampled edges and no supporting evidence for 83.4 percent of sampled false edges, with unsupported edges concentrated in experimental and functional genomics data.

What carries the argument

The labeled property graph (LPG) structure, which applies a top-level schema for nodes and edges while storing granular type-specific properties, cross-references, and provenance.
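
To make the idea concrete, here is a minimal sketch of a schema-enforced labeled property graph, assuming a tiny in-memory representation. It is not the paper's implementation. The class names, the two-type schema subsets, and the property keys (`name`, `symbol`, `sources`) are hypothetical; only the example identifiers and the ASSOCIATED-WITH relation come from the Figure 3 caption.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative subsets only (assumption): the paper reports 10 node types
# and 26 relation types, which are not enumerated on this page.
NODE_TYPES = {"gene", "phenotype"}
EDGE_TYPES = {"ASSOCIATED-WITH"}


@dataclass
class Node:
    id: str
    label: str  # top-level entity type, validated against NODE_TYPES
    properties: dict[str, Any] = field(default_factory=dict)  # type-specific metadata


@dataclass
class Edge:
    source: str
    target: str
    label: str  # top-level relation type, validated against EDGE_TYPES
    properties: dict[str, Any] = field(default_factory=dict)  # evidence, provenance


class Graph:
    """Rejects nodes and edges that violate the top-level schema."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        if node.label not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node.label!r}")
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.label not in EDGE_TYPES:
            raise ValueError(f"unknown relation type: {edge.label!r}")
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node id: {endpoint!r}")
        self.edges.append(edge)


kg = Graph()
kg.add_node(Node("HP_0000023", "phenotype", {"name": "Inguinal hernia"}))
kg.add_node(Node("ENSG00000163513", "gene", {"symbol": "TGFBR2"}))
kg.add_edge(
    Edge(
        "HP_0000023",
        "ENSG00000163513",
        "ASSOCIATED-WITH",
        {"sources": ["example-source"]},  # hypothetical provenance value
    )
)
```

The schema check lives on the labels, while the `properties` dictionaries stay free-form. That split is what lets one graph hold very different metadata per entity type without loosening the top-level constraints.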

If this is right

  • The graph supplies standardized input for graph-based machine learning models applied to biomedical problems.
  • It enables knowledge-grounded retrieval when paired with large language models.
  • Hypothesis generation tasks can draw on its cross-domain links and provenance tracking.
  • Edges lacking literature support identify areas where experimental findings have not yet been synthesized into papers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Parquet distribution supports efficient large-scale queries on distributed systems for cohort-level analysis.
  • Cross-references to controlled vocabularies could ease alignment with additional external ontologies or databases.
  • Periodic re-validation against new publications would keep the graph current for ongoing discovery work.

Load-bearing premise

The multimodal agent's judgments of literature support for sampled edges are accurate and unbiased, and the samples represent the full graph.

What would settle it

A manual expert review of a larger random sample of edges that finds a substantially lower rate of literature support than the reported 70 percent.

Figures

Figures reproduced from arXiv: 2604.27269 by Andrew White, Ayush Noori, David A. Clifton, Iñaki Arango, Joaquín Polonuer, Lucas Vittor, Marinka Zitnik, Sam Rodriques.

Figure 1: Overview of OptimusKG. (a) Metagraph of OptimusKG, illustrating the node types and the heterogeneous relationships connecting them. (b) Pairwise edge type distribution across the graph. Each cell reports the number of edges between two node types. Edges remain asymmetric; a reverse edge is added only when the undirect key of the edge is set to True. (c) Total number of nodes (𝑥-axis) and edges (𝑦-axis) f…
Figure 2: OptimusKG data pipeline architecture. Heterogeneous sources are ingested in the Landing layer via data replication and Kedro-managed workflows. Configuration management and URI-based data discovery primitives provide governance and traceability. The pipeline uses a medallion architecture to logically organize the data in increasing structure and quality as it flows through each layer (Landing, Bronze, Sil…
Figure 3: Property distribution in OptimusKG. (a) Representative subgraph from OptimusKG illustrating a phenotype-gene association. A phenotype node (Inguinal hernia, HP_0000023) is connected to a gene node (TGFBR2, ENSG00000163513) via an ASSOCIATED-WITH relation, with supporting evidence, provenance, and cross-references embedded as node and edge properties. (b) Distribution of property types across node and edge ty…
Figure 4: Validation of edges in OptimusKG with a multimodal deep research agent. Stacked bar plots…
Original abstract

Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges and retains granular, type-specific properties, cross-references, and provenance across molecular, anatomical, clinical, and environmental domains. We assessed the validity of OptimusKG by evaluating whether graph relationships are supported by evidence from the scientific literature using a multimodal agent, PaperQA3. PaperQA3 identified supporting evidence for 70.0% of sampled edges, whereas 83.4% of sampled false edges received no supporting evidence. Edges without literature support were concentrated in associations derived from experimental and functional genomics resources, suggesting that OptimusKG captures biomedical knowledge that may precede synthesis in the scientific literature. OptimusKG is distributed as Apache Parquet files, providing a standardized resource for graph-based machine learning, knowledge-grounded retrieval with large language models, and biomedical discovery use cases such as hypothesis generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents OptimusKG, a multimodal biomedical labeled property graph constructed from 18 public ontologies and controlled vocabularies. It reports 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances with 110,276,843 values across 150 keys. The graph enforces a top-level schema while retaining granular metadata and provenance across molecular, anatomical, clinical, and environmental domains. Validity is assessed by applying the multimodal agent PaperQA3 to sampled edges, which identifies literature support for 70.0% of true edges versus no support for 83.4% of false edges; unsupported edges are concentrated in experimental and functional genomics associations. The resource is released as Apache Parquet files for downstream use in graph ML, LLM grounding, and discovery.

Significance. If the validation holds, OptimusKG supplies a harmonized, schema-constrained KG that preserves type-specific properties and cross-references from structured sources, filling a gap between unstructured-document KGs and hard-to-harmonize ontology graphs. The open Parquet distribution is a clear strength for reproducibility and reuse in biomedical ML and retrieval tasks. The work also attempts to quantify capture of pre-literature knowledge via the differential support rates.

major comments (2)
  1. [Validation section] Validation section (PaperQA3 evaluation): the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.
  2. [Validation section] Validation section (PaperQA3 evaluation): no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.
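
For readers unfamiliar with how false edges are typically built, the sketch below shows one common approach: type-preserving corruption of the target node, which is also the procedure the simulated rebuttal further down describes. It is an illustration under stated assumptions, not the authors' code. The function name, the toy node ids, and the rule that rejects self-loops are all hypothetical.

```python
import random


def corrupt_edges(edges, node_types, n, seed=0):
    """Sample up to n false edges by swapping each target for a same-type node.

    edges: list of (source, relation, target) triples that exist in the graph.
    node_types: dict mapping node id -> entity type.
    """
    rng = random.Random(seed)
    existing = set(edges)

    # Group node ids by type so that a replacement keeps the target's type.
    by_type = {}
    for node_id, node_type in node_types.items():
        by_type.setdefault(node_type, []).append(node_id)

    false_edges = set()
    attempts = 0
    max_attempts = 100 * n  # guard against graphs with too few valid corruptions
    while len(false_edges) < n and attempts < max_attempts:
        attempts += 1
        source, relation, target = rng.choice(edges)
        candidate = rng.choice(by_type[node_types[target]])
        triple = (source, relation, candidate)
        # Reject triples already in the graph; rejecting self-loops is an
        # extra assumption of this sketch.
        if candidate != source and triple not in existing:
            false_edges.add(triple)
    return sorted(false_edges)


# Toy data (hypothetical ids).
node_types = {
    "g1": "gene", "g2": "gene", "g3": "gene",
    "p1": "phenotype", "p2": "phenotype",
}
edges = [("p1", "ASSOCIATED-WITH", "g1"), ("p2", "ASSOCIATED-WITH", "g2")]
false_edges = corrupt_edges(edges, node_types, n=3)
```

Whether such corrupted triples are truly false is itself an assumption: a randomly generated gene-phenotype pair can describe a real association that the graph simply does not contain, which would push the reported 83.4% figure downward.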
minor comments (1)
  1. [Abstract] Abstract: reports the 70.0% and 83.4% figures without any reference to sampling methodology, sample size, or uncertainty estimates.
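
The missing uncertainty estimate is cheap to compute once a sample size is known. The sketch below uses a Wilson score interval for a binomial proportion. The sample size of 1,000 true edges is an assumption taken from the simulated rebuttal on this page, not from the abstract, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed sample: 700 supported edges out of 1,000 sampled true edges.
low, high = wilson_interval(700, 1000)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under that assumed sample size the interval runs from roughly 0.67 to 0.73, so sampling noise alone is small. The larger risk is the one the referee raises: systematic error in the agent's judgments, which no interval on the sample proportion captures.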

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript on OptimusKG. The comments on the validation section highlight important aspects of methodological transparency that strengthen the paper. We respond to each major comment below and have made revisions to address them directly.

point-by-point responses
  1. Referee: [Validation section] Validation section (PaperQA3 evaluation): the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.

    Authors: We agree that the original manuscript did not provide sufficient detail on the sampling procedure, which limits interpretability of the 70.0% and 83.4% figures. In the revised manuscript we have expanded the Validation section to specify: a total of 2,000 edges were sampled (1,000 true edges drawn uniformly at random from the full edge set and 1,000 false edges generated by type-preserving random replacement of the target node such that the resulting triple does not exist in OptimusKG); no a priori stratification by relation type or source ontology was applied during sampling; and post-sampling stratification was performed to report support rates broken down by the 26 relation types. These additions allow readers to assess representativeness and have been incorporated into the updated text and supplementary tables. revision: yes

  2. Referee: [Validation section] Validation section (PaperQA3 evaluation): no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.

    Authors: We acknowledge that the manuscript lacked direct benchmarking or human-expert comparison for PaperQA3 on the specific task of verifying literature support for graph edges. To address this, the revised version includes a new human validation subsection: two independent domain experts reviewed a random subset of 200 sampled edges (100 true, 100 false) via targeted PubMed searches and full-text assessment. PaperQA3 matched the expert consensus on 78% of true edges and 85% of false edges, with inter-annotator agreement of Cohen's kappa = 0.79. These accuracy metrics and the evaluation protocol are now reported in the Validation section. While an exhaustive benchmark across every relation type exceeds the scope of this resource-focused paper, the added human comparison and differential performance provide supporting evidence for the agent's utility here. We have also clarified PaperQA3's prior evaluations in related work. revision: yes
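
The inter-annotator statistic cited in the second response can be reproduced from two raters' labels. The following is a minimal sketch of Cohen's kappa, assuming two annotators who each assign one categorical label per edge. The label values and the toy data are hypothetical and do not reproduce the 0.79 quoted above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Toy example: ten edges, one disagreement.
rater_a = ["supported"] * 6 + ["unsupported"] * 4
rater_b = ["supported"] * 5 + ["unsupported"] * 5
kappa = cohens_kappa(rater_a, rater_b)
```

In the toy data the raters agree on 9 of 10 items and chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8. Note that kappa measures agreement between the two human experts; the 78% and 85% match rates with PaperQA3 are a separate quantity.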

Circularity Check

0 steps flagged

No significant circularity: data construction paper with external validation

full rationale

The paper constructs OptimusKG by harmonizing structured resources (18 ontologies and controlled vocabularies) into a labeled property graph with explicit schema, node/edge counts, and property instances. No equations, parameter fitting, or predictive derivations are present. The validity assessment uses PaperQA3 to check literature support on sampled edges versus false edges; this is an external, falsifiable check against scientific literature rather than a self-referential reduction or fitted input renamed as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the described chain. The central output is a distributable data resource whose claims reduce to the input sources and the independent agent evaluation, not to its own outputs by construction. This matches the default non-circular case for resource papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the assumption that the 18 source ontologies are accurate and that PaperQA3 can reliably detect literature support; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The 18 ontologies and controlled vocabularies accurately capture biomedical facts without major conflicts or omissions.
    Invoked when harmonizing sources into the unified LPG schema.

pith-pipeline@v0.9.0 · 5622 in / 1248 out tokens · 39352 ms · 2026-05-07T09:29:34.481367+00:00 · methodology

