arxiv: 2604.27269 · v1 · submitted 2026-04-29 · 💻 cs.AI

OptimusKG: Unifying biomedical knowledge in a modern multimodal graph

Lucas Vittor, Ayush Noori, Iñaki Arango, Joaquín Polonuer, Sam Rodriques, Andrew White, David A. Clifton, Marinka Zitnik

Pith reviewed 2026-05-07 09:29 UTC · model grok-4.3

classification 💻 cs.AI

keywords biomedical knowledge graph · labeled property graph · multimodal data integration · schema enforcement · literature validation · graph machine learning · hypothesis generation

The pith

OptimusKG builds a schema-enforced labeled property graph that unifies biomedical data from molecular to environmental domains while preserving type-specific metadata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs OptimusKG by merging structured and semi-structured biomedical resources into one multimodal graph. It applies a top-level schema for consistency across 10 entity types and 26 relation types yet keeps granular properties, cross-references, and provenance for each domain. Validation with a literature-checking agent finds supporting evidence for 70.0 percent of sampled edges, while 83.4 percent of sampled false edges receive no such evidence. Unsupported edges cluster in experimental and functional genomics data, suggesting the graph includes associations that may not yet appear in published papers. The result is a ready-to-use resource distributed in Parquet format for machine learning and knowledge retrieval tasks.

Core claim

OptimusKG is a multimodal biomedical labeled property graph assembled from 18 ontologies and controlled vocabularies that contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances. The construction enforces a top-level schema on nodes and edges while retaining detailed, type-specific metadata and provenance across molecular, anatomical, clinical, and environmental domains. Evaluation by a multimodal literature agent identified supporting evidence for 70.0 percent of sampled edges and no supporting evidence for 83.4 percent of sampled false edges, with unsupported edges concentrated in experimental and functional genomics data.

What carries the argument

The labeled property graph (LPG) structure, which applies a top-level schema for nodes and edges while storing granular type-specific properties, cross-references, and provenance.
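The idea of a shared top-level schema with free-form, type-specific properties can be illustrated with a small sketch. This is not the paper's implementation: the class names, the label sets, and the property keys (`name`, `symbol`, `sources`) are hypothetical and chosen only for illustration. The example node and edge reuse the phenotype-gene association shown in Figure 3.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical top-level schema: a tiny subset of labels, for illustration only.
NODE_LABELS = {"Gene", "Phenotype"}
EDGE_LABELS = {"ASSOCIATED-WITH"}


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    # Type-specific metadata lives in a free-form property map.
    properties: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Schema enforcement: reject labels outside the top-level schema.
        if self.label not in NODE_LABELS:
            raise ValueError(f"unknown node label: {self.label}")


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {self.label}")


class Graph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Both endpoints must already exist in the graph.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node id: {endpoint}")
        self.edges.append(edge)


g = Graph()
g.add_node(Node("HP_0000023", "Phenotype", {"name": "Inguinal hernia"}))
g.add_node(Node("ENSG00000163513", "Gene", {"symbol": "TGFBR2"}))
g.add_edge(
    Edge(
        "HP_0000023",
        "ENSG00000163513",
        "ASSOCIATED-WITH",
        {"sources": ["example-source"]},  # placeholder provenance value
    )
)
```

The design point is that validation happens on the label, while the property map stays open, so each entity type can carry its own metadata without widening the shared schema.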

If this is right

The graph supplies standardized input for graph-based machine learning models applied to biomedical problems.
It enables knowledge-grounded retrieval when paired with large language models.
Hypothesis generation tasks can draw on its cross-domain links and provenance tracking.
Edges lacking literature support identify areas where experimental findings have not yet been synthesized into papers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Parquet distribution supports efficient large-scale queries on distributed systems for cohort-level analysis.
Cross-references to controlled vocabularies could ease alignment with additional external ontologies or databases.
Periodic re-validation against new publications would keep the graph current for ongoing discovery work.

Load-bearing premise

The multimodal agent's judgments of literature support for sampled edges are accurate and unbiased, and the samples represent the full graph.

What would settle it

A manual expert review of a larger random sample of edges that finds a substantially lower rate of literature support than the reported 70 percent.
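How large such a review would need to be depends on the precision wanted. As a rough sketch, the normal-approximation formula n = z² · p(1 − p) / E² gives the number of edges needed to estimate a support rate p to within a margin E. The 3-point margin and the 95 percent confidence level below are assumptions made for illustration, not values from the paper.

```python
import math


def sample_size_for_margin(p: float, margin: float, z: float = 1.96) -> int:
    """Smallest n whose normal-approximation half-width z*sqrt(p(1-p)/n) is <= margin."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# Assumed inputs: the reported 70 percent rate, a +/- 3 point margin, 95% confidence.
n_required = sample_size_for_margin(p=0.70, margin=0.03)
print(n_required)  # 897
```

Under these assumptions, a uniformly random sample of roughly 900 expert-reviewed edges would be enough to distinguish a rate near 70 percent from one that is several points lower.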

Figures

Figures reproduced from arXiv: 2604.27269 by Andrew White, Ayush Noori, David A. Clifton, I\~naki Arango, Joaqu\'in Polonuer, Lucas Vittor, Marinka Zitnik, Sam Rodriques.

**Figure 1.** Overview of OptimusKG. (a) Metagraph of OptimusKG, illustrating the node types and the heterogeneous relationships connecting them. (b) Pairwise edge type distribution across the graph. Each cell reports the number of edges between two node types. Edges remain asymmetric; a reverse edge is added only when the undirect key of the edge is set to True. (c) Total number of nodes (𝑥-axis) and edges (𝑦-axis) f…

**Figure 2.** OptimusKG data pipeline architecture. Heterogeneous sources are ingested in the Landing layer via data replication and Kedro-managed workflows. Configuration management and URI-based data discovery primitives provide governance and traceability. The pipeline uses a medallion architecture to logically organize the data in increasing structure and quality as it flows through each layer (Landing, Bronze, Sil…

**Figure 3.** Property distribution in OptimusKG. (a) Representative subgraph from OptimusKG illustrating a phenotype-gene association. A phenotype node (Inguinal hernia, HP_0000023) is connected to a gene node (TGFBR2, ENSG00000163513) via an ASSOCIATED-WITH relation, with supporting evidence, provenance, and cross-references embedded as node and edge properties. (b) Distribution of property types across node and edge ty…

**Figure 4.** Validation of edges in OptimusKG with a multimodal deep research agent. Stacked bar plots…

Original abstract

Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges and retains granular, type-specific properties, cross-references, and provenance across molecular, anatomical, clinical, and environmental domains. We assessed the validity of OptimusKG by evaluating whether graph relationships are supported by evidence from the scientific literature using a multimodal agent, PaperQA3. PaperQA3 identified supporting evidence for 70.0% of sampled edges, whereas 83.4% of sampled false edges received no supporting evidence. Edges without literature support were concentrated in associations derived from experimental and functional genomics resources, suggesting that OptimusKG captures biomedical knowledge that may precede synthesis in the scientific literature. OptimusKG is distributed as Apache Parquet files, providing a standardized resource for graph-based machine learning, knowledge-grounded retrieval with large language models, and biomedical discovery use cases such as hypothesis generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OptimusKG is a straightforward unification of 18 biomedical sources into one schema-enforced LPG with provenance, plus a first-pass literature check via PaperQA3 that mostly separates real from false edges.

The paper's core output is a single labeled property graph that merges structured biomedical data across molecular, anatomical, clinical, and environmental domains while keeping type-specific properties, cross-references, and provenance. It reports about 190k nodes in 10 types, 21.8M edges in 26 relation types, and 67.2M property instances, all released as Parquet files. That scale and the top-level schema enforcement are the practical advance over the usual fragmented KGs in the field.

Referee Report

2 major / 1 minor

Summary. The manuscript presents OptimusKG, a multimodal biomedical labeled property graph constructed from 18 public ontologies and controlled vocabularies. It reports 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances with 110,276,843 values across 150 keys. The graph enforces a top-level schema while retaining granular metadata and provenance across molecular, anatomical, clinical, and environmental domains. Validity is assessed by applying the multimodal agent PaperQA3 to sampled edges, which identifies literature support for 70.0% of true edges versus no support for 83.4% of false edges; unsupported edges are concentrated in experimental and functional genomics associations. The resource is released as Apache Parquet files for downstream use in graph ML, LLM grounding, and discovery.

Significance. If the validation holds, OptimusKG supplies a harmonized, schema-constrained KG that preserves type-specific properties and cross-references from structured sources, filling a gap between unstructured-document KGs and hard-to-harmonize ontology graphs. The open Parquet distribution is a clear strength for reproducibility and reuse in biomedical ML and retrieval tasks. The work also attempts to quantify capture of pre-literature knowledge via the differential support rates.

major comments (2)

[Validation section] PaperQA3 evaluation: the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.
[Validation section] PaperQA3 evaluation: no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.

minor comments (1)

[Abstract] The abstract reports the 70.0% and 83.4% figures without any reference to sampling methodology, sample size, or uncertainty estimates.
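One way to attach the uncertainty estimate this comment asks for is a Wilson score interval on each reported proportion. The sketch below is illustrative only. The abstract gives no sample size, so the counts used here (700 supported out of 1,000) are an assumption, taken from the 1,000 true edges mentioned in the simulated rebuttal further down this page.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return center - half_width, center + half_width


# Assumed counts, not figures from the abstract: 700 supported edges out of 1,000.
low, high = wilson_interval(successes=700, n=1000)
print(f"70.0% support, 95% Wilson interval: [{low:.3f}, {high:.3f}]")
```

With those assumed counts the interval spans roughly 0.67 to 0.73. A smaller sample would widen it, which is why the sample size matters for interpreting the headline numbers.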

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript on OptimusKG. The comments on the validation section highlight important aspects of methodological transparency that strengthen the paper. We respond to each major comment below and have made revisions to address them directly.

Referee: [Validation section] PaperQA3 evaluation: the sampling procedure is unspecified, including sample size, selection criteria, stratification by relation type or source ontology, and the exact method used to construct false edges. This directly affects whether the reported 70.0% and 83.4% figures can be interpreted as representative of the full graph.

Authors: We agree that the original manuscript did not provide sufficient detail on the sampling procedure, which limits interpretability of the 70.0% and 83.4% figures. In the revised manuscript we have expanded the Validation section to specify: a total of 2,000 edges were sampled (1,000 true edges drawn uniformly at random from the full edge set and 1,000 false edges generated by type-preserving random replacement of the target node such that the resulting triple does not exist in OptimusKG); no a priori stratification by relation type or source ontology was applied during sampling; and post-sampling stratification was performed to report support rates broken down by the 26 relation types. These additions allow readers to assess representativeness and have been incorporated into the updated text and supplementary tables.

Revision: yes

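The false-edge construction described in this simulated response (type-preserving replacement of the target node, rejecting triples that already exist) can be sketched as follows. This is a minimal illustration under assumptions: the function name, the toy node IDs, and the retry limit are hypothetical, and nothing here is taken from the paper's code.

```python
import random


def corrupt_target(edge, nodes_by_type, node_type, existing, rng, max_tries=100):
    """Return a false triple that keeps the source, the relation, and the target's type.

    edge is a (source, relation, target) tuple; existing is the set of true triples.
    Returns None if no valid replacement is found within max_tries draws.
    """
    source, relation, target = edge
    candidates = nodes_by_type[node_type[target]]
    for _ in range(max_tries):
        new_target = rng.choice(candidates)
        candidate = (source, relation, new_target)
        # Reject the original target and any triple already present in the graph.
        if new_target != target and candidate not in existing:
            return candidate
    return None


# Toy graph with hypothetical IDs.
node_type = {"g1": "gene", "g2": "gene", "g3": "gene", "d1": "disease", "d2": "disease"}
nodes_by_type = {}
for node, kind in node_type.items():
    nodes_by_type.setdefault(kind, []).append(node)

existing = {
    ("d1", "ASSOCIATED-WITH", "g1"),
    ("d1", "ASSOCIATED-WITH", "g2"),
    ("d2", "ASSOCIATED-WITH", "g3"),
}

rng = random.Random(0)  # fixed seed so the draw is reproducible
true_sample = rng.sample(sorted(existing), k=2)  # uniform draw of true edges
false_sample = [
    corrupt_target(e, nodes_by_type, node_type, existing, rng) for e in true_sample
]
```

A caveat worth keeping in mind: a triple that is absent from the graph is not guaranteed to be false in reality, so some "false" edges built this way may describe genuine but unrecorded associations.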
Referee: [Validation section] PaperQA3 evaluation: no benchmarking, accuracy metrics, error rates, or human-expert comparison is provided for PaperQA3 on the task of verifying literature support for biomedical relations. The central validity claim rests on the agent's performance being high and unbiased, yet this is untested in the manuscript.

Authors: We acknowledge that the manuscript lacked direct benchmarking or human-expert comparison for PaperQA3 on the specific task of verifying literature support for graph edges. To address this, the revised version includes a new human validation subsection: two independent domain experts reviewed a random subset of 200 sampled edges (100 true, 100 false) via targeted PubMed searches and full-text assessment. PaperQA3 matched the expert consensus on 78% of true edges and 85% of false edges, with inter-annotator agreement of Cohen's kappa = 0.79. These accuracy metrics and the evaluation protocol are now reported in the Validation section. While an exhaustive benchmark across every relation type exceeds the scope of this resource-focused paper, the added human comparison and differential performance provide supporting evidence for the agent's utility here. We have also clarified PaperQA3's prior evaluations in related work.

Revision: yes
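For readers unfamiliar with the agreement statistic quoted in this simulated response, Cohen's kappa compares observed agreement between two annotators with the agreement expected by chance. The sketch below computes it from two label lists. The labels are made-up toy data, not the experts' annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy annotations (hypothetical): S = literature support found, N = none found.
expert_1 = ["S", "S", "S", "S", "S", "N", "N", "N", "N", "N"]
expert_2 = ["S", "S", "S", "S", "N", "N", "N", "N", "N", "S"]
kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 2))  # 0.6
```

In this toy case the annotators agree on 8 of 10 items while chance agreement is 0.5, giving kappa = (0.8 − 0.5) / (1 − 0.5) = 0.6.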

Circularity Check

0 steps flagged

No significant circularity: data construction paper with external validation

The paper constructs OptimusKG by harmonizing structured resources (18 ontologies and controlled vocabularies) into a labeled property graph with explicit schema, node/edge counts, and property instances. No equations, parameter fitting, or predictive derivations are present. The validity assessment uses PaperQA3 to check literature support on sampled edges versus false edges; this is an external, falsifiable check against scientific literature rather than a self-referential reduction or fitted input renamed as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the described chain. The central output is a distributable data resource whose claims reduce to the input sources and the independent agent evaluation, not to its own outputs by construction. This matches the default non-circular case for resource papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the assumption that the 18 source ontologies are accurate and that PaperQA3 can reliably detect literature support; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: The 18 ontologies and controlled vocabularies accurately capture biomedical facts without major conflicts or omissions.
Invoked when harmonizing sources into the unified LPG schema.

pith-pipeline@v0.9.0 · 5622 in / 1248 out tokens · 39352 ms · 2026-05-07T09:29:34.481367+00:00 · methodology
