pith. machine review for the scientific record.

arxiv: 2604.23802 · v1 · submitted 2026-04-26 · 💻 cs.MA

Recognition: unknown

EndoGov: A knowledge-governed multi-agent expert system for endometrial cancer risk stratification

Pith reviewed 2026-05-08 04:56 UTC · model grok-4.3

classification 💻 cs.MA
keywords endometrial cancer · risk stratification · multi-agent system · clinical guidelines · knowledge graph · expert system · governance agent · POLE mutation

The pith

EndoGov separates evidence extraction by specialist agents from rule application by a governance agent to enforce clinical guidelines in endometrial cancer risk assignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a two-tier multi-agent system that first lets pathology, molecular, and clinical agents pull structured evidence from images or records, then routes that evidence to a governance agent. The governance agent consults an executable rule set drawn from guidelines and applies hard overrides for priority cases such as POLE-mutated tumors. This factorization produces risk labels that remain compliant with mandatory rules while matching or exceeding the discrimination of standard neural models on two separate cohorts. A sympathetic reader would care because current multimodal models often optimize raw accuracy at the expense of violating explicit clinical mandates, leaving decisions that are hard to audit or trust. The approach therefore offers a concrete route to auditable, guideline-respecting automation without sacrificing performance.

Core claim

The decision process is factorized as D(x) = G(P(x), R), where specialist agents P generate schema-constrained evidence reports and the governance agent G applies an executable rule set R drawn from a guideline knowledge graph using deterministic hard paths for overrides and constrained soft-path reasoning for ambiguous cases. On the TCGA-UCEC cohort this yields 0.943 accuracy and 0.973 macro AUC with a conditional logic-violation rate of 0.93 percent among trigger-exposed cases; on the CPTAC-UCEC cohort, where labels are themselves guideline-derived, accuracy reaches 0.842 while locked-transfer neural baselines fall below 0.31. Residual failures localize to upstream evidence extraction, and

What carries the argument

The governance agent G that applies an executable rule set R from the Guideline Knowledge Graph to evidence produced by specialist agents P.
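
The factorization D(x) = G(P(x), R) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Evidence` fields, the `soft_path` scoring, and every function name are hypothetical, and only the `R1_POLE` rule name comes from the paper's Figure 4 caption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Evidence:
    """Schema-constrained evidence report (hypothetical subset of fields)."""
    pole_mutated: bool
    high_grade: bool
    lvsi: bool


@dataclass(frozen=True)
class HardRule:
    name: str
    priority: int  # lower number is applied first
    trigger: Callable[[Evidence], bool]
    label: str


# Hypothetical rule set R; only the POLE override is named in the source.
RULES: List[HardRule] = [
    HardRule("R1_POLE", 1, lambda e: e.pole_mutated, "low"),
]


def soft_path(e: Evidence) -> str:
    """Toy stand-in for constrained soft-path reasoning (assumption)."""
    if e.high_grade and e.lvsi:
        return "high"
    if e.high_grade or e.lvsi:
        return "intermediate"
    return "low"


def govern(e: Evidence, rules: List[HardRule] = RULES) -> Tuple[str, str]:
    """G(P(x), R): return the risk label and an audit-trail entry."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.trigger(e):
            # Hard path: deterministic, no model call involved.
            return rule.label, "hard:" + rule.name
    return soft_path(e), "soft"


def decide(x: Dict[str, bool], extract: Callable[[Dict[str, bool]], Evidence]) -> Tuple[str, str]:
    """D(x) = G(P(x), R), with `extract` playing the role of P."""
    return govern(extract(x))
```

In this sketch the hard path never consults a language model, which is one way the reported backend-swap invariance could arise; whether EndoGov is structured this way is not stated in the material reviewed here.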

If this is right

  • Mandatory guideline overrides such as the POLE low-risk assignment remain enforced regardless of conflicting morphologic features.
  • Safety failures can be isolated to the evidence-extraction tier rather than the rule-application tier.
  • Hard-path compliance is preserved when the underlying language-model backend is replaced.
  • Performance advantage over standard models is maintained under distribution shift when reference labels follow the same guidelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of extraction and governance could be ported to other cancer types whose guidelines contain clear override rules.
  • Audit logs produced by the governance layer supply the traceability required for regulatory review of medical AI.
  • Extending the knowledge graph with new guidelines would allow incremental updates without retraining the entire system.

Load-bearing premise

The executable rule set derived from guidelines is complete, conflict-free, and correctly encodes all mandatory overrides while the specialist agents reliably extract every evidence field those rules require.

What would settle it

A single case in which a POLE-mutated high-grade tumor is not assigned to the low-risk group by the governance agent would demonstrate failure of the hard-path override mechanism.
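
One way to search for such a counterexample at the governance tier is to enumerate every combination of conflicting features with POLE fixed as mutated. The sketch below is illustrative only: the modifier field names and both toy governance functions are assumptions, and a real check would call the system's own governance agent.

```python
from itertools import product
from typing import Callable, Dict, Optional

# Hypothetical modifier fields; the real evidence schema is not given here.
MODIFIERS = ("high_grade", "serous_histology", "lvsi", "deep_invasion")


def find_pole_counterexample(
    govern: Callable[[Dict[str, bool]], str]
) -> Optional[Dict[str, bool]]:
    """Return the first POLE-mutated case not labelled 'low', or None."""
    for values in product([False, True], repeat=len(MODIFIERS)):
        case = dict(zip(MODIFIERS, values), pole_mutated=True)
        if govern(case) != "low":
            return case
    return None


def compliant(case: Dict[str, bool]) -> str:
    """Toy governance function that applies the override first."""
    if case["pole_mutated"]:
        return "low"
    return "high" if case["high_grade"] else "low"


def averaging(case: Dict[str, bool]) -> str:
    """Toy fusion-style scorer with no override (assumption)."""
    score = sum(case[m] for m in MODIFIERS) - int(case["pole_mutated"])
    return "high" if score >= 2 else "low"
```

Here `find_pole_counterexample(compliant)` returns `None`, while the averaging scorer yields a violating case. A check of this kind exercises only rule application on given evidence; it says nothing about whether the specialist agents detect the POLE mutation in the first place.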

Figures

Figures reproduced from arXiv: 2604.23802 by Dianxiang Sun, Liming Nie, Liyun Shi, Mengyuan Lin, Weiye Dai, Yuling Ma, Zanxiang He.

Figure 1: Overview of EndoGov (two horizontal bands). Upper: runtime pipeline from multimodal inputs (pathology WSI, molecular omics, structured clinical records) to three Tier 1 specialist agents (UNI+prototype pathology matching, scGPT+molecular report, and FIGO-guided clinical summarization), then the Tier 2 chair agent, which uses the Guideline-KG and routes each case through hard-path priority arbitration or so…
Figure 2: Runtime contract over the Guideline-KG. Input (left): structured patient evidence X is extracted from specialist reports (R_path, R_mol, R_cli) into clinically typed variables, including molecular subtype, FIGO stage, histology, grade, myometrial invasion, and modifier flags such as LVSI, deep myometrial invasion, and no-myometrial-invasion status. Graph query (center): the offline governance memory is querie…
Figure 3: Ablation performance drop on aligned TCGA-UCEC. Bars show Macro-F1 for the full EndoGov pipeline and component-removal controls; the dashed vertical line marks the full-model reference (Macro-F1 = 0.923). Replacing the LLM chair with a logistic-regression soft-path resolver produces a moderate drop, removing the soft-path resolver or semantic KG context produces larger degradation, and replacing in-loop go…
Figure 4: Guideline-grounded reasoning chain for an atypical POLE-mutated case (TCGA-AP-A051). Left: a conventional fusion baseline treats high-grade serous morphology as dominant evidence and predicts High risk, missing the POLE override. Right: EndoGov parses the conflicting evidence, retrieves the source-linked R1_POLE clause, applies priority arbitration rather than averaging evidence streams, and validates the …
read the original abstract

Multimodal artificial intelligence models for endometrial cancer (EC) risk stratification typically optimize aggregate predictive performance but provide limited mechanisms for enforcing mandatory guideline overrides, such as assigning POLE-mutated tumors to the low-risk group despite high-grade morphology. We present EndoGov, a two-tier multi-agent expert system that factorizes the decision process as D(x) = G(P(x), R), where specialist agents P extract structured evidence and a governance agent G applies an executable rule set R. Tier 1 comprises pathology, molecular, and clinical agents that independently generate schema-constrained reports from frozen foundation-model features or structured records. Tier 2 queries an evidence-level-weighted Guideline Knowledge Graph, using deterministic hard-path rules for high-priority overrides and constrained soft-path reasoning for ambiguous cases. In TCGA-UCEC (n=541), EndoGov achieved 0.943 accuracy, 0.973 macro AUC, and a conditional logic-violation rate (C-LVR) of 0.93% among trigger-exposed cases. In CPTAC-UCEC (n=95), where reference labels are guideline-derived, EndoGov reached 0.842 accuracy compared with < 0.31 for locked-transfer neural baselines, supporting governance-pathway transfer under distribution shift rather than validation against independent clinical truth. End-to-end safety decomposition localized residual failures primarily to upstream molecular detection rather than downstream governance. Backend-swap experiments further showed that hard-path compliance is invariant to the LLM backend. These findings indicate that explicit clinical-rule governance can provide guideline-compliant, auditable EC risk assignment while preserving competitive discrimination.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents EndoGov, a two-tier multi-agent expert system for endometrial cancer risk stratification that factorizes the decision as D(x) = G(P(x), R), where Tier-1 specialist agents P extract structured evidence from multimodal inputs and a Tier-2 governance agent G applies an executable rule set R derived from clinical guidelines via a Guideline Knowledge Graph. It reports 0.943 accuracy, 0.973 macro AUC, and 0.93% conditional logic-violation rate (C-LVR) on TCGA-UCEC (n=541), 0.842 accuracy on CPTAC-UCEC (n=95), with failures localized to upstream molecular detection and hard-path compliance invariant to LLM backend.

Significance. If the rule set R is shown to be a complete and faithful encoding of the source guidelines (including mandatory overrides such as POLE-mutated tumors), and the specialist agents reliably extract the required evidence fields, the approach would demonstrate that explicit clinical-rule governance can deliver auditable, guideline-compliant risk assignments with competitive discrimination and robustness under distribution shift. The safety decomposition, backend-swap invariance, and outperformance versus locked-transfer neural baselines on CPTAC are concrete strengths supporting the central claim.

major comments (2)
  1. [Abstract] Abstract and methods (rule-set construction): The central claim that EndoGov provides 'guideline-compliant' EC risk assignment rests on R being a complete, conflict-free encoding of the referenced clinical guidelines (including all mandatory overrides such as POLE-mutated tumors to low-risk). No independent mapping, expert audit, or completeness check of R against the source guidelines is described; the reported metrics only confirm that G follows the implemented R, not that R matches the guidelines. This is load-bearing for the 'guideline-compliant' and 'auditable' assertions.
  2. [Abstract] Abstract (Tier-1 agents): The factorization D(x) = G(P(x), R) and the claim of reliable evidence extraction assume that the pathology, molecular, and clinical agents P extract exactly the schema-constrained fields required by R. No quantitative evaluation of extraction accuracy or error propagation from P to G is provided beyond the overall accuracy and C-LVR; residual failures are localized to 'upstream molecular detection' but without per-agent metrics this remains unverified.
minor comments (2)
  1. The definition and computation of the conditional logic-violation rate (C-LVR) among trigger-exposed cases should be formalized with an equation or pseudocode to allow independent reproduction.
  2. The manuscript would benefit from an explicit table or figure showing the structure of the Guideline Knowledge Graph and the hard-path versus soft-path rules.
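
To make the first minor comment concrete, one plausible reading of C-LVR is the fraction of trigger-exposed cases whose final label differs from the label mandated by the triggered hard rule. The paper's exact definition is not reproduced in this review, so the formula, the record layout, and the function name below are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

# Each record pairs the label mandated by a triggered hard rule
# (None when no hard-rule trigger fired) with the system's final label.
Record = Tuple[Optional[str], str]


def conditional_lvr(records: Iterable[Record]) -> float:
    """Violations among trigger-exposed cases divided by their count."""
    exposed: List[Tuple[str, str]] = [
        (mandated, final) for mandated, final in records if mandated is not None
    ]
    if not exposed:
        raise ValueError("no trigger-exposed cases; C-LVR is undefined")
    violations = sum(1 for mandated, final in exposed if final != mandated)
    return violations / len(exposed)


records = [
    ("low", "low"),
    ("low", "high"),   # violation: override triggered but not honoured
    (None, "high"),    # not trigger-exposed, excluded from the denominator
    ("low", "low"),
    ("low", "low"),
]
print(conditional_lvr(records))  # 0.25
```

Under this reading the denominator is the number of trigger-exposed cases rather than the full cohort, which is why the rate is called conditional.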

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, offering clarifications and committing to revisions that strengthen the presentation without overstating what was performed.

read point-by-point responses
  1. Referee: [Abstract] Abstract and methods (rule-set construction): The central claim that EndoGov provides 'guideline-compliant' EC risk assignment rests on R being a complete, conflict-free encoding of the referenced clinical guidelines (including all mandatory overrides such as POLE-mutated tumors to low-risk). No independent mapping, expert audit, or completeness check of R against the source guidelines is described; the reported metrics only confirm that G follows the implemented R, not that R matches the guidelines. This is load-bearing for the 'guideline-compliant' and 'auditable' assertions.

    Authors: We agree that the 'guideline-compliant' and 'auditable' claims require evidence that R faithfully encodes the source guidelines. R was derived by clinical collaborators translating explicit decision criteria and mandatory overrides (including POLE-mutated tumors to low-risk) from NCCN and ESMO guidelines into the executable rules of the Guideline Knowledge Graph. The reported C-LVR of 0.93% verifies that G strictly follows the implemented R. We acknowledge, however, that the manuscript does not include an independent expert audit or formal completeness mapping against the full guideline documents. In the revised version we will add a supplementary table that lists each key guideline statement alongside its corresponding rule in R, together with a methods paragraph describing the construction process. This provides the requested traceability while accurately reflecting that a blinded external audit was not conducted. revision: partial

  2. Referee: [Abstract] Abstract (Tier-1 agents): The factorization D(x) = G(P(x), R) and the claim of reliable evidence extraction assume that the pathology, molecular, and clinical agents P extract exactly the schema-constrained fields required by R. No quantitative evaluation of extraction accuracy or error propagation from P to G is provided beyond the overall accuracy and C-LVR; residual failures are localized to 'upstream molecular detection' but without per-agent metrics this remains unverified.

    Authors: We concur that per-agent extraction metrics would strengthen the factorization claim and clarify error propagation. The manuscript already localizes residual failures to upstream molecular detection via the safety decomposition and shows that hard-path compliance remains high even when upstream extraction is imperfect. To address the gap, the revision will include quantitative accuracy figures for each Tier-1 agent (pathology, molecular, clinical) on cases with available ground-truth annotations, plus an explicit error-propagation analysis showing how extraction mistakes affect final risk assignments. These additions will be placed in the methods and results sections. revision: yes
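
    An error-propagation analysis of the kind promised here could be organized as a counterfactual replay: rerun the governance step on annotated evidence and see whether the label is repaired. The sketch below is an assumption about how such a decomposition might look, not the authors' procedure; the bucket names and the toy `govern` function are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

Evidence = Dict[str, bool]
# (evidence extracted by Tier 1, annotated evidence, reference risk label)
Case = Tuple[Evidence, Evidence, str]


def decompose_failures(
    cases: Iterable[Case], govern: Callable[[Evidence], str]
) -> Counter:
    """Attribute each wrong label to extraction or to a later stage."""
    counts: Counter = Counter()
    for extracted, annotated, reference in cases:
        if govern(extracted) == reference:
            counts["correct"] += 1
        elif govern(annotated) == reference:
            # Correcting the evidence repairs the label.
            counts["upstream_extraction"] += 1
        else:
            # Still wrong with annotated evidence: rules or reference label.
            counts["governance_or_label"] += 1
    return counts


def govern(e: Evidence) -> str:
    """Toy governance function for the demonstration (assumption)."""
    return "low" if e["pole_mutated"] else "high"


cases = [
    ({"pole_mutated": True}, {"pole_mutated": True}, "low"),
    ({"pole_mutated": False}, {"pole_mutated": True}, "low"),   # missed mutation
    ({"pole_mutated": False}, {"pole_mutated": False}, "low"),
]
print(decompose_failures(cases, govern))
```

    This attribution needs ground-truth evidence annotations for each Tier-1 field, which is the same material the per-agent accuracy figures would require.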

Circularity Check

0 steps flagged

No significant circularity: rule set R is externally derived from guidelines and performance metrics are independent empirical measurements

full rationale

The paper's derivation chain centers on the explicit factorization D(x) = G(P(x), R), where R is an executable rule set drawn from clinical guidelines (including mandatory overrides such as POLE-mutated tumors) and P consists of specialist agents that extract schema-constrained evidence. This structure treats R as an independent, auditable input rather than a parameter fitted to the target labels or performance data. The reported metrics (0.943 accuracy and 0.973 macro AUC on TCGA-UCEC; 0.842 accuracy on CPTAC-UCEC) are presented as measurements of the system's behavior under this governance, not as the definitional success criterion. The conditional logic-violation rate (C-LVR) quantifies residual implementation deviations (localized to upstream extraction) rather than redefining compliance by construction. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing justifications. The architecture therefore remains self-contained against external benchmarks, with guideline grounding supplying the non-circular foundation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The system adds a governance mechanism on top of standard foundation-model feature extraction and clinical guidelines; it does not introduce new physical entities or free parameters but relies on the assumption that the guideline rules can be faithfully encoded as executable logic.

axioms (2)
  • domain assumption The clinical guidelines contain a complete, non-conflicting set of mandatory overrides that can be expressed as deterministic hard-path rules and constrained soft-path reasoning.
    The governance agent G applies an executable rule set R drawn from guidelines; correctness of risk assignment depends on this encoding being faithful and exhaustive.
  • domain assumption Specialist agents P can reliably extract the exact schema-constrained evidence fields required by the rule set from frozen foundation-model features or structured records.
    Tier 1 performance directly determines whether the governance layer receives the inputs it needs; any systematic extraction error propagates to final risk labels.

pith-pipeline@v0.9.0 · 5608 in / 1601 out tokens · 19702 ms · 2026-05-08T04:56:31.562984+00:00 · methodology
