pith. sign in

arxiv: 2605.26376 · v1 · pith:OR5TKGAFnew · submitted 2026-05-25 · 💻 cs.CV · cs.AI· cs.LG

BioFact-MoE: Biologically Factorized Mixture of Experts for Vision-Language Prognostic Modeling in Hepatocellular Carcinoma

Pith reviewed 2026-06-29 22:18 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG
keywords hepatocellular carcinomasurvival predictionmixture of expertsvision-language modelmultimodal prognosisbiological factorizationMRI radiology reportsphenotype stratification
0
0 comments X

The pith

A biologically factorized mixture-of-experts model separates liver and tumor factors from MRI and reports to improve hepatocellular carcinoma survival prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing vision-language models for HCC prognosis learn one entangled representation that mixes hepatic reserve and tumor factors, which reduces both accuracy and the ability to link predictions to distinct biology. BioFact-MoE inserts biologically supervised experts inside a residual mixture-of-experts survival network so that one expert pathway learns liver-related features and another learns tumor-related features. On a cohort of 588 patients pretrained on over 4,500 image-report pairs, the model raises 12-, 18-, and 24-month AUCs to 75.33 percent, 75.85 percent, and 73.96 percent while producing gated weights that stratify phenotypes and embeddings that correlate selectively with liver-function or tumor-burden markers without any direct supervision on those markers.

Core claim

BioFact-MoE explicitly decomposes liver and tumor factors via biologically supervised experts within a residual MoE survival architecture; the resulting model improves scalar survival prediction across time horizons and yields gated expert weights and latent embeddings whose selective associations with clinical markers arise from the factorization.

What carries the argument

Biologically factorized Mixture of Experts (MoE) with biologically supervised experts inside a residual MoE survival architecture that decomposes hepatic and tumor latent factors from vision-language inputs.

If this is right

  • Gated expert weights produce phenotype-aware risk groups whose survival curves differ by treatment history.
  • Hepatic embeddings correlate with liver-function markers and tumor embeddings correlate with tumor-burden markers at p less than 0.05 without explicit supervision on those markers.
  • The same architecture yields higher AUCs than standard vision-language or non-factorized MoE baselines at every tested horizon.
  • Pathway-informed gating reveals treatment-associated survival heterogeneity in held-out validation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the factorization holds, the same supervised-expert pattern could be tested on other multimodal cancer datasets where two dominant biological axes drive outcome.
  • The selective marker associations suggest the model could be used to flag patients whose risk is driven more by liver decompensation than by tumor progression, an angle not directly tested in the paper.
  • Because the experts remain inside a residual MoE, the architecture may tolerate addition of further supervised pathways for additional biological axes without retraining the entire network.

Load-bearing premise

The biological supervision on the experts is sufficient to force cleanly separable liver and tumor latent factors rather than simply capturing dataset-specific correlations.

What would settle it

Remove the biological supervision from the expert pathways, retrain on the same data, and check whether the selective correlations between embeddings and liver-function versus tumor-burden markers disappear while prediction AUCs fall to baseline levels.

Figures

Figures reproduced from arXiv: 2605.26376 by Annabella Shewarega, James S. Duncan, Julius Chapiro, Junlin Yang, Lawrence H. Staib, Nicha C. Dvornek, Peiyu Duan, Tian Yu, Yuexi Du.

Figure 1
Figure 1. Figure 1: BioFact-MoE framework. Stage 1: LLM-guided report decomposition super￾vises three pathway-specific LoRA adapters via contrastive pretraining with anatomical patch masking. Stage 2: Frozen pathway encoders are integrated by a residual MoE survival head with adaptive gating for Cox-based survival prediction. 2 Method 2.1 Stage 1: Pathway-Specific Biological Factorized Pretraining BioFact-MoE framework operat… view at source ↗
Figure 2
Figure 2. Figure 2: Phenotype-Aware Stratification: Beyond scalar risk, gate weights stratify pa￾tients by dominant biological axis. Among patients receiving the same TACE treat￾ment, liver-driven and tumor-driven subgroups show significantly different survival trajectories, capturing heterogeneity invisible to clinical staging alone [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: In an exploratory treatment analysis, patients identified as liver low-risk by the hepatic pathway showed the greatest predicted benefit from TACE, consistent with clinical expectations. 3.5 Ablation Studies [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
read the original abstract

Hepatocellular carcinoma (HCC) is biologically heterogeneous, shaped by the interplay between hepatic functional reserve and tumor-related oncologic factors; thus, similar survival outcomes may reflect fundamentally different underlying biological processes. Prognostic modeling in HCC is informed by rich multimodal information from multiparametric MRI and radiology reports from routine clinical practice. Existing prognostic vision-language models (VLMs) learn a single entangled latent representation that blends hepatic and tumor-related factors, limiting both accuracy and biological interpretability. We present BioFact-MoE, a biologically factorized Mixture of Experts (MoE) framework that explicitly decomposes liver and tumor factors via biologically supervised experts within a residual MoE survival architecture. On a HCC cohort of N=588 patients (pretrained on 4,582 3D MRI image-report pairs), BioFact-MoE consistently improves survival prediction over all baselines across time horizons, achieving 12-, 18-, and 24-month AUCs of 75.33%, 75.85%, and 73.96%. Beyond scalar risk prediction, gated expert weights enable phenotype-aware risk stratification. Pathway-informed gating uncovers clinically meaningful treatment-associated survival heterogeneity. In held-out validation, hepatic and tumor embeddings show selective associations with liver function and tumor burden markers, respectively (p<0.05), without supervision. The code is available at https://github.com/jy-639/BioFact-MoE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces BioFact-MoE, a residual Mixture-of-Experts architecture for vision-language survival modeling in HCC. Biologically supervised experts are used to explicitly decompose hepatic functional reserve and tumor-related factors from multiparametric MRI and radiology reports. On a cohort of N=588 patients (pretrained on 4,582 image-report pairs), the model reports improved time-to-event AUCs (12-month 75.33%, 18-month 75.85%, 24-month 73.96%) over baselines, phenotype-aware risk stratification via gated expert weights, and selective associations of the resulting hepatic and tumor embeddings with liver-function and tumor-burden markers (p<0.05) in held-out data without direct supervision on those markers. Code is released.

Significance. If the reported factorization is shown to be robust and independent of the supervision signals, the work would supply a concrete template for disentangling biologically distinct pathways inside multimodal prognostic models for heterogeneous cancers. The public code release is a clear strength that supports reproducibility.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (Model): The statement that hepatic and tumor embeddings exhibit selective marker associations 'without supervision' is load-bearing for the central factorization claim. The manuscript must specify the exact supervision signals, loss terms, and pathway annotations used to train the liver and tumor experts; without this, it is impossible to rule out that the reported p<0.05 associations arise by construction from supervision signals that already correlate with the tested clinical markers.
  2. [§4 and §5] §4 (Experiments) and §5 (Results): No ablation is presented that removes biological supervision while retaining the same expert count, residual MoE structure, and gating mechanism. Such a control is required to demonstrate that performance gains and embedding selectivity are attributable to the factorization rather than to the supervision itself or to dataset-specific correlations in the N=588 cohort.
minor comments (2)
  1. [Abstract] Abstract: AUC values are reported to two decimal places without accompanying standard deviations, number of runs, or confidence intervals; adding these would allow readers to assess stability of the reported gains.
  2. [§5] Figure captions and §5: The description of 'pathway-informed gating' should explicitly state which clinical variables or annotations are used as pathway inputs so that the stratification analysis can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which highlight key aspects needed to strengthen the central claims of the manuscript. We address each major comment below and will revise the paper to provide the requested clarifications and experiments.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Model): The statement that hepatic and tumor embeddings exhibit selective marker associations 'without supervision' is load-bearing for the central factorization claim. The manuscript must specify the exact supervision signals, loss terms, and pathway annotations used to train the liver and tumor experts; without this, it is impossible to rule out that the reported p<0.05 associations arise by construction from supervision signals that already correlate with the tested clinical markers.

    Authors: We agree that explicit specification of the supervision is necessary to support the factorization claim. The biological supervision for the experts relies on pathway annotations automatically extracted from the pretraining radiology reports (4,582 pairs) and aligned to MRI features via dedicated contrastive and reconstruction loss terms: the liver expert uses hepatic functional reserve pathway labels (e.g., fibrosis and cirrhosis indicators), while the tumor expert uses oncologic pathway labels (e.g., vascular invasion and nodule descriptors). These annotations are distinct from the specific held-out clinical markers (liver function tests and tumor burden metrics) used for the post-hoc p<0.05 association tests in the N=588 cohort. In the revision we will expand §3 with a dedicated subsection and table that lists the exact annotations, loss formulations, and gating mechanism, explicitly noting that the tested markers were never part of training supervision. This will demonstrate that the selective associations are emergent. revision: yes

  2. Referee: [§4 and §5] §4 (Experiments) and §5 (Results): No ablation is presented that removes biological supervision while retaining the same expert count, residual MoE structure, and gating mechanism. Such a control is required to demonstrate that performance gains and embedding selectivity are attributable to the factorization rather than to the supervision itself or to dataset-specific correlations in the N=588 cohort.

    Authors: We concur that an ablation isolating the effect of biological supervision is important. The current baselines include non-MoE VLMs and a standard residual MoE without pathway supervision, but we will add the requested control: a 'non-biological' variant that retains the identical expert count, residual connections, and gating network but replaces the pathway-specific losses with generic reconstruction objectives. We will report the resulting 12-/18-/24-month AUCs and embedding-marker correlations on the same splits. This experiment will be included in the revised §4 and §5 to quantify the incremental benefit of the biological factorization. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained with no reducible steps shown

full rationale

The abstract and provided text contain no equations, training details, or derivation chain that can be inspected for reduction to inputs. Claims such as 'without supervision' for marker associations and 'biologically supervised experts' are stated at a high level but do not exhibit self-definitional structure, fitted inputs renamed as predictions, or load-bearing self-citations. No specific reduction (e.g., Eq. X = Eq. Y by construction) is quotable. The central performance claims rest on empirical results in an N=588 cohort with external validation, which is independent of the factorization description. This is the expected honest non-finding when no load-bearing circular step is exhibited.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Only the abstract is available, so the ledger is populated from stated claims rather than from explicit equations or methods sections. The central claim rests on the existence of separable biological factors that can be supervised into distinct experts and on the representativeness of the N=588 cohort.

axioms (1)
  • domain assumption The HCC patient cohort (N=588, pretrained on 4,582 image-report pairs) is representative of the target clinical population for survival modeling.
    Stated cohort size and pretraining scale in abstract; no further justification supplied.
invented entities (1)
  • Biologically supervised experts (liver and tumor pathways) no independent evidence
    purpose: Explicit decomposition of hepatic functional reserve versus tumor-related oncologic factors inside the MoE survival model.
    Introduced in abstract as the core architectural novelty; no independent evidence of separability provided beyond the reported embedding correlations.

pith-pipeline@v0.9.1-grok · 5826 in / 1505 out tokens · 24303 ms · 2026-06-29T22:18:06.389350+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    Radiology314(2), e241613 (2025)

    AkinciD’Antonoli,T.,Berger,L.K.,Indrakanti,A.K.,Vishwanathan,N.,Weiss,J., Jung, M., Berkarda, Z., Rau, A., Reisert, M., Küstner, T., et al.: Totalsegmentator mri: robust sequence-independent segmentation of multiple anatomic structures in mri. Radiology314(2), e241613 (2025)

  2. [2]

    In: Proceedings of the 62nd Annual Meeting of the AssociationforComputationalLinguistics(Volume1:LongPapers).pp.1932–1945 (2024)

    Dou, S., Zhou, E., Liu, Y., Gao, S., Shen, W., Xiong, L., Zhou, Y., Wang, X., Xi, Z., Fan, X., et al.: Loramoe: Alleviating world knowledge forgetting in large language models via moe-style plugin. In: Proceedings of the 62nd Annual Meeting of the AssociationforComputationalLinguistics(Volume1:LongPapers).pp.1932–1945 (2024)

  3. [3]

    In: International Con- ference on Information Processing in Medical Imaging

    Du, Y., Onofrey, J.A., Dvornek, N.C.: Multi-view and multi-scale alignment for contrastive language-image pre-training in mammography. In: International Con- ference on Information Processing in Medical Imaging. pp. 247–262. Springer (2025)

  4. [4]

    Foglia, B., Turato, C., Cannito, S.: Hepatocellular carcinoma: latest research in pathogenesis, detection and treatment (2023)

  5. [5]

    Iclr1(2), 3 (2022)

    Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: Low-rank adaptation of large language models. Iclr1(2), 3 (2022)

  6. [6]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Huang, S.C., Shen, L., Lungren, M.P., Yeung, S.: Gloria: A multimodal global-local representation learning framework for label-efficient medical image recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 3942–3951 (2021)

  7. [7]

    Nature methods18(2), 203–211 (2021)

    Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods18(2), 203–211 (2021)

  8. [8]

    BMC medical research methodology18(1), 24 (2018)

    Katzman, J.L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., Kluger, Y.: Deep- surv:personalizedtreatmentrecommendersystemusingacoxproportionalhazards deep neural network. BMC medical research methodology18(1), 24 (2018)

  9. [9]

    arXiv preprint arXiv:2404.15159 , year=

    Li, D., Ma, Y., Wang, N., Ye, Z., Cheng, Z., Tang, Y., Zhang, Y., Duan, L., Zuo, J., Yang, C., et al.: Mixlora: Enhancing large language models fine-tuning with lora-based mixture of experts. arXiv preprint arXiv:2404.15159 (2024)

  10. [10]

    In: International Conference on Medical Image Computing and Computer-Assisted Intervention

    Li, Y., Lai, H., Zhou, X., Ming, S., Ma, W., Wei, W., Zhou, S.K.: More perfor- mant and scalable: Rethinking contrastive vision-language pre-training of radiol- ogy in the llm era. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 348–357. Springer (2025)

  11. [11]

    Journal of hepatology64(3), 601–608 (2016)

    Liu, P.H., Hsu, C.Y., Hsia, C.Y., Lee, Y.H., Su, C.W., Huang, Y.H., Lee, F.Y., Lin, H.C., Huo, T.I.: Prognosis of hepatocellular carcinoma: assessment of eleven staging systems. Journal of hepatology64(3), 601–608 (2016)

  12. [12]

    In: International conference on machine learning

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021) 10 F. Author et al

  13. [13]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., Dean, J.: Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017)

  14. [14]

    arXiv preprint arXiv:2511.21889 (2025)

    Willis, R., Bakos, J.: Exploring fusion strategies for multimodal vision-language systems. arXiv preprint arXiv:2511.21889 (2025)

  15. [15]

    Nature communications16(1), 3504 (2025)

    Wu, Y., Liu, Y., Yang, Y., Yao, M.S., Yang, W., Shi, X., Yang, L., Li, D., Liu, Y., Yin, S., et al.: A concept-based interpretable model for the diagnosis of choroid neoplasias using multimodal data. Nature communications16(1), 3504 (2025)

  16. [16]

    Qwen3 Technical Report

    Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al.: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

  17. [17]

    arXiv preprint arXiv:2309.05444 , year=

    Zadouri, T., Üstün, A., Ahmadian, A., Ermiş, B., Locatelli, A., Hooker, S.: Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning. arXiv preprint arXiv:2309.05444 (2023)

  18. [18]

    arXiv preprint arXiv:2601.06847 (2026)

    Zhang, M., Wu, X., Luo, H., Wang, F., Lv, Y.: Medground: Bridging the evidence gap in medical vision-language models with verified grounding data. arXiv preprint arXiv:2601.06847 (2026)

  19. [19]

    NEJM AI , year=

    Zhang, S., Xu, Y., Usuyama, N., Xu, H., Bagga, J., Tinn, R., Preston, S., Rao, R., Wei, M., Valluri, N., Wong, C., Tupini, A., Wang, Y., Mazzola, M., Shukla, S., Liden, L., Gao, J., Crabtree, A., Piening, B., Bifulco, C., Lungren, M.P., Naumann, T., Wang, S., Poon, H.: A multimodal biomedical foundation model trained from fifteen million image–text pairs....