Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration
Pith reviewed 2026-05-13 23:41 UTC · model grok-4.3
The pith
Multimodal glioma survival models gain performance by adding image and RNA signals rather than learning their interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adapting InterSHAP to survival models on TCGA-GBM and TCGA-LGG data (n=575), the study finds that as C-index rises from 0.64 to 0.82 across fusion architectures, the cross-modal interaction share stays flat or falls, from 4.8% to 3.0%. Variance decomposition shows a consistent additive split across all tested fusion strategies: whole-slide images contribute approximately 40%, RNA-seq approximately 55%, and interaction roughly 4%. The paper concludes that performance stems from complementary signal aggregation rather than learned synergy between modalities.
What carries the argument
InterSHAP, a Shapley interaction index-based metric adapted here from classification to Cox proportional hazards models. It isolates the fraction of output variance attributable to cross-modal interactions between whole-slide image and RNA-seq features.
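As a worked illustration of the quantity involved, the two-player Shapley interaction index can be written out for the WSI and RNA modalities. This is a sketch under stated assumptions: v(S) denotes the model's Cox linear predictor (log-risk) when every modality outside S is masked to a baseline, and the masking scheme and the symbols m and I are notation introduced here, not taken from the paper.

```latex
% v(S): Cox linear predictor with modalities outside S masked to a baseline (assumed).
\begin{align}
  m_{\mathrm{WSI}} &= v(\{\mathrm{WSI}\}) - v(\emptyset), \\
  m_{\mathrm{RNA}} &= v(\{\mathrm{RNA}\}) - v(\emptyset), \\
  I_{\mathrm{WSI,RNA}} &= v(\{\mathrm{WSI},\mathrm{RNA}\}) - v(\{\mathrm{WSI}\})
                          - v(\{\mathrm{RNA}\}) + v(\emptyset), \\
  v(\{\mathrm{WSI},\mathrm{RNA}\}) - v(\emptyset)
      &= m_{\mathrm{WSI}} + m_{\mathrm{RNA}} + I_{\mathrm{WSI,RNA}}.
\end{align}
```

For a purely additive predictor the interaction term is exactly zero, so any nonzero value reflects behavior that neither modality produces alone. How the paper normalizes these terms into percentage shares is not specified in this summary.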
If this is right
- Performance gains arise from better aggregation of independent modality signals rather than from modeling interactions.
- Cross-modal interactions remain small and stable at around 4% of variance no matter which fusion architecture is used.
- Simpler fusion methods can reach high discrimination without added complexity for interaction terms.
- The metric supplies an auditing method to compare fusion strategies by separating additive from interactive contributions.
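The auditing idea in the last point can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes two modalities, masks a modality by replacing it with its cohort mean, and normalizes by mean absolute contribution. The function names `modality_shares` and `make_toy_predictor` and the toy coefficients are hypothetical.

```python
import numpy as np

def modality_shares(predict, wsi, rna):
    """Split a log-risk predictor into WSI, RNA, and interaction shares.

    predict(wsi, rna) returns a 1-D array of Cox linear predictors.
    Masking a modality means replacing it with its cohort mean (an assumption).
    """
    wsi_base = np.broadcast_to(wsi.mean(axis=0), wsi.shape)
    rna_base = np.broadcast_to(rna.mean(axis=0), rna.shape)

    v_both = predict(wsi, rna)            # both modalities present
    v_wsi = predict(wsi, rna_base)        # RNA masked
    v_rna = predict(wsi_base, rna)        # WSI masked
    v_none = predict(wsi_base, rna_base)  # both masked

    main_wsi = v_wsi - v_none
    main_rna = v_rna - v_none
    # Two-player Shapley interaction index, evaluated per sample.
    interaction = v_both - v_wsi - v_rna + v_none

    parts = np.array([
        np.abs(main_wsi).mean(),
        np.abs(main_rna).mean(),
        np.abs(interaction).mean(),
    ])
    total = parts.sum()
    shares = parts / total if total > 0 else np.zeros(3)
    return dict(zip(("wsi", "rna", "interaction"), shares))

def make_toy_predictor(gamma):
    """Hypothetical log-risk model; gamma controls the cross-modal term."""
    def predict(wsi, rna):
        return 0.8 * wsi[:, 0] + 1.1 * rna[:, 0] + gamma * wsi[:, 0] * rna[:, 0]
    return predict

rng = np.random.default_rng(0)
wsi = rng.normal(size=(200, 4))  # stand-in for WSI embeddings
rna = rng.normal(size=(200, 6))  # stand-in for RNA-seq features
additive = modality_shares(make_toy_predictor(0.0), wsi, rna)
synergistic = modality_shares(make_toy_predictor(0.5), wsi, rna)
print(additive)
print(synergistic)
```

Running the same function over several trained fusion models would give a per-architecture breakdown comparable in spirit to the WSI, RNA, and interaction split reported above, though the numbers depend on the masking and normalization choices.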
Where Pith is reading between the lines
- Designers could favor separate feature extractors over joint layers when building multimodal survival models.
- The same additive pattern may appear in other cancers or modality pairs and could be checked by reusing the metric.
- Low interaction reduces the need for joint training across sites, supporting privacy-preserving federated setups.
Load-bearing premise
The adaptation of InterSHAP from classification to Cox survival models measures genuine cross-modal interactions without distortion from architecture or preprocessing choices.
What would settle it
The stability and small-size claims would be falsified by a high-C-index fusion architecture whose interaction contribution exceeds 5%, or by an interaction share that changes sharply under different preprocessing.
Original abstract
Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64 → 0.82) exhibit equivalent or lower cross-modal interaction (4.8% → 3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI ≈ 40%, RNA ≈ 55%, Interaction ≈ 4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper adapts the InterSHAP metric from classification to Cox proportional hazards models to quantify cross-modal interactions in multimodal glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), it evaluates four fusion architectures combining WSI and RNA-seq features and reports an inverse relationship between predictive performance (C-index 0.64 to 0.82) and measured cross-modal interaction (4.8% to 3.0%), with variance decomposition showing stable additive contributions (WSI ≈40%, RNA ≈55%, interaction ≈4%) across architectures, concluding that performance gains arise from complementary additive signal integration rather than learned synergy.
Significance. If the InterSHAP adaptation to Cox models is validated as unbiased, the results would provide concrete evidence against the assumption that multimodal fusion benefits primarily from synergistic interactions in survival settings. This reframes architectural choices toward simpler additive models, supplies a practical auditing metric for fusion strategies, and carries implications for efficient deployment including privacy-preserving federated scenarios.
major comments (2)
- [Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.
- [Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.
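The kind of validation the first major comment asks for could start from synthetic data like the following. This is a hedged sketch, not anything from the manuscript: the exponential hazard model, the coefficients, the one-dimensional stand-ins for each modality, and the function name `simulate_censored_cohort` are all illustrative assumptions.

```python
import numpy as np

def simulate_censored_cohort(n, gamma, censor_rate=0.3, seed=0):
    """Draw survival data whose true log-risk has interaction strength gamma.

    Every modeling choice here is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    wsi = rng.normal(size=n)  # 1-D stand-in for a WSI-derived score
    rna = rng.normal(size=n)  # 1-D stand-in for an RNA-seq-derived score
    log_risk = 0.8 * wsi + 1.1 * rna + gamma * wsi * rna

    # Exponential event times with hazard exp(log_risk), i.e. mean exp(-log_risk).
    event_time = rng.exponential(scale=np.exp(-log_risk))

    # Censor a random subset at a uniformly drawn fraction of the event time.
    censored = rng.random(n) < censor_rate
    observed_time = np.where(censored, event_time * rng.uniform(size=n), event_time)

    return {
        "wsi": wsi,
        "rna": rna,
        "time": observed_time,
        "event": ~censored,    # True where the event was observed
        "log_risk": log_risk,  # ground truth, available only in simulation
    }

# Sweep known interaction strengths; a faithful metric should track gamma.
cohorts = {gamma: simulate_censored_cohort(500, gamma) for gamma in (0.0, 0.5, 1.0)}
```

Fitting a Cox model to each cohort and checking whether the measured interaction share rises with gamma, and whether it drifts as `censor_rate` changes, would speak directly to the fidelity question.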
minor comments (2)
- [Abstract] Abstract: The cohort size n=575 is stated but the breakdown between TCGA-GBM and TCGA-LGG cases, the censoring rate, and the exact train/validation/test splits are not provided, which would aid reproducibility of the variance decomposition.
- [Discussion] Discussion: The claim that lower interaction aids privacy-preserving federated deployment is asserted without any supporting calculation or reference to how interaction strength correlates with information leakage in multimodal settings.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate additional validation and statistical reporting.
- Referee: [Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.
  Authors: We acknowledge that explicit validation simulations for the Cox adaptation are absent from the current manuscript. In the revised version we will add simulation experiments on synthetic censored survival data with controlled ground-truth interaction strengths, vary censoring rates, and compare the adapted InterSHAP against alternative interaction metrics. We will also report sensitivity to baseline hazard estimation. These additions will directly substantiate the metric's fidelity and the reported interaction range.
  Revision: yes
- Referee: [Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.
  Authors: We agree that uncertainty measures and statistical testing are required. The revised manuscript will report C-index and interaction values together with cross-validation standard deviations, include error bars on all relevant figures, and add statistical comparisons (e.g., paired tests with multiple-comparison correction) across the four architectures. These changes will allow readers to assess the robustness of the observed inverse relationship.
  Revision: yes
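For readers unfamiliar with the discrimination metric under discussion, the following sketch computes Harrell's C-index from scratch and summarizes per-fold scores as mean and sample standard deviation. It is an illustrative pure-NumPy version, not the authors' evaluation code; the helper names and the toy arrays are hypothetical, and a production analysis would more likely use an established survival library.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when i had an observed event strictly before
    j's observed time; it is concordant when i also has the higher risk score.
    Tied risk scores count as half-concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    comparable = event[:, None] & (time[:, None] < time[None, :])
    concordant = comparable & (risk[:, None] > risk[None, :])
    tied = comparable & (risk[:, None] == risk[None, :])

    n_pairs = comparable.sum()
    if n_pairs == 0:
        return float("nan")
    return (concordant.sum() + 0.5 * tied.sum()) / n_pairs

def summarize_folds(scores):
    """Mean and sample standard deviation of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

# Toy example: four patients, the second one censored.
toy_time = [2.0, 4.0, 6.0, 8.0]
toy_event = [1, 0, 1, 1]
toy_risk = [0.9, 0.1, 0.2, 0.5]
print(harrell_c_index(toy_time, toy_event, toy_risk))
```

Applying `summarize_folds` to the C-index and interaction share of each architecture across cross-validation folds would yield the standard deviations the referee requests; paired tests across architectures would need the per-fold values kept aligned by fold.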
Circularity Check
No circularity: InterSHAP-derived interactions computed independently of C-index
The paper adapts InterSHAP to Cox proportional hazards models and applies the metric to trained fusion architectures to obtain interaction percentages and variance decomposition. These quantities are direct outputs of the Shapley-based computation on model predictions rather than parameters fitted to or defined by the reported C-index values. No equation equates interaction strength to discrimination performance, and the observed inverse relationship is an empirical result from separate measurements. The derivation relies on external metric adaptation and data application without self-definitional reduction, fitted-input renaming, or load-bearing self-citation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: InterSHAP can be validly adapted from classification to Cox proportional hazards models without introducing systematic bias in interaction estimates.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes
  echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "Variance decomposition reveals stable additive contributions across all architectures (WSI≈40%, RNA≈55%, Interaction≈4%)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.