A structural causal framework for interventions on evolutionary accumulation models
Pith reviewed 2026-06-30 11:28 UTC · model grok-4.3
The pith
Evolutionary accumulation models yield valid intervention predictions when interpreted as structural causal models and interventions are applied via the do-operator as targeted parameter changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By recasting EvAMs as structural causal models, interventions become well-defined operations that are realized, for most methods, by specific modifications to the model's parameters; these operations are distinct from simple conditioning on the absence of a mutation. Drawing on individual-level causal graphs that treat fitness as an explicit variable, the framework distinguishes killing interventions, which remove a clone, from inactivating interventions, which merely block further mutations in that clone. The same machinery supplies three explicit ranking objectives and an evaluation protocol for how well any EvAM orders intervention candidates.
What carries the argument
Pearl's do-operator applied to EvAM parameters under the modularity assumption, together with the distinction between killing and inactivating interventions obtained from individual-level causal DAGs that include fitness.
If this is right
- For most EvAMs, an intervention can be implemented as a specific parameter modification, and equivalent implementation procedures can be identified.
- Conditioning on the absence of a mutation produces different predictions from the do-operator intervention and is therefore not the correct procedure.
- Killing and inactivating interventions are distinguishable once fitness is represented explicitly and produce different downstream accumulation patterns.
- Three concrete ranking objectives allow systematic comparison of how well different EvAMs prioritize targets.
- The same intervention formalism applies to any fitted computational model that can be read as a structural causal model.
Where Pith is reading between the lines
- The framework could be used to compare the robustness of different EvAMs by measuring how much their intervention rankings change when the underlying data are generated from alternative fitness landscapes.
- Longitudinal sequencing data could serve as a direct test of whether the predicted post-intervention trajectories match observed shifts after a therapeutic change.
- The same do-operator treatment might be applied to accumulation models outside oncology, such as those describing the order of trait gains in phylogenetics or the spread of cultural practices.
Load-bearing premise
The modularity assumption holds so that an intervention on one mutation can be represented by changing only the parameters that directly involve that mutation without altering the rest of the model.
What would settle it
A concrete test would be to simulate data from an individual-level process with explicit fitness, fit an EvAM, apply the derived parameter change for a killing intervention, and check whether the predicted change in accumulation probabilities matches the change observed when the same killing rule is applied directly in the simulator.
Original abstract
Evolutionary accumulation models (EvAMs), also known as cancer progression models (CPMs), infer dependencies in the order of accumulation of mutations during tumor progression from cross-sectional data. It has been suggested that EvAMs could be used to identify therapeutic targets, but there is no procedure in the literature for how to extract predictions under intervention from these models. A simple approach of conditioning on the absence of a mutation gives incorrect predictions. We address this gap by formalizing what "intervene" means for all currently available EvAM methods (OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, HyperTraPS), using Pearl's do operator and conditional interventions. For each model, we show how to implement the intervention (in most cases as specific parameter modifications), identify equivalent implementation procedures, and analyze whether the modularity assumption -- required for the intervention to be well-defined -- is justified. Drawing on individual-level causal DAGs that make fitness an explicit variable, we distinguish two types of intervention (killing and inactivating) that are conflated in standard EvAM representations. Since the goal is to prioritize intervention candidates, we recast the problem as one of ranking: we define three intervention objectives and provide a protocol for evaluating how well EvAMs rank targets. Our framework is not specific to cancer or EvAMs; it applies wherever fitted computational models can be interpreted as structural causal models. Code available from https://github.com/rdiaz02/scm-interv-evams.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide a structural causal framework for interventions on evolutionary accumulation models (EvAMs) by formalizing the meaning of 'intervene' using Pearl's do-operator and conditional interventions for methods including OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, and HyperTraPS. It shows implementations as parameter modifications, identifies equivalent procedures, analyzes the modularity assumption using individual-level causal DAGs that include fitness to distinguish killing and inactivating interventions, and defines three intervention objectives with a protocol for evaluating target ranking. The framework is presented as general for fitted computational models interpreted as SCMs, with code provided.
Significance. If the central claims hold, this work would enable principled extraction of intervention predictions from fitted EvAMs, addressing a gap where simple conditioning fails. The explicit procedures for each model and the distinction between intervention types add clarity. The availability of code is a positive feature that supports reproducibility. The approach could have impact in cancer modeling and potentially other fields using similar accumulation models, provided the causal embedding is sound.
major comments (2)
- [Abstract and section on individual-level causal DAGs] Abstract (paragraph on drawing on individual-level causal DAGs): The justification for applying Pearl's do-operator rests on the modularity assumption being valid for the fitted EvAMs. This is justified by invoking individual-level causal DAGs that treat fitness as an explicit variable. However, no explicit construction or verification is supplied showing that the standard EvAM parameterizations (which do not include fitness) embed into these DAGs while preserving the conditional independence structure used by the original models. This mapping is load-bearing for the claim that the described parameter modifications implement well-defined interventions rather than ad-hoc changes.
- [Section on implementation for each model] Section describing implementations for each EvAM method: While the abstract states that interventions are implemented via specific parameter modifications and that equivalent procedures are identified, the provided description supplies no full derivations, error bounds, or validation against ground-truth interventions. Without these, it is not possible to confirm that the procedures are equivalent across models or that they correctly correspond to the do-operator semantics.
minor comments (1)
- The GitHub link for code is a strength; ensure the repository contains at least one fully reproducible example that applies the intervention protocol to a fitted model and compares it to an alternative (e.g., conditioning) to allow readers to verify the claims.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the work's potential significance. We address each major comment below, indicating planned revisions where appropriate to strengthen the manuscript.
- Referee: [Abstract and section on individual-level causal DAGs] Abstract (paragraph on drawing on individual-level causal DAGs): The justification for applying Pearl's do-operator rests on the modularity assumption being valid for the fitted EvAMs. This is justified by invoking individual-level causal DAGs that treat fitness as an explicit variable. However, no explicit construction or verification is supplied showing that the standard EvAM parameterizations (which do not include fitness) embed into these DAGs while preserving the conditional independence structure used by the original models. This mapping is load-bearing for the claim that the described parameter modifications implement well-defined interventions rather than ad-hoc changes.
  Authors: We agree that the embedding of standard EvAM parameterizations into individual-level causal DAGs (with fitness as an explicit variable) is central to justifying the modularity assumption and the applicability of the do-operator. The manuscript discusses this connection in the relevant section to distinguish killing versus inactivating interventions and to ground the parameter modifications. However, we acknowledge that a more explicit formal construction and verification of conditional independence preservation for the standard parameterizations is not fully detailed. We will add a dedicated subsection with the explicit mapping and independence verification for representative models (CBN, MHN, and HyperTraPS) in the revised version.
  Revision: yes
- Referee: [Section on implementation for each model] Section describing implementations for each EvAM method: While the abstract states that interventions are implemented via specific parameter modifications and that equivalent procedures are identified, the provided description supplies no full derivations, error bounds, or validation against ground-truth interventions. Without these, it is not possible to confirm that the procedures are equivalent across models or that they correctly correspond to the do-operator semantics.
  Authors: The implementations are obtained by applying the do-operator (and conditional interventions) to each model's structural representation, with equivalent procedures identified for structurally similar models. The main text focuses on the resulting parameter modifications for brevity. We agree that full derivations, a validation study against ground-truth interventions on synthetic data, and explicit discussion of error bounds (or their absence under exact model assumptions) would improve rigor and verifiability. We will move detailed derivations to the supplement, add a validation subsection using simulated data, and clarify the exact (non-approximate) nature of the correspondence in the revised manuscript.
  Revision: yes
Circularity Check
No significant circularity; external causal framework applied to EvAMs
The paper formalizes interventions on existing EvAM methods by direct application of Pearl's do-operator and conditional interventions, with implementation via parameter modifications. It draws on external individual-level causal DAG concepts (with fitness as explicit variable) to distinguish killing vs. inactivating interventions and to examine modularity, without reducing any derived quantity to a fitted parameter or self-defined input by construction. No load-bearing step equates a prediction to its own inputs, invokes a uniqueness theorem from the same authors, or renames a known result. The derivation remains self-contained against the external benchmarks of Pearl causality and standard EvAM representations.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the modularity assumption is justified for the EvAMs under study.
- Domain assumption: EvAMs can be interpreted as structural causal models.
Reference graph
Works this paper leans on
-
[1]
For every genotype $x$ (that does not contain $g$ mutated), set $r_{x \to x}$ to $r_{x \to x+g}$, and set $r_{x \to x+g}$ to 0 (i.e., genotypes that contain gene $g$ cannot be transitioned to, equivalent to having their rows zeroed). This yields $R_{-g}$ (the HyperHMM implementation of $M_{m,-g}$).
-
[2]
With $R_{-g}$ obtain $\hat{\mathbf{h}}_{-g}$, the predicted hitting probabilities after intervening in gene $g$.
-
[3]
With $R_{-g}$ obtain $\hat{\mathbf{f}}^{v}_{-g}$, the predicted distribution of genotypes after $v$ steps ($v$ taking all integer values from 0 to the number of loci), as per the usual $\hat{\mathbf{f}}^{v}_{-g} = \mathbf{f}^{0} R^{v}_{-g}$, where $\mathbf{f}^{0}$ is the (row) vector $(1, 0, 0, \ldots)$.
-
[4]
Obtain the predicted population composition via the weighted sum $\hat{\mathbf{f}}_{-g} = \sum_{v=0}^{\text{number of loci}} \hat{\mathbf{f}}^{v}_{-g} \, P(V=v)$. $P(V=v)$ is the empirical (observed) frequency, in the training sample, of observations with $v$ mutations. Equivalently (since in the original, unintervened data, each step produces exactly one mutation), $P(V=v)$ is the frequency of samples that have undergone $v$ s...
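The HyperHMM steps above (all except the hitting probabilities) can be sketched on a toy two-locus chain. This is an illustrative sketch, not the authors' code: the function names, the transition matrix `R`, the step distribution `p_v`, and the absorbing self-loop on the fully mutated genotype are all assumptions made for the example. For contrast, the last function shows the naive alternative of conditioning the unintervened prediction on the absence of the gene.

```python
from itertools import combinations

def all_genotypes(n_loci):
    """Genotypes as sorted tuples of mutated loci, ordered by number of mutations."""
    return [c for k in range(n_loci + 1) for c in combinations(range(n_loci), k)]

def kill_gene(R, states, g):
    """Step 1: for each x without g, set r[x->x] to r[x->x+g], then r[x->x+g] to 0."""
    idx = {s: i for i, s in enumerate(states)}
    R_kill = [row[:] for row in R]
    for x in states:
        if g in x:
            continue
        i, j = idx[x], idx[tuple(sorted(x + (g,)))]
        R_kill[i][i] = R_kill[i][j]
        R_kill[i][j] = 0.0
    return R_kill

def step(f, R):
    """One step of the chain: row vector times transition matrix."""
    n = len(R)
    return [sum(f[i] * R[i][j] for i in range(n)) for j in range(n)]

def predicted_composition(R, p_v):
    """Steps 3 and 4: sum over v of f^v P(V=v), with f^v = f^0 R^v and f^0 = (1, 0, ...)."""
    f_v = [1.0] + [0.0] * (len(R) - 1)
    total = [0.0] * len(R)
    for p in p_v:
        total = [t + p * x for t, x in zip(total, f_v)]
        f_v = step(f_v, R)
    return total

def condition_on_absence(f, states, g):
    """Naive alternative: drop genotypes that carry g and renormalise."""
    keep = [0.0 if g in s else p for s, p in zip(states, f)]
    z = sum(keep)
    return [p / z for p in keep]

# Hypothetical toy model: 2 loci, states [(), (0,), (1,), (0, 1)].
states = all_genotypes(2)
R = [[0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0]]   # assumed absorbing self-loop on the full genotype
p_v = [0.2, 0.5, 0.3]        # assumed P(V=v) for v = 0, 1, 2

R_kill = kill_gene(R, states, g=0)
f_kill = predicted_composition(R_kill, p_v)                            # intervention
f_cond = condition_on_absence(predicted_composition(R, p_v), states, g=0)  # conditioning
```

In this toy case the intervention leaves about 0.608 of the population as wild type and 0.392 with only locus 1 mutated, whereas conditioning gives 0.5 for each, which illustrates why the two procedures are not interchangeable.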
2024
-
[5]
identify
Quantifying intervention effects: intervention objectives. Section 3 discusses how to modify the models to obtain predictions under interventions. To properly assess if EvAMs can be used to identify therapeutic targets, we also need to specify what “identify” means, as different objectives can lead to different rankings of the same genes as targets. Her...
2025
-
[6]
From the true $\mathbf{Q}$ (section 4.1) (or, if $\mathbf{Q}$ is not available for the evolutionary regime considered, using simulations), generate a sample.
-
[7]
Use this sample as input for each EvAM method
-
[8]
Modifying fitted EvAM models to predict the consequences of an intervention
From the output of each EvAM, obtain a modified EvAM after intervening on (making lethal the mutated allele of) gene $g$. Let $M_m$ denote the fitted model from method $m$; $M_{m,-g}$ is the modification of $M_m$ that results from the killing intervention on $g$ (i.e., after the intervention that makes a mutant in gene $g$ a lethal mutation). How we obtain $M_{m,-g}$ is explained in section “Modifying fitted...
-
[9]
Genotype predictions and hitting probabilities from EvAMs and error model
From $M_{m,-g}$, use the standard procedures for each method (Appendix, “Genotype predictions and hitting probabilities from EvAMs and error model”, section C.6) to obtain the predicted hitting probabilities, $\hat{\mathbf{h}}_{m,-g}$, and the predicted distribution of genotypes, $\hat{\mathbf{f}}_{m,-g}$, after targeting gene $g$. For example, in Fig. 3 this yields each of the rows after “None” in p...
2019
-
[10]
no causation without manipulation
Discussion. After differentiating between two types of intervention (killing and inactivating) and explaining why a naive approach to intervention leads to incorrect predictions for EvAMs, we have presented a conceptualization of interventions on each of the currently available EvAM methods (OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, HyperTraPS). We show both what an interventi...
2024
-
[11]
Acknowledgments. Álvaro San Martín and Eric Macías Fasio, for literature searches, and initial discussion. Members of the Stochastic Biology group at the University of Bergen for discussion. Supported by grants PID2024-156888OB-I00 funded by MICIU/AEI/10.13039/501100011033/FEDER, EU and PID2019-111256RB-I00 funded by MCIN/AEI/10.13039/501100011033 to RDU. This work was su...
-
[12]
Code availability. Code available from https://github.com/rdiaz02/scm-interv-evams
-
[13]
Bibliography. Aga, O. N. L., Brun, M., Dauda, K. A., Diaz-Uriarte, R., Giannakis, K., and Johnston, I. G. 2024. HyperTraPS-CT: Inference and prediction for accumulation pathways with flexible data and model structures. PLOS Computational Biology, 20(9), e1012393. URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012393, DOI: https...
-
[14]
Inactivation of interleukin-30 in colon cancer stem cells via CRISPR/Cas9 genome editing inhibits their oncogenicity and improves host survival. Journal for ImmunoTherapy of Cancer, 11(3), e006056. URL: https://jitc.bmj.com/content/11/3/e006056, DOI: https://doi.org/10.1136/jitc-2022-006056. Desper, R., Jiang, F., Kallioniemi, O. P., Moch, H., Papadimitriou,...
-
[15]
Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol, 6(1), 37–51. URL: http://view.ncbi.nlm.nih.gov/pubmed/10223663. Diaz-Uriarte, R. 2017. OncoSimulR: Genetic simulation with arbitrary epistasis and mutator genes in asexual populations. Bioinformatics, 33(12), 1898–1899. URL: https://academic.oup.com/bioinformatics/article/33/12/1...
-
[16]
Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology, 1(2), e1027. URL: https://onlinelibrary.wiley.com/doi/10.1002/cso2.1027, DOI: https://doi.org/10.1002/cso2.1027. Norris, J. R. 1997. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. URL:h...
-
[17]
evolution only proceeds uphill in the fitness landscape, i.e., it only evolves by fixing beneficial mutations
URL: http://science.sciencemag.org/content/312/5770/111, DOI: https://doi.org/10.1126/science.1123539. Weissman, D. B., Desai, M. M., Fisher, D. S., and Feldman, M. W. 2009. The rate at which asexual populations cross fitness valleys. Theoretical Population Biology, 75(4), 286–300. URL: http://dx.doi.org/10.1016/j.tpb.2009.02.006, DOI: https://doi.org/10.1016...
-
[18]
scaled transition rate matrix
nor the role of bulk sequencing (Diaz-Uriarte and Johnston, 2025). A.2.1. Scaling the transition rate matrix. We can scale the transition rate so that the process is faster or slower, or expressed in different units of time (for example, Gillespie 1984, shows expressions for the scaling of the transition rates so that time is in units of $N$ generations). W...
2025
-
[19]
Focal genotypes are all accessible genotypes in the true fitness landscape with a certain number of mutations (e.g., 3 to 5 mutations); we use $z$ to denote any one of the focal genotypes.
-
[20]
If under SSWM, from $\mathbf{Q}$, the true transition rate matrix of the fitness landscape, obtain the transition matrix corresponding to the embedded discrete-time chain; from it, obtain the hitting (or first passage) probabilities (Norris, 1997; Privault, 2018), $\mathbf{h}$, starting from the WT state (hitting probabilities can be readily obtained using, for example, the markovchain R pac...
1997
-
[21]
For every gene $g$, obtain $\mathbf{Q}_{-g}$, the transition rate matrix when intervening on gene $g$ and, from it, the transition matrix corresponding to the embedded discrete-time chain. From it, obtain the hitting probability of genotypes under the intervention, $\mathbf{h}_{-g}$. $h_{-g}(z)$ is the hitting probability of genotype $z$ when intervening on gene $g$. (If $\mathbf{Q}_{-g}$ is not available for the evolutionary r...
-
[22]
no intervention
For each EvAM, $M_m$:
4.1. From $M_m$, obtain the predicted hitting probability of genotypes under no intervention, $\hat{\mathbf{h}}_m$. $\hat{h}_m(z)$ is the hitting probability, for method $m$, of genotype $z$ under no intervention.
4.2. Modify the model results under each intervention to obtain $M_{m,-g}$ and, from it, the predicted hitting probability of genotypes $\hat{\mathbf{h}}_{m,-g}$. $\hat{h}_{m,-g}(z)$ is the predicted hitting probabil...
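The two standard operations these steps rely on, deriving the embedded discrete-time chain from a transition rate matrix and computing hitting probabilities from the WT state, can be sketched as follows. This is a minimal sketch under stated assumptions: the rate matrix `Q` is a made-up two-locus example, the function names are hypothetical, and the forward recursion is only valid when genotypes are ordered so that every transition goes to a later state (no back-mutations), which makes each genotype reachable at most once. The source mentions the markovchain R package for this task; the code below does not use it.

```python
def embedded_chain(Q):
    """Jump chain of a continuous-time Markov chain: P[i][j] = q_ij / (-q_ii)."""
    n = len(Q)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out_rate = -Q[i][i]
        if out_rate > 0:
            for j in range(n):
                if j != i:
                    P[i][j] = Q[i][j] / out_rate
        else:
            P[i][i] = 1.0  # no way out: treat the state as absorbing
    return P

def hitting_from_wt(P):
    """Probability of ever visiting each state, starting from state 0 (WT).

    Assumes transitions only go from lower to higher indices (plus absorbing
    self-loops), so h[j] is the sum over predecessors i of h[i] * P[i][j].
    """
    n = len(P)
    h = [0.0] * n
    h[0] = 1.0
    for j in range(1, n):
        h[j] = sum(h[i] * P[i][j] for i in range(j))
    return h

# Hypothetical rate matrix, states ordered WT, A, B, AB.
Q = [[-4.0, 3.0, 1.0, 0.0],
     [0.0, -2.0, 0.0, 2.0],
     [0.0, 0.0, -5.0, 5.0],
     [0.0, 0.0, 0.0, 0.0]]

P = embedded_chain(Q)
h = hitting_from_wt(P)  # [1.0, 0.75, 0.25, 1.0]
```

The same two functions would be applied to an intervened rate matrix once it is available; how that matrix is built is not sketched here.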
-
[23]
From $\mathbf{Q}$, the true transition rate matrix of the fitness landscape, obtain $\mathbf{f}$, the distribution of genotypes, using the same time distribution as used when generating the samples (section 4.1). (As above, if $\mathbf{Q}$ is not available, we can instead use forward genetic simulations to estimate $\mathbf{f}$.) From $\mathbf{f}$, obtain the true population mean number of mutations as $D = \sum_{z} n(z) f(z)$, where...
-
[24]
For every $g$, from $\mathbf{Q}$ (or using forward genetic simulations on the modified fitness landscapes), obtain $\mathbf{f}_{-g}$, and from it the true mean number of mutations under intervention on $g$: $D_{-g} = \sum_{z} n(z) f_{-g}(z)$.
-
[25]
no intervention
For each EvAM, $M_m$:
3.1. From $M_m$, obtain the predicted distribution of genotypes under no intervention, $\hat{\mathbf{f}}_m$. $\hat{f}_m(z)$ is the predicted frequency, for method $m$, of genotype $z$ under no intervention. From $\hat{\mathbf{f}}_m$ obtain the predicted mean number of driver mutations, $\hat{D}_m = \sum_{z} n(z) \hat{f}_m(z)$.
3.2. From $M_{m,-g}$ obtain $\hat{\mathbf{f}}_{m,-g}$ and from this, the predicted mean number of driver m...
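The mean-number-of-mutations quantity used in these steps is a weighted sum over genotypes, where the weight of each genotype is its number of mutations. A minimal sketch follows; the genotype frequencies are invented for illustration and the function name is hypothetical.

```python
def mean_mutations(freqs):
    """D = sum over genotypes z of n(z) * f(z); n(z) is the number of mutated genes."""
    return sum(len(z) * f for z, f in freqs.items())

# Hypothetical genotype distributions; genotypes are tuples of mutated genes.
f_none = {(): 0.2, ("A",): 0.3, ("B",): 0.2, ("A", "B"): 0.3}   # no intervention
f_kill_A = {(): 0.608, ("B",): 0.392}                            # intervening on A

D = mean_mutations(f_none)          # 0.3 + 0.2 + 2 * 0.3 = 1.1
D_kill_A = mean_mutations(f_kill_A)  # 0.392
```

Applying the same function to true and predicted distributions gives the pairs of values whose agreement the protocol evaluates.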
-
[26]
Obtain $f(WT)$, i.e., $f(z)$ when $z$ is the WT genotype.
-
[27]
For every $g$, obtain $f_{-g}(WT)$, the frequency of the WT genotype when intervening on gene $g$.
-
[28]
For each EvAM, $M_m$:
3.1. Obtain $\hat{f}_m(WT)$ (predicted frequency of WT for model $m$).
3.2. For every $g$, obtain $\hat{f}_{m,-g}(WT)$ (predicted frequency of WT for model $m$ under intervention $g$).
3.3. For each EvAM, compute the rank correlation between the vectors $(f(WT), f_{-A}(WT), f_{-B}(WT), \ldots)$ and $(\hat{f}(WT), \hat{f}_{-A}(WT), \hat{f}_{-B}(WT), \ldots)$.
[Figure, panel a): O mut; targets None, A, B, C; Truth C...]
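The rank-correlation comparison between true and predicted WT frequencies can be sketched as below. The source does not say which rank correlation is used, so Spearman's coefficient (Pearson correlation of average ranks) is an assumption here, and the frequency vectors are invented for illustration.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical WT frequencies for targets (None, A, B, C).
truth = [0.10, 0.40, 0.25, 0.15]
predicted = [0.12, 0.35, 0.20, 0.30]

rho = spearman(truth, predicted)  # 0.8: the model swaps the order of B and C
```

A value of 1 would mean the model orders the candidate targets exactly as the truth does, which is the property a ranking-based evaluation cares about more than the accuracy of individual frequencies.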
-
[29]
Identify the genotypes that become non-viable after the intervention: all those that have $g$.
-
[30]
Find all genes, $h_1, h_2, \ldots$, that appear in formerly viable genotypes only with $g$.
-
[31]
MHN: graph interventions
Remove, from the DAG, any edges that involve $g, h_1, h_2, \ldots$ (i.e., any edges that involve $g, h_1, h_2, \ldots$ as origin or destination). Code: The above algorithm is in function kill_gene_DAG (called from kill_gene) in file kill-gene-and-output-from-cpm.R. Code: Function intervene_cpm_every_gene, in file intervention.R, is the main intervention function, which calls kill_...
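The three DAG steps above (find non-viable genotypes, find genes that only ever appear together with $g$, drop every edge touching any of them) can be sketched in a few lines. This is an illustrative Python sketch, not a port of the authors' R function: the function name, the edge-list representation, and the example DAG with its genotype list are all assumptions.

```python
def kill_gene_in_dag_sketch(edges, viable_genotypes, g):
    """Return (non-viable genotypes, dependent genes, remaining edges) after killing g."""
    # Step 1: genotypes that carry g become non-viable.
    nonviable = {z for z in viable_genotypes if g in z}
    still_viable = [z for z in viable_genotypes if g not in z]

    # Step 2: genes seen before the intervention but in no surviving genotype.
    genes_before = set().union(*viable_genotypes)
    genes_after = set().union(*still_viable)
    dependents = genes_before - genes_after - {g}

    # Step 3: drop edges with g or a dependent gene as origin or destination.
    removed = dependents | {g}
    kept_edges = [(a, b) for a, b in edges if a not in removed and b not in removed]
    return nonviable, dependents, kept_edges

# Hypothetical DAG: C requires A, D requires B.
edges = [("Root", "A"), ("Root", "B"), ("A", "C"), ("B", "D")]
viable = [frozenset(s) for s in
          ["", "A", "B", "AB", "AC", "BD", "ABC", "ABD", "ABCD"]]

nonviable, dependents, kept = kill_gene_in_dag_sketch(edges, viable, "A")
# dependents == {"C"}; kept == [("Root", "B"), ("B", "D")]
```

Working from the list of viable genotypes rather than from the graph alone keeps the sketch independent of whether parent relationships are conjunctive or disjunctive, at the cost of enumerating genotypes.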
2020