A structural causal framework for interventions on evolutionary accumulation models

Iain G. Johnston; Íñigo Ríos-Arroyo; Ramon Diaz-Uriarte

arxiv: 2606.12597 · v2 · pith:BVZHOK7Lnew · submitted 2026-06-10 · 🧬 q-bio.QM · q-bio.PE

Pith reviewed 2026-06-30 11:28 UTC · model grok-4.3

keywords evolutionary accumulation models · structural causal models · interventions · cancer progression models · mutation accumulation · Pearl's do-operator · therapeutic target ranking

The pith

Evolutionary accumulation models yield valid intervention predictions when interpreted as structural causal models and interventions are applied via the do-operator as targeted parameter changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to define interventions on EvAMs so that predictions about mutation accumulation under change remain consistent with the model's structure. It does this by treating each EvAM as a structural causal model and using Pearl's do-operator to cut the influence of a chosen mutation while leaving other dependencies intact. For every current method the authors give the exact parameter adjustments that realize the intervention and show which adjustments are equivalent. They also separate two distinct kinds of intervention, killing versus inactivating, that standard EvAM representations normally mix together. The resulting framework turns the models into tools that can rank candidate targets by how strongly each would alter the remaining accumulation process.

Core claim

By recasting EvAMs as structural causal models, interventions become well-defined operations that are realized, for most methods, by specific modifications to the model's parameters; these operations are distinct from simple conditioning on the absence of a mutation. Drawing on individual-level causal graphs that treat fitness as an explicit variable, the framework distinguishes killing interventions, which remove a clone, from inactivating interventions, which merely block further mutations in that clone. The same machinery supplies three explicit ranking objectives and an evaluation protocol for how well any EvAM orders intervention candidates.
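
The evaluation protocol mentioned above compares, for each method, the true effect of every candidate intervention with the effect the fitted model predicts, using a rank correlation. A minimal sketch of that comparison follows. The WT-frequency values are invented, Spearman's coefficient is an assumed choice (the extracted text only says "rank correlation"), and the function names are hypothetical.

```python
# Sketch: score how well a fitted model ranks intervention targets.
# Assumption: Spearman's rho; all numbers below are made up for illustration.

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


targets = ["None", "A", "B", "C"]
# Hypothetical true and predicted WT frequencies under each intervention.
true_wt = [0.10, 0.40, 0.25, 0.15]
pred_wt = [0.12, 0.35, 0.18, 0.20]

print("rank correlation:", round(spearman(true_wt, pred_wt), 3))
```

Here the model swaps the order of targets B and C but agrees on the best target, so the correlation is high without being perfect.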

What carries the argument

Pearl's do-operator applied to EvAM parameters under the modularity assumption, together with the distinction between killing and inactivating interventions obtained from individual-level causal DAGs that include fitness.

If this is right

Each EvAM admits at least one equivalent implementation of an intervention as a parameter modification.
Conditioning on the absence of a mutation produces different predictions from the do-operator intervention and is therefore not the correct procedure.
Killing and inactivating interventions are distinguishable once fitness is represented explicitly and produce different downstream accumulation patterns.
Three concrete ranking objectives allow systematic comparison of how well different EvAMs prioritize targets.
The same intervention formalism applies to any fitted computational model that can be read as a structural causal model.
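
To make the point about conditioning concrete, the following minimal sketch compares conditioning with a killing intervention on a toy four-genotype chain. Everything in it is an assumption made for illustration: the transition probabilities, the uniform step distribution `P_STEPS`, and the helper names are invented, and the intervention borrows the self-loop redirection rule that the extracted text quotes for HyperHMM. It is not the authors' code.

```python
# Toy comparison: conditioning on "no A" vs. a killing intervention on gene A.
# All numbers are invented; states are genotypes over two genes, A and B.
STATES = ["WT", "A", "B", "AB"]
CARRIES_A = {"WT": False, "A": True, "B": False, "AB": True}
ADD_A = {"WT": "A", "B": "AB"}  # genotype reached from x by gaining A

# Discrete-time transition probabilities; each step adds one mutation.
R = {
    "WT": {"A": 0.8, "B": 0.2},
    "A": {"AB": 1.0},
    "B": {"AB": 1.0},
    "AB": {"AB": 1.0},
}
# Assumed distribution of the number of steps taken by a sample.
P_STEPS = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}


def step(dist, trans):
    """Advance a genotype distribution by one step of the chain."""
    nxt = {s: 0.0 for s in STATES}
    for src, mass in dist.items():
        for dst, prob in trans.get(src, {}).items():
            nxt[dst] += mass * prob
    return nxt


def genotype_distribution(trans, p_steps):
    """Mix the v-step distributions (starting at WT) with weights P(V=v)."""
    dist = {s: 0.0 for s in STATES}
    dist["WT"] = 1.0
    mix = {s: 0.0 for s in STATES}
    for v in range(max(p_steps) + 1):
        for s in STATES:
            mix[s] += p_steps.get(v, 0.0) * dist[s]
        dist = step(dist, trans)
    return mix


def condition_on_no_a(freqs):
    """Naive approach: renormalize over genotypes that lack A."""
    keep = [s for s in STATES if not CARRIES_A[s]]
    total = sum(freqs[s] for s in keep)
    return {s: freqs[s] / total for s in keep}


def kill_a(trans):
    """Redirect every x -> x+A transition into a self-loop at x."""
    new = {}
    for x in STATES:
        if CARRIES_A[x]:
            continue  # genotypes carrying A can no longer be reached
        row = dict(trans[x])
        blocked = row.pop(ADD_A[x], 0.0)
        row[x] = row.get(x, 0.0) + blocked  # original self-loop is 0 here
        new[x] = row
    return new


observed = genotype_distribution(R, P_STEPS)
conditioned = condition_on_no_a(observed)
intervened = genotype_distribution(kill_a(R), P_STEPS)
print("conditioning:", {s: round(p, 3) for s, p in conditioned.items()})
print("intervention:", {s: round(p, 3) for s, p in intervened.items()})
```

With these invented numbers, conditioning gives WT a share of about 0.833 while the intervention gives about 0.813, so the two procedures disagree even on this tiny example.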

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be used to compare the robustness of different EvAMs by measuring how much their intervention rankings change when the underlying data are generated from alternative fitness landscapes.
Longitudinal sequencing data could serve as a direct test of whether the predicted post-intervention trajectories match observed shifts after a therapeutic change.
The same do-operator treatment might be applied to accumulation models outside oncology, such as those describing the order of trait gains in phylogenetics or the spread of cultural practices.

Load-bearing premise

The modularity assumption holds so that an intervention on one mutation can be represented by changing only the parameters that directly involve that mutation without altering the rest of the model.

What would settle it

A concrete test would be to simulate data from an individual-level process with explicit fitness, fit an EvAM, apply the derived parameter change for a killing intervention, and check whether the predicted change in accumulation probabilities matches the change observed when the same killing rule is applied directly in the simulator.

Original abstract

Evolutionary accumulation models (EvAMs), also known as cancer progression models (CPMs), infer dependencies in the order of accumulation of mutations during tumor progression from cross-sectional data. It has been suggested that EvAMs could be used to identify therapeutic targets, but there is no procedure in the literature for how to extract predictions under intervention from these models. A simple approach of conditioning on the absence of a mutation gives incorrect predictions. We address this gap by formalizing what "intervene" means for all currently available EvAM methods (OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, HyperTraPS), using Pearl's do operator and conditional interventions. For each model, we show how to implement the intervention (in most cases as specific parameter modifications), identify equivalent implementation procedures, and analyze whether the modularity assumption -- required for the intervention to be well-defined -- is justified. Drawing on individual-level causal DAGs that make fitness an explicit variable, we distinguish two types of intervention (killing and inactivating) that are conflated in standard EvAM representations. Since the goal is to prioritize intervention candidates, we recast the problem as one of ranking: we define three intervention objectives and provide a protocol for evaluating how well EvAMs rank targets. Our framework is not specific to cancer or EvAMs; it applies wherever fitted computational models can be interpreted as structural causal models. Code available from https://github.com/rdiaz02/scm-interv-evams.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first explicit mapping from do-interventions to parameter changes across all major EvAM methods and separates killing from inactivating effects, but the modularity claim rests on an embedding into fitness DAGs that is asserted rather than derived from the fitted models.

The useful part is the concrete procedure: for each EvAM (CBN, MHN, OncoBN, etc.) they show how to implement a do-intervention as a change to the model's parameters, plus three ranking objectives for target prioritization. That fills the gap they identify and the code is available. They also correctly note that simple conditioning fails and that fitness must be treated explicitly to separate the two intervention types.

The soft spot is the modularity step. The argument invokes individual-level DAGs with fitness as a node to justify that an intervention on one mutation does not directly affect other mechanisms. Standard EvAMs do not include fitness, so the parameter-change rules only correspond to a true do-intervention if the embedding preserves the relevant conditional independencies. The abstract says they analyze whether the assumption is justified, but without the explicit mapping shown, it is hard to check whether the justification carries through for the fitted parameters.

This is aimed at people already using EvAMs for oncology target ranking who want a defensible way to move from fitted model to intervention prediction. The framework is general enough to apply beyond cancer. It is clear enough on its own terms to deserve referee time; the main question for reviewers will be whether the DAG embedding step is made rigorous enough in the full text.

Referee Report

2 major / 1 minor

Summary. The paper claims to provide a structural causal framework for interventions on evolutionary accumulation models (EvAMs) by formalizing the meaning of 'intervene' using Pearl's do-operator and conditional interventions for methods including OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, and HyperTraPS. It shows implementations as parameter modifications, identifies equivalent procedures, analyzes the modularity assumption using individual-level causal DAGs that include fitness to distinguish killing and inactivating interventions, and defines three intervention objectives with a protocol for evaluating target ranking. The framework is presented as general for fitted computational models interpreted as SCMs, with code provided.

Significance. If the central claims hold, this work would enable principled extraction of intervention predictions from fitted EvAMs, addressing a gap where simple conditioning fails. The explicit procedures for each model and the distinction between intervention types add clarity. The availability of code is a positive feature that supports reproducibility. The approach could have impact in cancer modeling and potentially other fields using similar accumulation models, provided the causal embedding is sound.

major comments (2)

[Abstract and section on individual-level causal DAGs] Abstract (paragraph on drawing on individual-level causal DAGs): The justification for applying Pearl's do-operator rests on the modularity assumption being valid for the fitted EvAMs. This is justified by invoking individual-level causal DAGs that treat fitness as an explicit variable. However, no explicit construction or verification is supplied showing that the standard EvAM parameterizations (which do not include fitness) embed into these DAGs while preserving the conditional independence structure used by the original models. This mapping is load-bearing for the claim that the described parameter modifications implement well-defined interventions rather than ad-hoc changes.
[Section on implementation for each model] Section describing implementations for each EvAM method: While the abstract states that interventions are implemented via specific parameter modifications and that equivalent procedures are identified, the provided description supplies no full derivations, error bounds, or validation against ground-truth interventions. Without these, it is not possible to confirm that the procedures are equivalent across models or that they correctly correspond to the do-operator semantics.

minor comments (1)

The GitHub link for code is a strength; ensure the repository contains at least one fully reproducible example that applies the intervention protocol to a fitted model and compares it to an alternative (e.g., conditioning) to allow readers to verify the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's potential significance. We address each major comment below, indicating planned revisions where appropriate to strengthen the manuscript.

Referee: [Abstract and section on individual-level causal DAGs] Abstract (paragraph on drawing on individual-level causal DAGs): The justification for applying Pearl's do-operator rests on the modularity assumption being valid for the fitted EvAMs. This is justified by invoking individual-level causal DAGs that treat fitness as an explicit variable. However, no explicit construction or verification is supplied showing that the standard EvAM parameterizations (which do not include fitness) embed into these DAGs while preserving the conditional independence structure used by the original models. This mapping is load-bearing for the claim that the described parameter modifications implement well-defined interventions rather than ad-hoc changes.

Authors: We agree that the embedding of standard EvAM parameterizations into individual-level causal DAGs (with fitness as an explicit variable) is central to justifying the modularity assumption and the applicability of the do-operator. The manuscript discusses this connection in the relevant section to distinguish killing versus inactivating interventions and to ground the parameter modifications. However, we acknowledge that a more explicit formal construction and verification of conditional independence preservation for the standard parameterizations is not fully detailed. We will add a dedicated subsection with the explicit mapping and independence verification for representative models (CBN, MHN, and HyperTraPS) in the revised version. revision: yes
Referee: [Section on implementation for each model] Section describing implementations for each EvAM method: While the abstract states that interventions are implemented via specific parameter modifications and that equivalent procedures are identified, the provided description supplies no full derivations, error bounds, or validation against ground-truth interventions. Without these, it is not possible to confirm that the procedures are equivalent across models or that they correctly correspond to the do-operator semantics.

Authors: The implementations are obtained by applying the do-operator (and conditional interventions) to each model's structural representation, with equivalent procedures identified for structurally similar models. The main text focuses on the resulting parameter modifications for brevity. We agree that full derivations, a validation study against ground-truth interventions on synthetic data, and explicit discussion of error bounds (or their absence under exact model assumptions) would improve rigor and verifiability. We will move detailed derivations to the supplement, add a validation subsection using simulated data, and clarify the exact (non-approximate) nature of the correspondence in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; external causal framework applied to EvAMs

The paper formalizes interventions on existing EvAM methods by direct application of Pearl's do-operator and conditional interventions, with implementation via parameter modifications. It draws on external individual-level causal DAG concepts (with fitness as explicit variable) to distinguish killing vs. inactivating interventions and to examine modularity, without reducing any derived quantity to a fitted parameter or self-defined input by construction. No load-bearing step equates a prediction to its own inputs, invokes a uniqueness theorem from the same authors, or renames a known result. The derivation remains self-contained against the external benchmarks of Pearl causality and standard EvAM representations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the applicability of Pearl's causal framework to EvAMs and on the justification of the modularity assumption; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption: Modularity assumption is justified for the EvAMs under study
Required for the intervention to be well-defined; paper states it analyzes whether this holds.
domain assumption: EvAMs can be interpreted as structural causal models
Foundation for applying Pearl's do-operator to define interventions.

pith-pipeline@v0.9.1-grok · 5822 in / 1394 out tokens · 38570 ms · 2026-06-30T11:28:04.164747+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 6 canonical work pages

[1]

For every genotype $x$ (that does not contain $g$ mutated), set $r_{x \to x}$ to $r_{x \to x+g}$, and set $r_{x \to x+g}$ to 0 (i.e., genotypes that contain gene $g$ cannot be transitioned to, equivalent to having their rows zeroed). This yields $R_{-g}$ (the HyperHMM implementation of $M_{m,-g}$).
[2]

With $R_{-g}$ obtain $\hat{\mathbf{h}}_{-g}$, the predicted hitting probabilities after intervening in gene $g$
[3]

With $R_{-g}$ obtain $\hat{\mathbf{f}}^{v}_{-g}$, the predicted distribution of genotypes after $v$ steps ($v$ taking all integer values from 0 to the number of loci), as per the usual $\hat{\mathbf{f}}^{v}_{-g} = \mathbf{f}^{0} R^{v}_{-g}$, where $\mathbf{f}^{0}$ is the (row) vector $(1, 0, 0, \ldots)$
[4]

Equivalently (since in the original, unintervened data, each step produces exactly one mutation), $P(V=v)$ is the frequency of samples that have undergone $v$ steps

Obtain the predicted population composition via the weighted sum $\hat{\mathbf{f}}_{-g} = \sum_{v=0}^{\text{number of loci}} \hat{\mathbf{f}}^{v}_{-g} \, P(V=v)$. $P(V=v)$ is the empirical (observed) frequency, in the training sample, of observations with $v$ mutations. Equivalently (since in the original, unintervened data, each step produces exactly one mutation), $P(V=v)$ is the frequency of samples that have undergone $v$ s...

2024
[5]

identify

Quantifying intervention effects: intervention objectives. Section 3 discusses how to modify the models to obtain predictions under interventions. To properly assess if EvAMs can be used to identify therapeutic targets, we also need to specify what “identify” means, as different objectives can lead to different rankings of the same genes as targets. Her...

2025
[6]

From the true $\mathbf{Q}$ (section 4.1) (or, if $\mathbf{Q}$ is not available for the evolutionary regime considered, using simulations), generate a sample
[7]

Use this sample as input for each EvAM method
[8]

Modifying fitted EvAM models to predict the consequences of an intervention

From the output of each EvAM, obtain a modified EvAM after intervening on (making lethal the mutated allele of) gene $g$. Let $M_m$ denote the fitted model from method $m$; $M_{m,-g}$ is the modification of $M_m$ that results from the killing intervention on $g$ (i.e., after the intervention that makes a mutant in gene $g$ a lethal mutation). How we obtain $M_{m,-g}$ is explained in section “Modifying fitted...
[9]

Genotype predictions and hitting probabilities from EvAMs and error model

From $M_{m,-g}$, use the standard procedures for each method (Appendix, “Genotype predictions and hitting probabilities from EvAMs and error model”, section C.6) to obtain the predicted hitting probabilities, $\hat{\mathbf{h}}_{m,-g}$, and the predicted distribution of genotypes, $\hat{\mathbf{f}}_{m,-g}$, after targeting gene $g$. For example, in Fig. 3 this yields each of the rows after “None” in p...

2019
[10]

no causation without manipulation

Discussion. After differentiating between two types of intervention (killing and inactivating) and explaining why a naive approach to intervention leads to incorrect predictions for EvAMs, we have presented a conceptualization of interventions on each of the currently available EvAM methods (OT, OncoBN, CBN, H-ESBCN, MHN, HyperHMM, HyperTraPS). We show both what an interventi...

2024
[11]

ECAL vs HCAL

Acknowledgments. Álvaro San Martín and Eric Macías Fasio, for literature searches, and initial discussion. Members of the Stochastic Biology group at the University of Bergen for discussion. Supported by grants PID2024-156888OB-I00 funded by MICIU/AEI/10.13039/501100011033/FEDER, EU and PID2019-111256RB-I00 funded by MCIN/AEI/10.13039/501100011033 to RDU. This work was su...

work page doi:10.13039/501100011033/feder
[12]

Code availability. Code available from https://github.com/rdiaz02/scm-interv-evams
[13]

Bibliography. Aga, O. N. L., Brun, M., Dauda, K. A., Diaz-Uriarte, R., Giannakis, K., and Johnston, I. G. 2024. HyperTraPS-CT: Inference and prediction for accumulation pathways with flexible data and model structures. PLOS Computational Biology, 20(9), e1012393. URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012393, DOI: https...

work page doi:10.1371/journal.pc 2024
[14]

URL: https://jitc.bmj.com/content/11/3/e006056, DOI: https://doi.org/10.1136/jitc-2022-006056

Inactivation of interleukin-30 in colon cancer stem cells via CRISPR/Cas9 genome editing inhibits their oncogenicity and improves host survival. Journal for ImmunoTherapy of Cancer, 11(3), e006056. URL: https://jitc.bmj.com/content/11/3/e006056, DOI: https://doi.org/10.1136/jitc-2022-006056. Desper, R., Jiang, F., Kallioniemi, O. P., Moch, H., Papadimitriou,...

work page doi:10.1136/jitc-2022-006056 2022
[15]

URL: http://view.ncbi.nlm.nih.gov/pubmed/10223663

Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol, 6(1), 37–51. URL: http://view.ncbi.nlm.nih.gov/pubmed/10223663. Diaz-Uriarte, R. 2017. OncoSimulR: Genetic simulation with arbitrary epistasis and mutator genes in asexual populations. Bioinformatics, 33(12), 1898–1899. URL: https://academic.oup.com/bioinformatics/article/33/12/1...

work page doi:10.1093/bioinformatics/btx077 2017
[16]

URL: https://onlinelibrary.wiley.com/doi/10.1002/cso2.1027, DOI: https://doi.org/10.1002/cso2.1027

Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology, 1(2), e1027. URL: https://onlinelibrary.wiley.com/doi/10.1002/cso2.1027, DOI: https://doi.org/10.1002/cso2.1027. Norris, J. R. 1997. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. URL:h...

work page doi:10.1002/cso2.1027 1997
[17]

evolution only proceeds uphill in the fitness landscape, i.e., it only evolves by fixing beneficial mutations

URL: http://science.sciencemag.org/content/312/5770/111, DOI: https://doi.org/10.1126/science.1123539. Weissman, D. B., Desai, M. M., Fisher, D. S., and Feldman, M. W. 2009. The rate at which asexual populations cross fitness valleys. Theoretical Population Biology, 75(4), 286–300. URL: http://dx.doi.org/10.1016/j.tpb.2009.02.006, DOI: https://doi.org/10.1016...

work page doi:10.1126/science.1123539 2009
[18]

scaled transition rate matrix

nor the role of bulk sequencing (Diaz-Uriarte and Johnston, 2025). A.2.1. Scaling the transition rate matrix. We can scale the transition rate so that the process is faster or slower, or expressed in different units of time (for example, Gillespie 1984, shows expressions for the scaling of the transition rates so that time is in units of $N$ generations). W...

2025
[19]

Focal genotypes are all accessible genotypes in the true fitness landscape with a certain number of mutations (e.g., 3 to 5 mutations); we use $z$ to denote any one of the focal genotypes
[20]

If under SSWM, from $\mathbf{Q}$, the true transition rate matrix of the fitness landscape, obtain the transition matrix corresponding to the embedded discrete-time chain; from it, obtain the hitting (or first passage) probabilities (Norris, 1997; Privault, 2018), $\mathbf{h}$, starting from the WT state (hitting probabilities can be readily obtained using, for example, the markovchain R pac...

1997
[21]

From it, obtain the hitting probability of genotypes under the intervention, $\mathbf{h}_{-g}$. $h_{-g}(z)$ is the hitting probability of genotype $z$ when intervening on gene $g$

For every gene $g$, obtain $\mathbf{Q}_{-g}$, the transition rate matrix when intervening on gene $g$ and, from it, the transition matrix corresponding to the embedded discrete-time chain. From it, obtain the hitting probability of genotypes under the intervention, $\mathbf{h}_{-g}$. $h_{-g}(z)$ is the hitting probability of genotype $z$ when intervening on gene $g$. (If $\mathbf{Q}_{-g}$ is not available for the evolutionary r...
[22]

no intervention

For each EvAM, $M_m$: 4.1. From $M_m$, obtain the predicted hitting probability of genotypes under no intervention $\hat{\mathbf{h}}_m$. $\hat{h}_m(z)$ is the hitting probability, for method $m$, of genotype $z$ under no intervention. 4.2. Modify the model results under each intervention to obtain $M_{m,-g}$ and, from it, the predicted hitting probability of genotypes $\hat{\mathbf{h}}_{m,-g}$. $\hat{h}_{m,-g}(z)$ is the predicted hitting probabil...
[23]

From $\mathbf{Q}$, the true transition rate matrix of the fitness landscape, obtain $\mathbf{f}$, the distribution of genotypes, using the same time distribution as used when generating the samples (section 4.1). (As above, if $\mathbf{Q}$ is not available, we can instead use forward genetic simulations to estimate $\mathbf{f}$.) From $\mathbf{f}$, obtain the true population mean number of mutations as $D = \sum_{z} n(z) f(z)$, where...
[24]

For every $g$, from $\mathbf{Q}$ (or using forward genetic simulations on the modified fitness landscapes), obtain $\mathbf{f}_{-g}$, and from it the true mean number of mutations under intervention on $g$: $D_{-g} = \sum_{z} n(z) f_{-g}(z)$
[25]

no intervention

For each EvAM, $M_m$: 3.1. From $M_m$, obtain the predicted distribution of genotypes under no intervention $\hat{\mathbf{f}}_m$. $\hat{f}_m(z)$ is the predicted frequency, for method $m$, of genotype $z$ under no intervention. From $\hat{\mathbf{f}}_m$ obtain the predicted mean number of driver mutations, $\hat{D}_m = \sum_{z} n(z) \hat{f}_m(z)$. 3.2. From $M_{m,-g}$ obtain $\hat{\mathbf{f}}_{m,-g}$ and from this, the predicted mean number of driver m...
[26]

Obtain $f(WT)$, i.e., $f(z)$ when $z$ is the WT genotype
[27]

For every $g$, obtain $f_{-g}(WT)$, the frequency of the WT genotype when intervening on gene $g$
[28]

Obtain $\hat{f}_m(WT)$ (predicted frequency of WT for model $m$)

For each EvAM, $M_m$: 3.1. Obtain $\hat{f}_m(WT)$ (predicted frequency of WT for model $m$). 3.2. For every $g$, obtain $\hat{f}_{m,-g}(WT)$ (predicted frequency of WT for model $m$ under intervention $g$). 3.3. For each EvAM, compute the rank correlation between the vectors $(f(WT), f_{-A}(WT), f_{-B}(WT), \ldots)$ and $(\hat{f}(WT), \hat{f}_{-A}(WT), \hat{f}_{-B}(WT), \ldots)$. ...
[29]

Identify the genotypes that become non-viable after the intervention: all those that have $g$
[30]

that appear in formerly viable genotypes only with $g$

Find all genes, $h_1, h_2, \ldots$ that appear in formerly viable genotypes only with $g$
[31]

MHN: graph interventions

Remove, from the DAG, any edges that involve $g, h_1, h_2, \ldots$ (i.e., any edges that involve $g, h_1, h_2, \ldots$ as origin or destination). Code: The above algorithm is in function kill_gene_DAG (called from kill_gene) in file kill-gene-and-output-from-cpm.R. Code: Function intervene_cpm_every_gene, in file intervention.R, is the main intervention function, which calls kill_...

2020
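
The three graph steps in entries [29] to [31] (find the genotypes that become non-viable, find the genes that only ever occur together with g, drop every edge touching those genes) can be sketched as below. This is an illustrative Python sketch, not a port of the authors' R function `kill_gene_DAG`; the function name `kill_gene_in_dag`, the edge-list representation, and the example DAG are assumptions.

```python
# Sketch of a killing intervention on a DAG-based model.
# Assumptions: edges are (parent, child) pairs; genotypes are frozensets of
# gene names; the list of viable genotypes is supplied by the caller.

def kill_gene_in_dag(edges, viable_genotypes, g):
    """Return (kept_edges, nonviable_genotypes, dependent_genes) after killing g."""
    # Step 1: genotypes that carry g become non-viable.
    nonviable = {z for z in viable_genotypes if g in z}
    still_viable = [z for z in viable_genotypes if g not in z]

    # Step 2: genes that appeared in viable genotypes only together with g.
    genes_seen = set().union(*viable_genotypes) - {g}
    genes_surviving = set().union(*still_viable)
    dependents = genes_seen - genes_surviving

    # Step 3: drop every edge with g or a dependent gene at either end.
    removed = {g} | dependents
    kept = [(p, c) for p, c in edges if p not in removed and c not in removed]
    return kept, nonviable, dependents


# Hypothetical DAG: Root -> A -> B and Root -> C (B requires A).
edges = [("Root", "A"), ("A", "B"), ("Root", "C")]
viable = [
    frozenset(),
    frozenset({"A"}),
    frozenset({"C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "C"}),
]

kept, nonviable, dependents = kill_gene_in_dag(edges, viable, "A")
print("kept edges:", kept)
print("genes lost along with A:", sorted(dependents))
```

In this example B can only arise after A, so killing A also removes B from the graph and leaves only the edge into C.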
