Recognition: 1 theorem link · Lean Theorem
Probability of Root Cause: A Counterfactual Definition and Its Identification
Pith reviewed 2026-05-13 05:22 UTC · model grok-4.3
The pith
A formal counterfactual definition makes the probability that a variable set is a root cause identifiable from observed data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a formal definition of a root cause as the variable set satisfying a counterfactual root condition, in which the outcome would change if that set were altered while holding other paths fixed. From this we define the probability of root cause (PRC) as the conditional probability that a given candidate set is the root cause of the observed outcome. Under the standard assumptions of the potential outcomes framework we prove that the PRC is identifiable and derive the explicit identification formula that expresses it in terms of observed quantities.
What carries the argument
The probability of root cause (PRC), the conditional probability that a candidate variable set satisfies the counterfactual root condition for an observed outcome.
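As a reading aid, the PRC can be written schematically as a conditional probability. The indicator R(S), the index set S, and the arrow notation are illustrative assumptions made here. The paper's own Definition 3.2 fixes the precise counterfactual root condition, which is not reproduced on this page.

```latex
% Assumed notation: R(S) = 1 when the candidate set X_S satisfies the
% counterfactual root condition for the observed outcome Y; E = e is the evidence.
\mathrm{PRC}(X_S \Rightarrow Y \mid E = e) \;=\; \Pr\bigl( R(S) = 1 \,\bigm|\, E = e \bigr)
```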
If this is right
- Root causes can be separated from proximate causes when attributing an outcome.
- The PRC can be calculated directly from observed data without reconstructing the full causal graph.
- The same formula applies across medical diagnosis and engineering fault-finding tasks.
- Different candidate sets can be ranked by their PRC values on the same evidence.
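The ranking consequence in the last item can be sketched in a few lines of Python. The function name `rank_by_prc`, the variable names, and the PRC values are all hypothetical. Computing the values themselves would require the paper's identification formula, which this page does not give.

```python
from typing import Dict, FrozenSet, List, Tuple

Candidate = FrozenSet[str]


def rank_by_prc(prc: Dict[Candidate, float]) -> List[Tuple[Candidate, float]]:
    """Sort candidate sets from most to least probable root cause."""
    for cand, p in prc.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"PRC for {sorted(cand)} is not a probability: {p}")
    # Ties are broken by the sorted variable names so the order is deterministic.
    return sorted(prc.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Hypothetical PRC values, all conditioned on the same evidence E = e.
example_prc = {
    frozenset({"X2"}): 0.21,
    frozenset({"X1"}): 0.62,
    frozenset({"X1", "X3"}): 0.09,
}

ranking = rank_by_prc(example_prc)
for cand, p in ranking:
    print(sorted(cand), p)
```

Candidate sets are stored as frozensets because the PRC is defined for variable sets rather than single variables, and frozensets can serve as dictionary keys.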
Where Pith is reading between the lines
- The definition could be used to audit existing root-cause tools by checking whether they recover high-PRC sets.
- Applications to sequential or time-varying outcomes would require extending the counterfactual root condition to paths across time.
- When multiple root-cause sets are possible, the PRC supplies a natural way to allocate responsibility among them.
Load-bearing premise
The standard potential-outcomes assumptions of consistency, positivity, and no unmeasured confounding hold for the variables and outcome in question.
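For reference, the three assumptions are commonly written as follows for a treatment X, outcome Y, and measured covariates C. This is the textbook form with assumed symbols. The paper's Assumptions 1-3 may be stated differently, for example with additional conditions on mediators.

```latex
\begin{align*}
\text{Consistency:} &\quad X = x \;\Longrightarrow\; Y = Y_{x} \\
\text{Positivity:} &\quad \Pr(X = x \mid C = c) > 0
    \quad \text{for all } x \text{ and all } c \text{ with } \Pr(C = c) > 0 \\
\text{No unmeasured confounding:} &\quad Y_{x} \perp\!\!\!\perp X \mid C
    \quad \text{for all } x
\end{align*}
```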
What would settle it
In a controlled setting where the true root cause is known by direct intervention, compute the PRC from observational data alone and verify whether it assigns substantially higher probability to the known root cause than to other candidates.
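One way to make "substantially higher" concrete is to compare the PRC of the known root cause with that of the best competing candidate. The sketch below assumes the PRC estimates are already available. The function names, the 0.2 threshold, and the fault labels are hypothetical and do not come from the paper.

```python
from typing import Dict


def prc_margin(prc: Dict[str, float], known_root: str) -> float:
    """PRC of the known root cause minus the highest PRC among its rivals."""
    if known_root not in prc:
        raise KeyError(f"no PRC estimate for {known_root!r}")
    rivals = [p for name, p in prc.items() if name != known_root]
    if not rivals:
        raise ValueError("need at least one competing candidate")
    return prc[known_root] - max(rivals)


def recovers_root(prc: Dict[str, float], known_root: str,
                  min_margin: float = 0.2) -> bool:
    """True when the known root cause leads by at least min_margin (assumed threshold)."""
    return prc_margin(prc, known_root) >= min_margin


# Hypothetical PRC estimates from observational data; in this made-up experiment,
# direct intervention established "pump_fault" as the true root cause.
estimated = {"pump_fault": 0.70, "sensor_drift": 0.20, "valve_leak": 0.10}
margin = prc_margin(estimated, "pump_fault")
print(recovers_root(estimated, "pump_fault"))
```

Repeating the check over many controlled scenarios and reporting the fraction in which `recovers_root` holds would give a simple recovery rate.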
Original abstract
Attributing an observed outcome to its root cause is a central task in domains ranging from medical diagnosis to engineering fault diagnosis. Existing approaches either equate the root cause with a root node of the causal graph, as in causal-discovery-based root cause analysis, or target causes more broadly and thereby favour proximate ones, as with the probability of causation and posterior causal effects. We argue that this issue stems from the absence of a formal definition of a root cause, which has led to methods designed for other purposes being applied to root cause attribution by default. We address this by giving a formal, individual-level definition of a root cause within the potential outcomes framework, based on the notion of an individual cause and a counterfactual root condition motivated by mediation analysis. Building on this definition, we propose the probability of root cause (PRC), which quantifies how probable it is that a candidate variable set is the root cause of a given outcome, conditional on observed evidence. Under standard assumptions, we establish the identifiability of the PRC and derive an explicit identification formula. Two numerical examples illustrate the approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a formal individual-level definition of a root cause within the potential outcomes framework, based on an individual cause and a counterfactual root condition motivated by mediation analysis. It defines the Probability of Root Cause (PRC) to quantify the probability that a candidate variable set is the root cause of an observed outcome given evidence. Under standard assumptions (consistency, positivity, no unmeasured confounding), the authors claim the PRC is identifiable and derive an explicit identification formula, illustrated with two numerical examples.
Significance. If the identification result holds, the work supplies a principled, individual-level quantity for root cause attribution that distinguishes root from proximate causes, addressing a gap between causal discovery methods and existing measures like the probability of causation. The derivation builds directly on the potential outcomes framework without introducing free parameters or self-referential quantities, and the numerical examples provide concrete verification consistent with the formula.
minor comments (3)
- The abstract invokes 'standard assumptions' without enumerating them; a brief parenthetical list (consistency, positivity, no unmeasured confounding) would improve immediate readability.
- In the numerical examples, ensure that the observed evidence and candidate variable sets are explicitly tabulated alongside the computed PRC values for direct comparison with the identification formula.
- Notation for the counterfactual root condition should be introduced with a dedicated display equation before its use in the identification derivation.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our manuscript, as well as for recommending acceptance. We are pleased that the work is viewed as supplying a principled individual-level quantity for root cause attribution that addresses a gap in the literature.
Circularity Check
No significant circularity detected in derivation
The paper defines an individual-level root cause using the potential outcomes framework and a counterfactual root condition drawn from mediation analysis, then derives an identification formula for the probability of root cause (PRC) under the standard external assumptions of consistency, positivity, and no unmeasured confounding. No step reduces by construction to a fitted parameter, self-referential quantity, or load-bearing self-citation; the derivation proceeds directly from the new definition plus these pre-existing assumptions. Numerical examples illustrate the formula without introducing internal validation loops. This is the expected non-circular outcome for a paper that extends an established framework with an explicit, falsifiable identification result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: standard causal inference assumptions, including consistency, positivity, and no unmeasured confounding
invented entities (1)
- Probability of Root Cause (PRC): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear · Definition 3.2 (Root Cause) and Theorem 4.1 (identifiability formula under Assumptions 1-3)