Towards a holistic understanding of Selection Bias for Causal Effect Identification
Pith reviewed 2026-05-14 19:20 UTC · model grok-4.3
The pith
The paper gives necessary and sufficient conditions for identifying the average treatment effect (ATE) under selection bias, relying on weak assumptions on probability classes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize propensity score and selection probability. Compared to previous works, our results extend existing graphical identifiability criteria and offer a more comprehensive understanding of causal effect identification with strictly weaker conditions in the presence of selection bias.
What carries the argument
The necessary and sufficient identifiability conditions obtained by characterizing the propensity score and selection probability under weak assumptions on probability classes.
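For orientation, the three quantities named above can be written in standard potential-outcomes notation. This is a sketch under assumed notation: the symbols X (covariates), T (binary treatment), Y (outcome), and S (selection indicator) are not taken from the paper, and the arguments on which the selection probability depends are written in their most general form as an assumption.

```latex
% Assumed notation, not necessarily the paper's own symbols.
\begin{align*}
  \tau          &= \mathbb{E}\bigl[\,Y(1) - Y(0)\,\bigr]
                && \text{(population ATE)} \\
  e(x)          &= \Pr(T = 1 \mid X = x)
                && \text{(propensity score)} \\
  \pi(x, t, y)  &= \Pr(S = 1 \mid X = x,\; T = t,\; Y = y)
                && \text{(selection probability)}
\end{align*}
% Under selection bias the analyst observes draws from
% P(X, T, Y \mid S = 1) rather than from P(X, T, Y).
```

The identification question is then whether \tau can be computed from the selected-sample distribution alone, given whatever restrictions are placed on e and \pi.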
If this is right
- The population ATE can be recovered from data drawn only from a selected subpopulation whenever the derived conditions are satisfied.
- Selection bias need not preclude causal identification even when standard graphical criteria are violated.
- Propensity-score and selection-probability characterizations become available under assumptions weaker than those required by prior graphical methods.
- Causal-effect estimation algorithms can be built directly on the new characterizations for practical use with biased samples.
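The first bullet can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it assumes a randomized treatment, an outcome-dependent selection probability that is fully known to the analyst (a logistic function of the outcome, chosen here purely for illustration), and a textbook inverse-probability-of-selection weighting estimator. The function names `sigmoid` and `simulate` and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as an assumed selection probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=200_000, tau=2.0, seed=0):
    """Compare a naive and a selection-weighted ATE estimate.

    Assumptions (illustrative only):
      - T is randomized with probability 0.5, so there is no confounding.
      - Y = tau * T + standard normal noise, so the true ATE equals tau.
      - A unit is selected with probability sigmoid(Y), i.e. higher
        outcomes are over-represented ("healthy volunteer" style).
      - The selection probability is known exactly to the analyst.
    """
    rng = random.Random(seed)
    y_sum = {0: 0.0, 1: 0.0}    # unweighted outcome sums per arm
    count = {0: 0, 1: 0}        # selected units per arm
    w_sum = {0: 0.0, 1: 0.0}    # sum of inverse-selection weights per arm
    wy_sum = {0: 0.0, 1: 0.0}   # weighted outcome sums per arm

    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0
        y = tau * t + rng.gauss(0.0, 1.0)
        pi = sigmoid(y)
        if rng.random() < pi:   # only selected units are observed
            y_sum[t] += y
            count[t] += 1
            w = 1.0 / pi
            w_sum[t] += w
            wy_sum[t] += w * y

    # Naive difference in means within the selected sample.
    naive = y_sum[1] / count[1] - y_sum[0] / count[0]
    # Normalized (Hajek-style) inverse-selection-probability weighting.
    weighted = wy_sum[1] / w_sum[1] - wy_sum[0] / w_sum[0]
    return naive, weighted


if __name__ == "__main__":
    naive_ate, weighted_ate = simulate()
    print(f"true ATE = 2.0, naive = {naive_ate:.3f}, weighted = {weighted_ate:.3f}")
```

In this setup the naive estimate is pulled away from the true effect because selection favors high outcomes in both arms to different degrees, while the weighted estimate lands close to it. The correction works only because the selection probability is assumed known; the paper's contribution, as summarized above, concerns when recovery is possible under weaker knowledge.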
Where Pith is reading between the lines
- The same probability-class approach may extend to identifiability questions involving other forms of bias such as confounding or measurement error.
- Practical checks for the conditions could be implemented via flexible nonparametric estimators of the propensity and selection functions.
- Study-design recommendations could follow for minimizing the severity of selection bias so that the new conditions are more likely to hold.
- The framework may connect to existing results on transportability of causal effects across populations.
Load-bearing premise
Weak assumptions on the classes of possible probability distributions are enough to characterize the propensity score and selection probability.
What would settle it
A concrete causal structure and joint distribution in which the stated conditions hold yet the population ATE cannot be recovered from the selected sample, or in which the ATE is recoverable but the conditions fail.
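Stated formally, such a counterexample would take the following shape. This is a sketch using the standard definition of identifiability, with assumed notation: \mathcal{M} is the class of models satisfying the paper's assumptions, C is a placeholder for the paper's conditions (which this review does not spell out), and the observed data are taken to be the selected-sample law of covariates X, treatment T, and outcome Y given S = 1. The paper may define the observed data differently.

```latex
% Assumed notation; C stands in for the paper's (unstated here) conditions.
\begin{align*}
  &\textbf{Identifiability on } \mathcal{M}: \\
  &\quad \forall\, M_1, M_2 \in \mathcal{M}: \;
     P_{M_1}(X, T, Y \mid S = 1) = P_{M_2}(X, T, Y \mid S = 1)
     \;\Longrightarrow\; \tau(M_1) = \tau(M_2). \\[4pt]
  &\textbf{Sufficiency fails if: } C \text{ holds on } \mathcal{M}
     \text{ and } \exists\, M_1, M_2 \in \mathcal{M} \text{ with} \\
  &\quad P_{M_1}(X, T, Y \mid S = 1) = P_{M_2}(X, T, Y \mid S = 1)
     \text{ but } \tau(M_1) \neq \tau(M_2). \\[4pt]
  &\textbf{Necessity fails if: } C \text{ does not hold on } \mathcal{M}
     \text{ and yet the implication above holds for all } M_1, M_2.
\end{align*}
```

Either kind of witness would refute the "necessary and sufficient" claim; a pair of models with matching selected-sample laws and different ATEs is usually the easier of the two to exhibit.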
Original abstract
Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such subpopulations is an important problem in causal inference, as estimating average treatment effects (ATE) from selected populations can result in a severely biased estimate of the ATE for the whole population. In this paper, we investigate the identifiability of the ATE under selection bias. We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize propensity score and selection probability. Compared to previous works, our results extend existing graphical identifiability criteria and offer a more comprehensive understanding of causal effect identification with strictly weaker conditions in the presence of selection bias.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates identifiability of the average treatment effect (ATE) under selection bias, using examples such as healthy volunteer bias in biobanks. It claims to supply necessary and sufficient conditions for ATE identifiability by leveraging weak assumptions on probability classes that characterize the propensity score and selection probability. These conditions are asserted to extend existing graphical identifiability criteria while employing strictly weaker assumptions in the presence of selection bias.
Significance. If the claimed necessary and sufficient conditions are rigorously established, the work would advance causal inference by providing a more general characterization of ATE recovery from selected subpopulations. This could broaden the scope of identifiable causal effects beyond current graphical criteria, with direct relevance to observational data in epidemiology and social sciences where selection mechanisms are common.
major comments (1)
- [Abstract] Abstract: The central claim of necessary and sufficient conditions for ATE identifiability rests on 'weak assumptions on probability classes' that characterize the propensity score and selection probability, yet no explicit definitions of these probability classes, theorem statements, derivations, or counterexamples are supplied. This absence prevents verification that the conditions are indeed strictly weaker than prior graphical criteria or that they are load-bearing for identifiability.
Simulated Author's Rebuttal
We thank the referee for their review and for highlighting the need for clarity on our central claims. We address the major comment point by point below.
- Referee: [Abstract] Abstract: The central claim of necessary and sufficient conditions for ATE identifiability rests on 'weak assumptions on probability classes' that characterize the propensity score and selection probability, yet no explicit definitions of these probability classes, theorem statements, derivations, or counterexamples are supplied. This absence prevents verification that the conditions are indeed strictly weaker than prior graphical criteria or that they are load-bearing for identifiability.
  Authors: We agree that the abstract is a high-level summary and does not contain the full technical details, which is standard practice to maintain brevity. The explicit definitions of the probability classes (characterizing the propensity score and selection probability under our weak assumptions), the necessary and sufficient conditions, their theorem statements, derivations/proofs, and counterexamples demonstrating that the conditions are strictly weaker than prior graphical criteria are all provided in the main body of the manuscript. Specifically, these appear in Sections 3 (definitions and setup), 4 (main theorems on ATE identifiability), and 5 (comparisons to graphical criteria with counterexamples). This structure allows full verification of the claims.
  Revision: no
Circularity Check
No significant circularity detected in derivation chain
The paper states necessary and sufficient conditions for ATE identifiability under weak external assumptions on probability classes that characterize propensity and selection probabilities. These assumptions are introduced as independent inputs from probability theory rather than being fitted or defined in terms of the target identifiability result. No equations or steps in the provided material reduce the claimed conditions to self-referential fits, self-citations that bear the full load, or renamings of prior results. The extension of graphical criteria rests on these stated assumptions without circular reduction, making the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Weak assumptions on probability classes suffice to characterize the propensity score and selection probability.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear: "We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize propensity score and selection probability."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear: "Corollary 3.9 (Selection-backdoor as special case of Condition 1)"