pith. machine review for the scientific record.

arxiv: 2605.09300 · v1 · submitted 2026-05-10 · 📊 stat.ME

Recognition: no theorem link

Causal Stability Selection

Falco J. Bargagli-Stoffi, Omar Melikechi

Pith reviewed 2026-05-12 02:41 UTC · model grok-4.3

classification 📊 stat.ME
keywords causal inference · stability selection · effect modification · false discovery control · conditional average treatment effects · cross-fitting · observational data

The pith

Causal stability selection produces a set of treatment effect modifiers with an explicit finite-sample bound on expected false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedure to identify covariates that modify how a treatment affects an outcome. Current data-adaptive methods risk selecting spurious modifiers that fail to replicate because they lack finite-sample control over false discoveries. Causal stability selection integrates cross-fitted estimation of conditional average treatment effects with path stability selection. The resulting selection set carries a non-asymptotic bound on the expected number of false positives, and the selection probabilities converge to their ideal oracle values at the same rate as the underlying treatment effect estimator. This connection matters because it lets researchers discover effect modifiers in randomized trials or observational studies while maintaining explicit guarantees that hold for any finite sample size.

Core claim

Causal stability selection combines cross-fitted estimation of conditional average treatment effects with integrated path stability selection to produce a set of covariates that modify treatment effects, accompanied by an explicit bound on the expected number of false positives that holds in finite samples. Under standard causal identifying assumptions and regularity conditions on the base selector, the estimated selection probabilities converge to their oracle counterparts at the rate of the underlying treatment effect estimator. This establishes a direct connection between treatment effect estimation and effect modifier discovery.

What carries the argument

The argument rests on the causal stability selection algorithm, which merges cross-fitted conditional average treatment effect estimation with integrated path stability selection to enforce the false-positive bound.
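
As a rough illustration of the cross-fitting half of the procedure, the sketch below builds doubly robust (AIPW) pseudo-outcomes whose conditional mean given the covariates equals the conditional average treatment effect. It is not the authors' implementation: the least-squares outcome models, the two-fold split, the known propensity `e`, the toy data, and every function name are assumptions made for this example.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    """Predictions from coefficients returned by ols_fit."""
    return beta[0] + X @ beta[1:]

def cross_fitted_pseudo_outcomes(X, w, y, e, n_folds=2, seed=0):
    """AIPW pseudo-outcomes; outcome models are fit on the other fold(s).

    Assumes a known, constant propensity e (as in a randomized trial).
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    z = np.empty(len(y))
    for k in range(n_folds):
        held_out = folds == k
        train = ~held_out
        # Arm-specific outcome regressions, fit without the held-out fold.
        b1 = ols_fit(X[train & (w == 1)], y[train & (w == 1)])
        b0 = ols_fit(X[train & (w == 0)], y[train & (w == 0)])
        m1 = ols_predict(b1, X[held_out])
        m0 = ols_predict(b0, X[held_out])
        wh, yh = w[held_out], y[held_out]
        # Model contrast plus inverse-probability-weighted residual corrections.
        z[held_out] = m1 - m0 + wh * (yh - m1) / e - (1 - wh) * (yh - m0) / (1 - e)
    return z

# Hypothetical toy data: the effect varies only with the first covariate.
rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
w = rng.binomial(1, 0.5, size=n)
tau = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + w * tau + rng.normal(size=n)

z = cross_fitted_pseudo_outcomes(X, w, y, e=0.5)
slope = ols_fit(X[:, [0]], z)[1]  # slope of pseudo-outcome on the true modifier
print(f"mean pseudo-outcome: {z.mean():.2f}, slope on X0: {slope:.2f}")
```

In this toy setup the pseudo-outcomes should average near the true mean effect of 1 and regress on the first covariate with a slope near 2. A base selector would then be applied to `(X, z)` on repeated subsamples to estimate selection probabilities.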

If this is right

  • The method accommodates arbitrary treatment effect estimators and arbitrary base selectors while still producing an explicit bound on expected false positives.
  • Selection probabilities converge to oracle values at the same rate as the treatment effect estimator.
  • The procedure applies equally to randomized experiments and observational studies once the identifying assumptions are met.
  • Non-asymptotic control removes reliance on large-sample approximations common in existing adaptive selection methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Applied researchers could rank candidate modifiers by their estimated stability probabilities to allocate follow-up resources more efficiently.
  • The explicit bound may help address replication concerns in fields that use effect-modifier searches to guide personalized interventions.
  • Similar stability-based bounds could be derived for other causal estimands such as mediation or principal strata effects.

Load-bearing premise

Standard causal identifying assumptions of consistency, no unmeasured confounding and positivity, together with regularity conditions on the base selector.

What would settle it

Apply the procedure to simulated data in which the true set of effect modifiers is known in advance and verify whether the number of false positives, averaged across repeated draws, stays below the stated bound. Because the bound is on the expected count, individual draws may exceed it.
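
A minimal sketch of such a check follows, under loud assumptions. It does not implement the paper's integrated path procedure or its bound. It swaps in classical stability selection on half-samples and compares against the classical Meinshausen and Bühlmann (2010) bound q² / ((2·threshold − 1)·p), which itself assumes exchangeable null covariates and a base selector no worse than random guessing. The randomized design with known propensity 0.5, the transformed-outcome pseudo-response, the correlation-ranking base selector, and all names and parameter values are hypothetical.

```python
import numpy as np

def top_q_selector(X, z, q):
    """Hypothetical base selector: the q covariates most correlated with z."""
    Xc = X - X.mean(axis=0)
    zc = z - z.mean()
    score = np.abs(Xc.T @ zc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(zc))
    return np.argsort(score)[-q:]

def selection_frequencies(X, z, q, n_sub, rng):
    """Fraction of random half-samples in which each covariate is selected."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[top_q_selector(X[idx], z[idx], q)] += 1
    return counts / n_sub

rng = np.random.default_rng(0)
n, p, q, threshold = 400, 50, 5, 0.9
true_modifiers = {0, 1, 2}
# Classical stability selection bound on expected false positives (assumption:
# used here as a stand-in for the paper's own bound).
bound = q**2 / ((2 * threshold - 1) * p)

false_positives = []
for trial in range(20):
    X = rng.normal(size=(n, p))
    w = rng.binomial(1, 0.5, size=n)
    tau = 2.0 * X[:, :3].sum(axis=1)      # only covariates 0, 1, 2 modify the effect
    y = w * tau + rng.normal(size=n)
    z = y * (w - 0.5) / 0.25              # transformed outcome; E[z | X] = tau(X)
    freq = selection_frequencies(X, z, q, n_sub=50, rng=rng)
    selected = set(np.flatnonzero(freq >= threshold).tolist())
    false_positives.append(len(selected - true_modifiers))

mean_fp = float(np.mean(false_positives))
print(f"mean false positives: {mean_fp:.3f}, bound: {bound:.3f}")
```

The comparison of interest is between `mean_fp` and `bound`. A fuller check would use the authors' procedure, vary the confounding and nonlinearity of the design, and run many more trials.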

Figures

Figures reproduced from arXiv: 2605.09300 by Falco J. Bargagli-Stoffi, Omar Melikechi.

Figure 1: Performance of variable selection methods for effect modifier discovery. Top row: True positive rate (TPR), defined as the proportion of identified effect modifiers among all true effect modifiers, averaged over 200 simulation trials. Bottom row: False discovery rate (FDR), defined as the proportion of false discoveries among all selected covariates, averaged over 200 trials. Black diagonal dashed lines in…
Figure 2: Linear results: Confounding variables. Top and bottom rows show mean TPR and mean FDR (averaged over 200 trials); the diagonal dashed line indicates perfect nominal FDR control. Left: |C| = 0 (RCT). Middle: |C| = 5. Right: |C| = 10.
Figure 3: Nonlinear results: Confounding variables. As in…
Original abstract

Identifying covariates that modify treatment effects is a central problem in causal inference. Yet existing data-adaptive procedures do not provide finite-sample control over the expected number of false discoveries, risking spurious findings that fail to replicate. We introduce causal stability selection, an algorithm that combines cross-fitted estimation of conditional average treatment effects with integrated path stability selection. The method accommodates arbitrary treatment effect estimators and arbitrary base selectors, and produces a selection set with an explicit, non-asymptotic bound on the expected number of false positives. Under standard causal identifying assumptions and regularity conditions on the base selector, we prove that the estimated selection probabilities converge to their oracle counterparts at the rate of the underlying treatment effect estimator. This establishes a direct connection between treatment effect estimation and effect modifier discovery. We illustrate the method on a randomized trial in oncology and on observational data on maternal smoking and infant birthweight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces causal stability selection, which combines cross-fitted estimation of conditional average treatment effects (CATE) with integrated path stability selection. The central claims are that the procedure yields a selection set with an explicit non-asymptotic bound on the expected number of false positives and that, under standard causal identifying assumptions together with regularity conditions on the base selector, the estimated selection probabilities converge to their oracle counterparts at the rate of the underlying treatment effect estimator. The method is illustrated on a randomized oncology trial and observational data on maternal smoking and infant birthweight.

Significance. If the non-asymptotic bound and convergence result hold, the work supplies a rare finite-sample guarantee for effect-modifier discovery in causal inference, directly linking the accuracy of CATE estimation to the reliability of selection. The flexibility to accommodate arbitrary treatment-effect estimators and base selectors, together with the explicit connection to oracle quantities, strengthens the practical utility of the approach for reproducible causal findings.

minor comments (3)
  1. The abstract and introduction refer to 'integrated path stability selection' without a one-sentence reminder of how the path is constructed from the base selector; a brief parenthetical would improve accessibility for readers outside the stability-selection literature.
  2. Notation for the selection probability (estimated versus oracle) is introduced in the main text but could be typeset more distinctly (e.g., via consistent use of hats or superscripts) to avoid momentary confusion when the convergence statement is first stated.
  3. The real-data illustrations would benefit from a short table or paragraph summarizing the selected modifiers, their estimated selection probabilities, and the implied false-positive bound for the chosen threshold; this would make the practical output of the method more immediately visible.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript and for recommending minor revision. The referee's description accurately reflects the core contributions: the non-asymptotic false-positive bound for causal stability selection and the convergence of estimated selection probabilities to their oracle counterparts. No specific major comments were provided in the report, so we have no points requiring detailed rebuttal or revision at this stage. We will incorporate any minor editorial suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper adapts the standard (non-self) stability selection bound on expected false positives to cross-fitted CATE estimates and proves convergence of selection probabilities to independently defined oracle quantities at the rate of the base treatment-effect estimator. Both the bound and the convergence result are stated under explicit regularity conditions on the base selector plus the usual causal identifying assumptions; neither step reduces by construction to a fitted parameter, a self-citation chain, or a redefinition of the target quantity. The derivation is therefore self-contained against external benchmarks.
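
For orientation, the classical stability selection bound of Meinshausen and Bühlmann (2010) can be written as below. Here $V$ is the number of falsely selected covariates, $p$ the number of candidate covariates, $q_\Lambda$ the expected number of covariates the base selector picks over its regularization path on a half-sample, and $\pi_{\mathrm{thr}} \in (1/2, 1)$ the selection-probability threshold. The bound requires an exchangeability condition on the null covariates and a base selector no worse than random guessing. The paper's own bound comes from integrated path stability selection and may take a different form; the numbers in the worked example are hypothetical.

```latex
\mathbb{E}[V] \;\le\; \frac{1}{2\pi_{\mathrm{thr}} - 1}\,\frac{q_\Lambda^{2}}{p},
\qquad \text{e.g. } q_\Lambda = 10,\ p = 200,\ \pi_{\mathrm{thr}} = 0.75
\ \Rightarrow\ \mathbb{E}[V] \le \frac{1}{0.5}\cdot\frac{100}{200} = 1 .
```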

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard causal identifying assumptions and regularity conditions on the base selector; these are domain assumptions imported from the literature rather than derived here.

axioms (2)
  • domain assumption: Standard causal identifying assumptions (consistency, no unmeasured confounding, positivity).
    Invoked to justify validity of CATE estimation and the selection procedure.
  • domain assumption: Regularity conditions on the base selector.
    Required for the convergence of selection probabilities to oracle values.

pith-pipeline@v0.9.0 · 5437 in / 1311 out tokens · 55910 ms · 2026-05-12T02:41:01.946344+00:00 · methodology

