Sample size and power calculations for causal inference of observational studies

Bo Liu; Chengxin Yang; Fan Li

arxiv: 2501.11181 · v5 · pith:5R4EF3EInew · submitted 2025-01-19 · 📊 stat.ME


Pith reviewed 2026-05-23 04:54 UTC · model grok-4.3

classification 📊 stat.ME

keywords sample size calculation · power analysis · observational studies · causal inference · inverse probability weighting · Bhattacharyya coefficient · propensity score · confounding


The pith

To calculate the minimal sample size for an observational causal study, it suffices to know two parameters quantifying the confounder-treatment and confounder-outcome associations in addition to standard randomized trial inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops analytical formulas for sample size and power calculations in observational studies that estimate the average treatment effect by inverse probability weighting (IPW). It decomposes the variance of the IPW estimator into three components: the propensity score distribution, the potential outcome distribution, and their correlation. The key finding is that only two additional parameters are needed beyond those for randomized trials: the Bhattacharyya coefficient for covariate overlap and a sensitivity parameter bounded by the R-squared of the outcome regression. This makes power analysis practical for observational data without requiring full knowledge of the multivariate covariate distribution. Sympathetic readers would care because adequate power is essential to study design, and in observational studies the full set of inputs to a power calculation is rarely known in advance.

Core claim

By analyzing the variance of an inverse probability weighting estimator of the average treatment effect, we decompose the power calculation into three components: propensity score distribution, potential outcome distribution, and their correlation. We show that to determine the minimal sample size of an observational study, in addition to the standard inputs in the power calculation of randomized trials, it is sufficient to have two parameters, which quantify the strength of the confounder-treatment and the confounder-outcome association, respectively. For the former, we propose using the Bhattacharyya coefficient, which measures the covariate overlap and, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution.

What carries the argument

Variance decomposition of the inverse probability weighting estimator for average treatment effect, with propensity score distribution identified from Bhattacharyya coefficient plus treatment proportion and outcome correlation bounded by R-squared of outcome on covariates.
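
For orientation, the textbook large-sample variance of the Horvitz-Thompson form of the IPW estimator shows where the three components enter. This is a sketch under assumptions that the page does not confirm: a known propensity score e(X), unconfoundedness, i.i.d. units, and the unnormalized (Horvitz-Thompson) weighting. The paper's own estimator and decomposition may differ.

```latex
\begin{align*}
\hat\tau &= \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{Z_i Y_i}{e(X_i)} - \frac{(1-Z_i)\,Y_i}{1-e(X_i)}\right\},\\
n\,\mathrm{Var}(\hat\tau) &= E\!\left[\frac{Y(1)^2}{e(X)}\right] + E\!\left[\frac{Y(0)^2}{1-e(X)}\right] - \tau^2 .
\end{align*}
```

Each expectation mixes a potential outcome with an inverse propensity weight, so its value depends on the distribution of e(X), the distribution of the potential outcomes, and how the two are related.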

If this is right

Minimal sample size follows from standard randomized trial inputs plus the two parameters.
Propensity score distribution is uniquely identifiable from the Bhattacharyya coefficient and treatment proportion.
The sensitivity parameter for the outcome association is bounded by the R-squared statistic without needing full covariate distributional assumptions.
The procedure applies under a parametric propensity score model and semiparametric restricted mean outcome model.
An R package and online calculator implement the formulas.
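
The "standard randomized trial inputs" in the first item are the effect size, outcome standard deviation, treatment proportion, significance level, and target power. A minimal sketch of the textbook two-sided z-test calculation that uses them follows. The function name `rct_total_sample_size` is hypothetical, a common outcome variance in both arms is assumed, and this is the randomized-trial baseline only, not the paper's observational formula.

```python
from math import ceil
from statistics import NormalDist


def rct_total_sample_size(effect, sd, treat_prop=0.5, alpha=0.05, power=0.8):
    """Total n for a two-sided z-test of a difference in means in a randomized trial.

    Assumes the same outcome standard deviation `sd` in both arms.
    """
    if not 0 < treat_prop < 1:
        raise ValueError("treat_prop must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    # Var(difference in means) = sd^2 / (n * p * (1 - p)); solve for n.
    n = (z_alpha + z_beta) ** 2 * sd ** 2 / (treat_prop * (1 - treat_prop) * effect ** 2)
    return ceil(n)


print(rct_total_sample_size(effect=0.5, sd=1.0))  # 126
```

With a standardized effect of 0.5 and equal allocation this gives the familiar 63 per arm. The observational calculation described above would start from the same inputs and add the two extra parameters.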

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Pilot data could be used to estimate the Bhattacharyya coefficient for study planning.
The two-parameter approach may combine with existing sensitivity analysis techniques in causal inference.
Similar variance decompositions could be derived for estimators other than inverse probability weighting.
Empirical checks comparing predicted versus observed power in completed studies would test practical accuracy.
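
The first item can be sketched as follows, assuming pilot propensity scores are already available from some fitted model. The function name `bhattacharyya_from_ps` is hypothetical. The plug-in uses the standard identity BC = E[sqrt(e(X)(1-e(X)))] / sqrt(π(1-π)) for the overlap between treated and control covariate distributions, with π estimated by the mean propensity score; the paper's own estimator may differ.

```python
import numpy as np


def bhattacharyya_from_ps(ps):
    """Plug-in Bhattacharyya coefficient between treated and control
    covariate distributions, computed from propensity scores alone."""
    ps = np.asarray(ps, dtype=float)
    if np.any((ps <= 0) | (ps >= 1)):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    pi = ps.mean()  # estimated treatment proportion
    return float(np.mean(np.sqrt(ps * (1 - ps))) / np.sqrt(pi * (1 - pi)))


# Constant propensity score (as in a randomized trial): complete overlap.
print(bhattacharyya_from_ps([0.3, 0.3, 0.3]))
# Propensity scores pushed toward 0 and 1: weaker overlap, smaller coefficient.
print(bhattacharyya_from_ps([0.1, 0.9]))
```

Values near 1 indicate strong overlap; values near 0 indicate that treated and control units occupy nearly disjoint regions of covariate space.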

Load-bearing premise

The parametric propensity score model is correctly specified and the outcome association can be bounded using only the R-squared statistic from regressing outcome on covariates.

What would settle it

In a dataset with known confounder strengths, compute the actual variance of the IPW estimator directly and compare it to the variance predicted by the formula that uses only the two proposed parameters; a systematic mismatch would falsify the claim that these two suffice.
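
A check of this kind can be sketched by simulation. The data-generating process below (one standard normal confounder, a logistic propensity score, a linear outcome) and all function names are illustrative assumptions. The "predicted" variance here is the textbook large-sample variance of the Horvitz-Thompson IPW estimator with a known propensity score, used as a stand-in; to test the paper's claim one would replace it with the variance implied by the two-parameter formula.

```python
import numpy as np


def ipw_terms(rng, n, tau=1.0, a=0.0, b=0.5, gamma=1.0):
    """Draw n units and return the per-unit IPW terms whose mean estimates the ATE."""
    x = rng.normal(size=n)                      # single confounder
    e = 1.0 / (1.0 + np.exp(-(a + b * x)))      # true propensity score
    z = rng.binomial(1, e)                      # treatment assignment
    y = tau * z + gamma * x + rng.normal(size=n)
    return z * y / e - (1 - z) * y / (1 - e)


def compare_variances(n=500, reps=2000, seed=0):
    """Return (Monte Carlo variance of the estimator, predicted variance)."""
    rng = np.random.default_rng(seed)
    estimates = np.array([ipw_terms(rng, n).mean() for _ in range(reps)])
    # Stand-in prediction: Var(per-unit term) / n, from one large reference sample.
    predicted = ipw_terms(rng, 200_000).var(ddof=1) / n
    return estimates.var(ddof=1), predicted


empirical, predicted = compare_variances()
print(f"empirical={empirical:.4f} predicted={predicted:.4f} ratio={empirical / predicted:.2f}")
```

A ratio that stays close to 1 across confounder strengths (different `b` and `gamma`) supports the predicted variance; a ratio that drifts systematically is the kind of mismatch described above.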

Original abstract

This paper investigates the theoretical foundation and develops analytical formulas for sample size and power calculations for causal inference with observational data. By analyzing the variance of an inverse probability weighting estimator of the average treatment effect, we decompose the power calculation into three components: propensity score distribution, potential outcome distribution, and their correlation. We show that to determine the minimal sample size of an observational study, in addition to the standard inputs in the power calculation of randomized trials, it is sufficient to have two parameters, which quantify the strength of the confounder-treatment and the confounder-outcome association, respectively. For the former, we propose using the Bhattacharyya coefficient, which measures the covariate overlap and, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. For the latter, we propose a sensitivity parameter bounded by the R-squared statistic of the regression of the outcome on covariates. Our procedure relies on a parametric propensity score model and a semiparametric restricted mean outcome model, but does not require distributional assumptions on the multivariate covariates. We develop an associated R package PSpower and an online calculator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Two extra parameters (the Bhattacharyya coefficient for overlap and an R²-bounded sensitivity parameter) plus a parametric PS model let you power IPW observational studies, but the identifiability claim depends on that model choice.


The punchline is that this paper gives explicit formulas and software to size observational studies for IPW-based ATE estimation by adding only two user-specified parameters beyond the usual RCT inputs: a Bhattacharyya coefficient to capture confounder-treatment strength and an R-squared-bounded sensitivity parameter for the confounder-outcome link. The variance is decomposed into propensity, outcome, and correlation pieces, and the claim is that these two numbers plus treatment proportion suffice once a parametric PS model is fixed. They also release an R package and online calculator, which is the kind of deliverable that actually gets used. That is the concrete advance over standard power literature.

The approach is upfront about relying on a parametric propensity score model and a semiparametric restricted-mean outcome model, and it avoids needing the full multivariate covariate distribution. The BC-plus-π step is presented as yielding a unique PS distribution under that parametric restriction, which is a reasonable way to make the problem tractable. The R² bound for the outcome component is a sensible, conservative choice that stays within observable quantities.

The main limitation is exactly the one flagged in the stress test: the parametric family on the PS distribution is doing the heavy lifting for identifiability, since BC alone is only one functional and does not pin down the full weight distribution without it. If the chosen family is misspecified, the computed variance and therefore the minimal n will be off. The abstract does not report numerical checks or simulation studies that would show how sensitive the results are to that choice, so the practical robustness remains to be verified. The derivations themselves are not visible here, which keeps the soundness assessment provisional.

This is aimed at biostatisticians and epidemiologists who plan observational studies and want a structured alternative to purely ad-hoc sensitivity analyses. It is worth sending to peer review because it fills a genuine applied gap with usable formulas and code; referees can check the derivations and ask for validation simulations without the work being fundamentally broken.

Referee Report

2 major / 2 minor

Summary. The manuscript develops analytical formulas for sample size and power calculations for estimating the ATE via IPW in observational studies. It decomposes the IPW variance into propensity-score distribution, potential-outcome distribution, and their correlation components. The central claim is that, beyond the usual RCT inputs, only two additional parameters suffice: the Bhattacharyya coefficient (measuring confounder-treatment association and, with treatment proportion, yielding a uniquely identifiable PS distribution under a parametric PS model) and a sensitivity parameter for confounder-outcome association bounded by the R² from regressing the outcome on covariates. The procedure uses a parametric PS model and semiparametric restricted-mean outcome model without distributional assumptions on the multivariate covariates; an R package and online calculator are provided.

Significance. If the identifiability and bounding arguments hold, the work supplies a practical, low-input framework for power analysis in observational causal inference, which is a frequent practical need. The explicit variance decomposition and software release are strengths that would aid reproducibility and adoption. The avoidance of full covariate-distribution assumptions is a positive feature relative to simulation-based alternatives.

major comments (2)

[Abstract and PS-distribution derivation] Abstract (final paragraph) and the section deriving the PS distribution: the claim that the Bhattacharyya coefficient plus treatment proportion 'leads to a uniquely identifiable' PS distribution under the parametric PS model is load-bearing for the two-parameter sufficiency result. Because the BC equals a single functional E[sqrt(e(X)(1-e(X)))] (normalized by π), uniqueness of the full law of the weights 1/e(X) and 1/(1-e(X)) requires an explicit statement of the low-dimensional parametric family imposed on the distribution of e(X) itself; the manuscript's statement that no assumptions are made on the multivariate law of X leaves open whether this family is an additional modeling choice or is derived.
[Variance decomposition] Variance-decomposition section (around the IPW variance formula): the correlation term between the PS weights and the potential outcomes must be shown to be either bounded or eliminated by the two parameters without introducing further user inputs; otherwise the reduction to exactly two extra parameters does not follow from the decomposition alone.
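
The functional named in the first major comment can be written out explicitly. This is a sketch under the standard definition of the Bhattacharyya coefficient between the treated and control covariate distributions, which is assumed here to be the one the paper uses; the symbols f, f_1, f_0 (marginal, treated, and control covariate densities) and π = P(Z = 1) are notation introduced for this illustration.

```latex
\begin{align*}
f_1(x) &= \frac{e(x)\,f(x)}{\pi}, \qquad
f_0(x) = \frac{\{1-e(x)\}\,f(x)}{1-\pi},\\
\mathrm{BC} &= \int \sqrt{f_1(x)\,f_0(x)}\;dx
 = \frac{E\!\left[\sqrt{e(X)\{1-e(X)\}}\right]}{\sqrt{\pi(1-\pi)}} .
\end{align*}
```

Under this definition the normalizing factor is sqrt(π(1-π)). By the Cauchy-Schwarz inequality BC is at most 1, with equality when e(X) is constant, so a single value of BC constrains the spread of e(X) but does not by itself fix its full distribution, which is the point the comment raises.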

minor comments (2)

[Abstract] The abstract states reliance on 'a parametric propensity score model' but does not name the family (e.g., logistic, beta, etc.); adding this detail would improve immediate readability.
[Software and examples] Figure captions or the software section could include a small numerical example showing how the two parameters translate into a concrete minimal n, to illustrate the formulas.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive major comments. Both points identify areas where additional explicit statements and derivations would strengthen the manuscript. We agree that clarifications are warranted and will revise accordingly.


Referee: [Abstract and PS-distribution derivation] Abstract (final paragraph) and the section deriving the PS distribution: the claim that the Bhattacharyya coefficient plus treatment proportion 'leads to a uniquely identifiable' PS distribution under the parametric PS model is load-bearing for the two-parameter sufficiency result. Because the BC equals a single functional E[sqrt(e(X)(1-e(X)))] (normalized by π), uniqueness of the full law of the weights 1/e(X) and 1/(1-e(X)) requires an explicit statement of the low-dimensional parametric family imposed on the distribution of e(X) itself; the manuscript's statement that no assumptions are made on the multivariate law of X leaves open whether this family is an additional modeling choice or is derived.

Authors: We agree that the uniqueness result requires an explicit statement of the parametric family on the distribution of e(X). The manuscript already states reliance on a parametric propensity score model; under this model the BC together with the treatment proportion π uniquely determines the parameters of the induced distribution of e(X) (and hence the law of the IPW weights). To remove any ambiguity, we will revise the relevant section and abstract to name the specific low-dimensional parametric family for e(X) (derived directly from the parametric PS model) and to clarify that no further distributional assumptions on the multivariate law of X are introduced beyond those already declared. revision: yes
Referee: [Variance decomposition] Variance-decomposition section (around the IPW variance formula): the correlation term between the PS weights and the potential outcomes must be shown to be either bounded or eliminated by the two parameters without introducing further user inputs; otherwise the reduction to exactly two extra parameters does not follow from the decomposition alone.

Authors: The referee correctly notes that the correlation term must be controlled by the two parameters. The R²-bounded sensitivity parameter for the confounder-outcome association is intended to bound the feasible range of this correlation (via its effect on the covariance between weights and potential outcomes) without additional user-specified inputs. We will add an explicit bounding argument or derivation in the variance-decomposition section demonstrating that the correlation is indeed governed by these two quantities alone, thereby confirming that exactly two extra parameters suffice. revision: yes

Circularity Check

0 steps flagged

No significant circularity; parameters are external inputs under explicit parametric assumption


The paper's derivation begins from the standard IPW variance formula for the ATE and decomposes it into propensity-score, outcome, and correlation components. It explicitly states reliance on a parametric propensity score model plus the Bhattacharyya coefficient (plus treatment proportion) to obtain an identifiable PS distribution, and on a semiparametric restricted-mean outcome model with an R-squared-bounded sensitivity parameter. Both quantities are introduced as user-supplied external inputs for the sample-size formula rather than quantities fitted from the same data used to estimate the treatment effect. No load-bearing step reduces the target sample size to a fitted constant or to a self-citation chain by construction; the central claim therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on two modeling assumptions that are not derived from data and on two user-chosen parameters that replace full knowledge of the covariate and outcome distributions.

free parameters (2)

Bhattacharyya coefficient
User-specified scalar that, together with treatment proportion, is asserted to determine the entire propensity-score distribution under the parametric model.
confounder-outcome sensitivity parameter
User-specified scalar bounded above by the R-squared of outcome-on-covariates regression; enters the outcome-distribution component of the variance.
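
The R-squared that caps the second parameter can be estimated from pilot data with an ordinary least-squares fit. A minimal sketch follows, with a hypothetical function name `r_squared`. It only computes the upper bound; how the paper maps a value below this bound into the variance formula is not reproduced here.

```python
import numpy as np


def r_squared(y, covariates):
    """R-squared from an OLS regression of y on the covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]        # treat a 1-D input as one covariate
    design = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)       # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2)) # total variation
    return 1.0 - ss_res / ss_tot


# Illustrative pilot data: outcome partly explained by two covariates.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)
print(round(r_squared(y, X), 3))
```

In planning, the printed value would serve as the ceiling when choosing the sensitivity parameter.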

axioms (2)

domain assumption: parametric propensity score model
Invoked to guarantee that the Bhattacharyya coefficient plus treatment proportion uniquely identifies the propensity-score distribution (abstract, paragraph on propensity score distribution).
domain assumption: semiparametric restricted mean outcome model
Invoked to express the outcome component of the variance without requiring a full joint distribution on the covariates (abstract, final paragraph).

pith-pipeline@v0.9.0 · 5723 in / 1651 out tokens · 36721 ms · 2026-05-23T04:54:36.769351+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Estimator-Aligned Prospective Sample Size Determination for Designs Using Inverse Probability of Treatment Weighting
stat.ME · 2026-04 · unverdicted · novelty 7.0

A GEE-based stacked M-estimation framework merges propensity score and marginal structural models to directly compute the large-sample variance of the IPTW estimator from pilot data for prospective sample size plannin...

Externally Controlled Trials: A Review of Design and Borrowing Through a Causal Lens
stat.ME · 2026-05 · unverdicted · novelty 1.0

A review organizes externally controlled trial methodology through causal estimands and identifiability assumptions for single-arm and hybrid designs with borrowing strategies.