The Illusion of Learning from Observational Data: An Empirical Bayes Perspective
Pith reviewed 2026-05-10 18:24 UTC · model grok-4.3
The pith
Calibration studies identify the distribution of biases in observational data, allowing empirical Bayes shrinkage to recover causal effects consistently.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Absent prior information about bias, observational results cannot meaningfully contribute to the estimation of a causal parameter. Calibration studies that target a causal effect known a priori to be zero identify the distribution of observational bias. This distribution then allows observational studies to inform the estimation of causal parameters via empirical Bayes shrinkage. With an increasing number of calibration and observational studies, both the bias distribution and the causal effect can be consistently recovered.
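To make the first sentence of the claim concrete, here is a worked sketch in a simple Gaussian setting. The model and all symbols (an experimental estimate with variance \sigma_e^2, an observational estimate with variance \sigma_o^2, and a bias b with mean \mu and variance \gamma^2) are illustrative assumptions, not necessarily the paper's exact formulation.

```latex
\begin{align*}
\hat\theta_{\mathrm{exp}} &\sim N(\theta, \sigma_e^2), \qquad
\hat\theta_{\mathrm{obs}} \mid b \sim N(\theta + b, \sigma_o^2), \qquad
b \sim N(\mu, \gamma^2), \\
\hat\theta_{\mathrm{comb}} &= (1 - w)\,\hat\theta_{\mathrm{exp}}
  + w\,\bigl(\hat\theta_{\mathrm{obs}} - \mu\bigr), \qquad
w = \frac{1/(\sigma_o^2 + \gamma^2)}{1/\sigma_e^2 + 1/(\sigma_o^2 + \gamma^2)}, \\
\lim_{\gamma^2 \to \infty} w &= 0 .
\end{align*}
```

Under these assumptions, the precision-weighted estimator gives the observational result a weight that vanishes as uncertainty about the bias grows, however small \sigma_o^2 is. Learning \mu and \gamma^2 from calibration studies is what keeps w away from zero.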
What carries the argument
Empirical Bayes shrinkage that uses a bias distribution estimated from calibration studies where the true causal effect is known to be zero.
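The two steps can be sketched in a few lines of Python. This is a minimal illustration under an assumed normal-normal model (bias with mean mu and variance gamma2, known positive standard errors), not the authors' implementation; the function names `fit_bias_distribution` and `shrink` are hypothetical, and the grid search stands in for whatever optimizer the paper uses.

```python
import math
import random


def fit_bias_distribution(calib_estimates, calib_ses, grid_size=2000):
    """Fit (mu, gamma2) by maximizing a Gaussian marginal likelihood.

    Assumed model (for illustration only): each calibration estimate is
    c_j ~ N(mu, gamma2 + s_j^2), because its true effect is zero.
    All standard errors s_j are assumed to be positive.
    """
    spread = max(calib_estimates) - min(calib_estimates)
    upper = max(spread ** 2, 1e-12)  # grid upper bound for gamma2
    best = None
    for i in range(grid_size + 1):
        gamma2 = upper * i / grid_size
        variances = [gamma2 + s * s for s in calib_ses]
        weights = [1.0 / v for v in variances]
        # For a fixed gamma2, the best mu is the precision-weighted mean.
        mu = sum(w * c for w, c in zip(weights, calib_estimates)) / sum(weights)
        loglik = -0.5 * sum(
            math.log(v) + (c - mu) ** 2 / v
            for v, c in zip(variances, calib_estimates)
        )
        if best is None or loglik > best[0]:
            best = (loglik, mu, gamma2)
    return best[1], best[2]


def shrink(theta_exp, se_exp, theta_obs, se_obs, mu_hat, gamma2_hat):
    """Combine an unbiased estimate with a bias-corrected observational one."""
    var_exp = se_exp ** 2
    var_obs = se_obs ** 2 + gamma2_hat  # sampling noise plus bias uncertainty
    w_obs = (1.0 / var_obs) / (1.0 / var_exp + 1.0 / var_obs)
    estimate = (1.0 - w_obs) * theta_exp + w_obs * (theta_obs - mu_hat)
    return estimate, w_obs


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated calibration studies: true effect zero, bias ~ N(0.5, 0.3^2).
    ses = [0.1] * 200
    calib = [rng.gauss(0.5, 0.3) + rng.gauss(0.0, s) for s in ses]
    mu_hat, gamma2_hat = fit_bias_distribution(calib, ses)
    print("mu_hat, gamma2_hat:", mu_hat, gamma2_hat)
    print("estimate, weight:", shrink(1.0, 0.3, 1.6, 0.1, mu_hat, gamma2_hat))
```

The weight on the observational estimate falls as `gamma2_hat` grows, so a widely dispersed bias distribution pushes the combined estimate back toward the unbiased one.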
If this is right
- Observational studies can contribute to causal estimation once paired with calibration data that reveals the bias distribution.
- Both the bias distribution and the causal effect become consistently recoverable as the number of calibration and observational studies increases.
- Empirical Bayes shrinkage produces improved point estimates and uncertainty quantification for causal parameters from observational data.
- The approach applies in settings where semi-synthetic data from real experiments can be used to validate performance.
Where Pith is reading between the lines
- Fields with frequent null-effect experiments, such as certain policy evaluations, could systematically collect calibration data to make large observational archives more usable for causal questions.
- The method suggests a practical design choice: embed null-effect probes within observational programs to anchor the bias distribution without additional randomized trials.
- Extensions could examine how the number and design of calibration studies affect the rate at which causal estimates converge.
Load-bearing premise
The distribution of biases learned from calibration studies where the causal effect is known to be zero is the same as the bias distribution affecting the target observational studies of interest.
What would settle it
In a simulation where the true bias distribution in the target observational studies differs from the one recovered from calibration studies, the estimated causal effects should fail to converge to their true values as the number of studies grows.
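A toy version of such a check is sketched below. It assumes Gaussian biases and noise, equal numbers of calibration and target studies, and a plain bias-corrected average rather than the paper's full shrinkage estimator; `calibrated_error` and its parameters are hypothetical names chosen for illustration.

```python
import random
import statistics


def calibrated_error(n_studies, bias_shift, seed=0,
                     theta=1.0, mu=0.5, gamma=0.3, se=0.1):
    """Absolute error of a bias-corrected average of observational estimates.

    Calibration biases are drawn from N(mu, gamma^2); target biases from
    N(mu + bias_shift, gamma^2). bias_shift = 0 means the two bias
    distributions coincide (the transportability premise holds).
    """
    rng = random.Random(seed)
    # Calibration studies: true effect is zero, so estimate = bias + noise.
    calib = [rng.gauss(mu, gamma) + rng.gauss(0.0, se)
             for _ in range(n_studies)]
    # Target studies: estimate = theta + bias + noise.
    obs = [theta + rng.gauss(mu + bias_shift, gamma) + rng.gauss(0.0, se)
           for _ in range(n_studies)]
    mu_hat = statistics.fmean(calib)
    theta_hat = statistics.fmean(obs) - mu_hat
    return abs(theta_hat - theta)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        matched = calibrated_error(n, bias_shift=0.0)
        shifted = calibrated_error(n, bias_shift=0.3)
        print(n, round(matched, 4), round(shifted, 4))
```

In this sketch the error with matched distributions shrinks as the study count grows, while the error with a shifted target distribution settles near the size of the shift instead of vanishing.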
Original abstract
Randomized experiments have long been the gold standard for scientists seeking to learn about cause and effect. When randomized experiments are infeasible, scientists often resort to observational studies, which are widely available and often large but rely on untestable assumptions that, when violated, may result in biased estimates. Uncertainty about bias leads to a phenomenon known as the illusion of learning from observational research (Gerber, Green and Kaplan, 2004a): absent prior information about bias, observational results cannot meaningfully contribute to the estimation of a causal parameter. To shatter the illusion, we take an empirical Bayes perspective. We show that the distribution of observational biases can be learned from calibration studies: experiments that target a causal effect that is known a priori to be zero. Calibration identifies the distribution of observational bias and allows observational studies to inform the estimation of causal parameters via empirical Bayes shrinkage. We formalize the illusion phenomenon in an empirical Bayes setting and show that, with an increasing number of calibration and observation studies, both the bias distribution and the causal effect can be consistently recovered. We illustrate our method through a simulation study and a semi-synthetic application based on Ferraro and Miranda (2013)'s water-usage experiment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the 'illusion of learning' from observational data—where untestable bias prevents meaningful contribution to causal parameter estimation—can be overcome via an empirical Bayes approach. Calibration studies (with known zero causal effects) are used to identify the distribution of observational biases, enabling shrinkage of observational estimates; with growing numbers of calibration and observational studies, both the bias distribution and causal effects are consistently recovered. The approach is formalized theoretically and illustrated with simulations plus a semi-synthetic application based on Ferraro and Miranda (2013).
Significance. If the core assumptions hold, the work offers a principled statistical framework for integrating abundant observational data with targeted calibration experiments to improve causal inference, directly addressing a known limitation in the literature. Strengths include the explicit EB formalization of the illusion phenomenon, the asymptotic consistency results, and the use of independent calibration data to ground the bias prior (avoiding direct circularity). The simulation and semi-synthetic checks provide initial empirical grounding.
major comments (2)
- [§2–3] (setup and assumptions): The central consistency claim (recovering both bias distribution and causal effect as the number of studies → ∞) rests on the assumption that the bias distribution learned from calibration studies (true effect = 0) is identical to the bias distribution operating in the target observational studies. This transportability assumption is stated without justification, sensitivity analysis, or discussion of plausible violations (e.g., differences in selection, measurement error, or context between calibration and target studies). If violated, the EB shrinkage step and the claimed recovery both fail even with infinite data.
- [§4] (consistency results): The theorems establish consistency under the maintained assumptions, but the paper does not examine finite-sample behavior, the rate of convergence, or robustness to mild misspecification of the bias distribution. Given that the practical value hinges on moderate numbers of studies, these omissions limit assessment of whether the method delivers reliable gains in realistic settings.
minor comments (2)
- [Application section] The semi-synthetic application section would benefit from a clearer description of how the Ferraro and Miranda (2013) data were modified to create the observational and calibration components.
- [Throughout] Notation for the bias distribution and the EB posterior could be introduced earlier and used consistently to improve readability.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which help clarify the scope and limitations of our empirical Bayes framework. We address each major comment below and describe the revisions planned for the next version of the manuscript.
- Referee: [§2–3] The central consistency claim (recovering both bias distribution and causal effect as the number of studies → ∞) rests on the assumption that the bias distribution learned from calibration studies (true effect = 0) is identical to the bias distribution operating in the target observational studies. This transportability assumption is stated without justification, sensitivity analysis, or discussion of plausible violations (e.g., differences in selection, measurement error, or context between calibration and target studies). If violated, the EB shrinkage step and the claimed recovery both fail even with infinite data.
Authors: We agree that transportability of the bias distribution is a strong identifying assumption. The manuscript posits that calibration studies can be chosen to share the same bias-generating mechanisms as the target observational studies (e.g., comparable measurement protocols or selection processes). This mirrors standard transportability assumptions in causal inference and meta-analysis. We acknowledge that the current draft offers limited explicit justification and no sensitivity analysis. In the revision we will expand the assumptions section (§2–3) with additional discussion of when the assumption is plausible, add a subsection on potential violations (including differences in selection, measurement error, or context), and include a sensitivity analysis that perturbs the bias distribution to illustrate the consequences of violations.
Revision: yes
- Referee: [§4] The theorems establish consistency under the maintained assumptions, but the paper does not examine finite-sample behavior, the rate of convergence, or robustness to mild misspecification of the bias distribution. Given that the practical value hinges on moderate numbers of studies, these omissions limit assessment of whether the method delivers reliable gains in realistic settings.
Authors: The consistency theorems in §4 are asymptotic results that hold as the number of studies diverges. The existing simulation study already reports performance for finite numbers of studies. To address the concern more directly, we will expand the simulation section with new experiments focused on moderate numbers of studies (e.g., 10–100), provide a brief discussion of the convergence rates implied by the theoretical analysis, and add robustness checks that introduce mild misspecification of the bias distribution to evaluate sensitivity in finite samples.
Revision: yes
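A finite-sample experiment of the kind discussed in this exchange might look like the following sketch. It is not the authors' simulation design: it assumes a normal bias distribution, equal and known standard errors, one experimental and one observational estimate per replication, and a closed-form plug-in fit. The name `simulate_rmse` and every default parameter value are hypothetical.

```python
import math
import random
import statistics


def simulate_rmse(n_calib, reps=2000, seed=0, theta=1.0, mu=0.5, gamma=0.2,
                  se_calib=0.1, se_obs=0.1, se_exp=0.3):
    """Monte Carlo RMSE of (experiment only, calibrated shrinkage)."""
    rng = random.Random(seed)
    sq_exp = 0.0
    sq_shrunk = 0.0
    for _ in range(reps):
        calib = [rng.gauss(mu, gamma) + rng.gauss(0.0, se_calib)
                 for _ in range(n_calib)]
        mu_hat = statistics.fmean(calib)
        # With equal standard errors the marginal-likelihood fit has a
        # closed form: total variance minus noise variance, floored at zero.
        gamma2_hat = max(0.0, statistics.pvariance(calib) - se_calib ** 2)
        theta_exp = rng.gauss(theta, se_exp)
        theta_obs = theta + rng.gauss(mu, gamma) + rng.gauss(0.0, se_obs)
        var_obs = se_obs ** 2 + gamma2_hat
        w = (1.0 / var_obs) / (1.0 / se_exp ** 2 + 1.0 / var_obs)
        theta_shrunk = (1.0 - w) * theta_exp + w * (theta_obs - mu_hat)
        sq_exp += (theta_exp - theta) ** 2
        sq_shrunk += (theta_shrunk - theta) ** 2
    return math.sqrt(sq_exp / reps), math.sqrt(sq_shrunk / reps)


if __name__ == "__main__":
    for n_calib in (10, 25, 50, 100):
        rmse_exp, rmse_shrunk = simulate_rmse(n_calib)
        print(n_calib, round(rmse_exp, 3), round(rmse_shrunk, 3))
```

Comparing the two RMSE columns across calibration counts gives a rough view of how much of the asymptotic gain survives with only tens of calibration studies, under the stated assumptions.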
Circularity Check
No significant circularity; calibration studies supply independent identification of the bias distribution.
The derivation proceeds by positing a shared bias distribution across calibration studies (where the causal effect is known to be zero by design) and target observational studies, then applying standard empirical Bayes shrinkage and consistency arguments as the number of studies grows. The bias distribution is estimated from data external to the target observational studies, so the shrinkage step is not a fit to the quantities being corrected. No equation reduces a claimed prediction to a fitted input by construction, no uniqueness theorem is imported from the authors' own prior work, and the formalization of the illusion phenomenon builds on but does not tautologically rest upon the 2004 citation. The transportability assumption is stated explicitly rather than derived from the target data itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- bias distribution
axioms (1)
- domain assumption: The bias distribution learned from zero-effect calibration studies applies to the observational studies whose causal effects are being estimated.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear: "We show that the distribution of observational biases can be learned from calibration studies: experiments that target a causal effect that is known a priori to be zero. ... fit μ̂, γ̂² by maximizing the marginal likelihood of the calibration studies"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear: "Theorem 4 (Calibration shatters the illusion). ... R(θ̂_CEB, θ⋆) → 0 as J, K → ∞"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear: "Assumption that the bias distribution identified from calibration studies ... is identical to the bias distribution operating in the target observational studies"
Forward citations
Cited by 1 Pith paper
- Empirical Bayes Rebiasing. Empirical Bayes rebiasing learns the bias distribution from paired noisy estimates to produce shorter calibrated intervals than full debiasing while maintaining coverage.
Reference graph
Works this paper leans on
- [1] Abadie, A. and Imbens, G. W. (2012). A martingale representation for matching estimators. Journal of the American Statistical Association 107 833–843. doi:10.1080/01621459.2012.682537
- [2] Anglemyer, A., Horvath, H. T. and Bero, L. (2014). Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database of Systematic Reviews 4.
- [3] Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
- [4] Arceneaux, K., Gerber, A. S. and Green, D. P. (2010). A cautionary note on the use of matching to estimate causal effects: An empirical example comparing matching estimates to an experimental benchmark. Sociological Methods and Research 39. doi:10.1177/0049124110378098
- [5] Arceneaux, K. and Nickerson, D. W. (2006). Put Your Money Where Your Mouth Is: A Field Experiment on Political Persuasion. In Annals of Voting and Elections Research.
- [6] Athey, S., Imbens, G. and Wager, S. (2020). Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions. Journal of the Royal Statistical Society Series B 82 597–607.
- [7] Benson, K. and Hartz, A. J. (2000). A Comparison of Observational Studies and Randomized, Controlled Trials. New England Journal of Medicine 342. doi:10.1056/nejm200006223422506
- [8] Billingsley, P. (2013). Convergence of Probability Measures. John Wiley & Sons.
- [9] Bloom, H. S. (2005). Learning More from Social Experiments: Evolving Analytic Approaches. Russell Sage Foundation.
- [10] Brady, H. E. and Collier, D. (2010). Rethinking Social Inquiry: Diverse Tools, Shared Standards. Bloomsbury Publishing PLC.
- [11] Brewer, J. and Hunter, A. (2006). Foundations of Multimethod Research: Synthesizing Styles. Sage.
- [12] Cole, S. R. and Stuart, E. A. (2010). Generalizing Evidence from Randomized Trials to Target Populations: The ACTG 320 Trial. American Journal of Epidemiology 172 107–115.
- [13] Concato, J., Shah, N. and Horwitz, R. I. (2017). Randomized, controlled trials, observational studies, and the hierarchy of research designs. In Research Ethics 207–212. Routledge.
- [14] Creswell, J. W. and Creswell, J. D. (2017). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Sage Publications.
- [15] Curtis, J. R., Larson, J. C., Delzell, E., Brookhart, M. A., Cadarette, S. M., Chlebowski, R., Judd, S., Safford, M., Solomon, D. H. and LaCroix, A. Z. (2011). Placebo adherence, clinical outcomes, and mortality in the women's healt...
- [16] Dehejia, R. H. and Wahba, S. (1999). Causal Effects in Non-Experimental Studies: Reevaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94 1053–1062.
- [17] Ding, P. (2024). A First Course in Causal Inference. Chapman and Hall/CRC.
- [18] Efron, B. (2012). Large-scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction 1. Cambridge University Press.
- [19] Efron, B. (2014). Two modeling strategies for empirical Bayes estimation. Statistical Science 29. doi:10.1214/13-STS455
- [20] Ferraro, P. J. and Miranda, J. J. (2013). Heterogeneous treatment effects and mechanisms in information-based environmental policies: Evidence from a large-scale field experiment. Resource and Energy Economics 35 356–379.
- [21] Gelman, A. and Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
- [23] Gerber, A. S., Green, D. P. and Kaplan, E. H. (2004b). The illusion of learning from observational research. doi:10.1017/CBO9780511492174.012
- [24] Gerber, A. S., Green, D. P., Kaplan, E. H. and Kern, H. L. (2010). Baseline, placebo, and treatment: Efficient estimation for three-group experiments. Political Analysis 18 297–315.
- [25] Gordon, B. R., Zettelmeyer, F., Bhargava, N. and Chapsky, D. (2019). A comparison of approaches to advertising measurement: Evidence from big field experiments at Facebook. Marketing Science 38 193–225.
- [26] Guo, X. et al. (2021a). Control Variate Methods for Combining Experimental and Observational Data in Causal Inference. Journal of Causal Inference 9 123–145.
- [27] Guo, W., Wang, S., Ding, P., Wang, Y. and Jordan, M. I. (2021b). Multi-source causal inference using control variates. arXiv preprint arXiv:2103.16689.
- [28] Hirano, K., Imbens, G. W. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 1161–1189. doi:10.1111/1468-0262.00442
- [29] Ignatiadis, N. and Sen, B. (2025). Empirical Bayes: From Herbert Robbins to Modern Theory and Applications.
- [30] Imbens, G. W. and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
- [31] Imbens, G., Kallus, N., Mao, X. and Wang, Y. (2025). Long-term causal inference under persistent confounding via data combination. Journal of the Royal Statistical Society Series B: Statistical Methodology 87 362–388.
- [32] Jiang, W. and Zhang, C. H. (2009). General maximum likelihood empirical Bayes estimation of normal means. Annals of Statistics 37 1647–1684. doi:10.1214/08-AOS638
- [33] Kiefer, J. and Wolfowitz, J. (1956). Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters. The Annals of Mathematical Statistics 27 887–906. doi:10.1214/AOMS/1177728066
- [34] LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review 76 604–620.
- [35] Lin, Z., Bickel, P. J. and Ding, P. (2026). Introducing the b-value: combining unbiased and biased estimators from a sensitivity analysis perspective. arXiv preprint arXiv:2602.16310.
- [36] Lipsitch, M., Tchetgen, E. T. and Cohen, T. (2010). Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 21 383–388.
- [37] Miao, X., Tchetgen Tchetgen, E. J. et al. (2018). Identification of Causal Effects with Auxiliary Variables: The Role of Negative Controls. Journal of the American Statistical Association 113 172–186.
- [38] Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Ann. Agricultural Sciences 1–51.
- [39] Robbins, H. (1956). An empirical Bayes approach to statistics. In Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability 157–163.
- [40] Rose, S. and van der Laan, M. J. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data.
- [41] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55.
- [42] Rosenman, E. T. R., Basse, G., Owen, A. B. and Baiocchi, M. (2023). Combining observational and experimental datasets using shrinkage estimators. Biometrics 79 2961–2973.
- [43] Smith, J. A. and Todd, P. E. (2001). Reconciling conflicting evidence on the performance of propensity-score matching methods. American Economic Review 91 112–118. doi:10.1257/aer.91.2.112
- [44] Soloff, J. A., Guntuboyina, A. and Sen, B. (2025). Multivariate, heteroscedastic empirical Bayes via nonparametric maximum likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology 87 1–32.
- [45] Stuart, E. A. (2011). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science 25 1–21.
- [46] Tchetgen Tchetgen, E. J. (2014). Using Negative Controls to Adjust for Unmeasured Confounding. Epidemiology 25 364–380.
- [47] Turner, R. and Jackson, D. (2009). A Hierarchical Bayesian Approach to Bias Adjustment in Meta-Analysis. Statistics in Medicine 28 331–347.
- [48] Van der Vaart, A. W. (2000). Asymptotic Statistics 3. Cambridge University Press.
- [49] van der Vaart, A. W. and Wellner, J. A. (2023). Weak Convergence and Empirical Processes. doi:10.1007/978-3-031-29040-4
- [50] Waddington, H. S., Villar, P. F. and Valentine, J. C. (2023). Can non-randomised studies of interventions provide unbiased effect estimates? A systematic review of internal replication studies. Evaluation Review 47 563–593.
- [51] Wong, V. C., Steiner, P. M. and Anglin, K. L. (2018). What can be learned from empirical evaluations of nonexperimental methods? Evaluation Review 42 147–175.
- [52] Xie, X., Kou, S. C. and Brown, L. D. (2012). SURE estimates for a heteroscedastic hierarchical model. Journal of the American Statistical Association 107 1465–1479. doi:10.1080/01621459.2012.728154
- [53] Xiong, R., Koenecke, A., Powell, M., Shen, Z., Vogelstein, J. T. and Athey, S. (2023). Federated causal inference in heterogeneous observational data. Statistics in Medicine 42 4418–4439.
- [54] Yang, S. and Ding, P. (2020). Combining multiple observational data sources to estimate causal effects. Journal of the American Statistical Association.
- [55] Yang, S., Gao, C., Zeng, D. and Wang, X. (2023). Elastic integrative analysis of randomised trial and real-world data for treatment heterogeneity estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology 85 575–596.
- [56] Yang, X., Lin, L., Athey, S., Jordan, M. I. and Imbens, G. W. (2025). Cross-Validated Causal Inference: a Modern Method to Combine Experimental and Observational Data. arXiv preprint arXiv:2511.00727.