An Old Look at Empirical Bayes
Pith reviewed 2026-05-22 01:51 UTC · model grok-4.3
The pith
Empirical Bayes reuses the same data twice and produces uncertainty measures distinct from those of a full hierarchical posterior conditional on the realized observations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Blei's empirical Bayes machinery, including population empirical Bayes and its extensions, targets inferential objects distinct from the posterior conditional on the realized data. Because empirical Bayes uses the same data twice, it conflates levels of the hierarchy and yields posterior-shaped summaries whose uncertainty quantification differs from what a fully hierarchical model delivers. The cost of maintaining the full hierarchical discipline has fallen low enough that the computational trade-off no longer favors the shortcut. The case study is the Tweedie formula: a smoothed score plugged into it need not arise from any prior, whereas the score in the horseshoe Tweedie formula does.
What carries the argument
The distinction between empirical Bayes summaries obtained by reusing data across hierarchy levels and the true posterior conditional on the realized data, illustrated by whether an estimated score function in the Tweedie identity arises from an actual prior.
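For orientation, the Tweedie identity referred to here can be written out for the normal location model. The notation below (noise variance \sigma^2, prior \pi, marginal density m) is an assumption made for this sketch and is not taken from the paper. The second identity is the standard companion result for the posterior variance; it makes the question concrete, since a log-marginal that implies a negative variance anywhere cannot come from any prior.

```latex
% Normal location model with known noise variance \sigma^2 and prior \pi
x \mid \theta \sim \mathcal{N}(\theta, \sigma^2), \qquad
m(x) = \int \mathcal{N}(x \mid \theta, \sigma^2)\, \pi(d\theta)

% Tweedie's formula: the posterior mean from the score of the marginal
\mathbb{E}[\theta \mid x] = x + \sigma^2 \frac{d}{dx} \log m(x)

% Companion identity: the posterior variance from the second derivative
\operatorname{Var}(\theta \mid x)
  = \sigma^2 \left( 1 + \sigma^2 \frac{d^2}{dx^2} \log m(x) \right)
```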
If this is right
- Modern tools such as variational inference, neural amortization, and simulation-based inference should be redeployed to serve properly hierarchical models rather than empirical Bayes shortcuts.
- Uncertainty measures from empirical Bayes will systematically differ from those obtained by conditioning on the observed data.
- The horseshoe prior supplies a Tweedie formula in which the score truly arises from a hierarchical model, unlike generic smoothed-score approximations.
- Calibration studies that combine experimental and observational data remain useful only when embedded inside an explicit hierarchical structure.
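To make concrete what it means for a Tweedie score to arise from an actual prior, here is a minimal sketch. It uses a normal prior theta ~ N(0, A) as a simple stand-in, not the horseshoe prior discussed in the paper, because the normal case has a closed-form answer to check against. The function names and the values of A and x are hypothetical.

```python
import numpy as np

def tweedie_posterior_mean(x, log_marginal, sigma2=1.0, h=1e-4):
    """Tweedie's formula: E[theta | x] = x + sigma2 * d/dx log m(x).

    The score d/dx log m(x) is approximated by a central finite difference.
    """
    score = (log_marginal(x + h) - log_marginal(x - h)) / (2.0 * h)
    return x + sigma2 * score

def normal_log_marginal(A, sigma2=1.0):
    """Log marginal of x when theta ~ N(0, A) and x | theta ~ N(theta, sigma2)."""
    var = A + sigma2
    return lambda x: -0.5 * np.log(2.0 * np.pi * var) - 0.5 * x**2 / var

# Hypothetical observations and prior variance (assumptions for illustration).
x = np.array([-3.0, -0.5, 0.0, 1.0, 4.0])
A = 2.0

tweedie = tweedie_posterior_mean(x, normal_log_marginal(A))
closed_form = A / (A + 1.0) * x   # textbook shrinkage estimate for this prior
```

Here the marginal is generated by a genuine prior, so the Tweedie estimate and the closed-form posterior mean agree up to finite-difference error.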
Where Pith is reading between the lines
- Practitioners facing high-dimensional data might test whether the coverage gap between empirical Bayes and hierarchical intervals shrinks or grows with dimension.
- The argument implies that applications previously dismissed as too slow for full Bayes, such as large-scale calibration or implicit-likelihood problems, should now be revisited with hierarchical formulations.
- Connections to causal inference or meta-analysis may gain from treating empirical Bayes steps as approximations that require explicit hierarchical justification.
Load-bearing premise
Differences in uncertainty quantification and posterior-shaped summaries between empirical Bayes and full hierarchical models are large enough in practice to matter more than computational convenience.
What would settle it
A repeated-sampling simulation or real-data analysis in which credible intervals from a full hierarchical model achieve nominal coverage rates while the corresponding empirical Bayes intervals do not, or where decision losses differ materially between the two approaches.
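A minimal sketch of such a repeated-sampling check follows, under assumptions that are not in the paper: a normal means model x_i | theta_i ~ N(theta_i, 1) with theta_i ~ N(0, A), a moment-based plug-in estimate for empirical Bayes, and a uniform hyperprior on the shrinkage weight B = A / (A + 1) for the hierarchical model. All function names and default values are hypothetical, and the result depends on these choices.

```python
import numpy as np

def eb_intervals(x, z=1.96, floor=1e-8):
    """Plug-in empirical Bayes intervals: estimate A from x, then treat it as known."""
    A_hat = max(np.mean(x**2) - 1.0, floor)   # moment estimate, since E[x^2] = A + 1
    B_hat = A_hat / (A_hat + 1.0)
    half = z * np.sqrt(B_hat)                 # posterior sd is sqrt(B) when A is known
    return B_hat * x - half, B_hat * x + half

def hierarchical_intervals(x, rng, n_draws=2000, grid_size=400):
    """Intervals from a posterior that also integrates over the shrinkage weight B."""
    B = (np.arange(grid_size) + 0.5) / grid_size   # grid on (0, 1), uniform prior on B
    # Marginally x_i ~ N(0, A + 1) and A + 1 = 1 / (1 - B).
    log_post = 0.5 * x.size * np.log1p(-B) - 0.5 * (1.0 - B) * np.sum(x**2)
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    B_draws = rng.choice(B, size=n_draws, p=w)[:, None]
    noise = rng.standard_normal((n_draws, x.size))
    theta = B_draws * x[None, :] + np.sqrt(B_draws) * noise   # theta | x, B ~ N(Bx, B)
    lo, hi = np.percentile(theta, [2.5, 97.5], axis=0)
    return lo, hi

def coverage(n=8, A=1.0, reps=300, seed=0):
    """Fraction of true theta_i covered by nominal 95% intervals, for each method."""
    rng = np.random.default_rng(seed)
    hits_eb = hits_hb = 0
    for _ in range(reps):
        theta = rng.normal(0.0, np.sqrt(A), size=n)
        x = theta + rng.standard_normal(n)
        lo, hi = eb_intervals(x)
        hits_eb += np.sum((lo <= theta) & (theta <= hi))
        lo, hi = hierarchical_intervals(x, rng)
        hits_hb += np.sum((lo <= theta) & (theta <= hi))
    return hits_eb / (reps * n), hits_hb / (reps * n)

if __name__ == "__main__":
    cov_eb, cov_hb = coverage()
    print(f"EB coverage: {cov_eb:.3f}, hierarchical coverage: {cov_hb:.3f}")
```

The small default n is deliberate: with few units the hyperparameter is poorly determined, which is where plug-in intervals are most likely to differ from intervals that propagate hyperparameter uncertainty. Varying n and A would show whether the gap matters in a given regime.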
Original abstract
Dennis Lindley once said that there is only one thing worse than a frequentist, and that is an empirical Bayesian. The quip has the air of caricature, but its technical content is serious: empirical Bayes uses the same data twice, conflates levels of a hierarchy, and produces posterior-shaped summaries whose uncertainty quantification differs from what a fully hierarchical model delivers. David Blei's 2026 IMS Medallion Lecture, "A Fresh Look at Empirical Bayes," revives the program under three new banners: empirical Bayes via probabilistic symmetries (rebranded "Bayesian empirical Bayes"), empirical Bayes with implicit likelihoods through simulation-based inference, and empirical Bayes for combining experimental and observational data through calibration studies. This is a continuation of Blei and Kucukelbir's earlier "population empirical Bayes" (PopEB, 2015). We argue, in the spirit of Lindley, I. J. Good, William DuMouchel, Thomas Louis, and our own recent work with Datta, that Blei's machinery targets inferential objects distinct from the posterior conditional on the realized data, and that the cost of maintaining the full hierarchical discipline has fallen low enough that the computational trade-off no longer favors the shortcut. The case study is the Tweedie formula. Efron's f-modeling empirical Bayes plugs an estimated score function into a posterior-mean identity, but a smoothed score need not arise from any prior. The horseshoe Tweedie formula does. We conclude by recommending that the impressive computational machinery of modern empirical Bayes (variational inference, neural amortization, simulation-based inference) be redeployed in service of properly hierarchical Bayes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper critiques David Blei's 2026 IMS Medallion Lecture on empirical Bayes, arguing that EB methods target inferential objects distinct from the posterior conditional on realized data in a fully hierarchical model. Using the Tweedie formula as a case study, it contrasts Efron's f-modeling approach, where a smoothed score need not arise from a prior, with the horseshoe Tweedie formula that does. The authors conclude that with reduced computational costs, full hierarchical Bayes should be preferred and modern EB machinery redeployed accordingly.
Significance. If the distinctions in inferential targets and uncertainty quantification (UQ) hold, this work highlights foundational differences between empirical Bayes shortcuts and hierarchical modeling, potentially shifting practice toward full hierarchies now that computation has improved. It connects to historical critiques and provides a concrete Tweedie illustration of when smoothed scores correspond (or fail to correspond) to priors.
Major comments (2)
- [Tweedie formula case study] The assertion that differences in posterior-shaped summaries and uncertainty quantification between EB and full hierarchical models are practically meaningful enough to outweigh computational convenience lacks supporting quantitative evidence. The Tweedie case study supplies no coverage rates, interval widths, decision loss, or simulation comparisons showing when these distinctions affect conclusions in typical applications.
- [Conclusion and recommendations] The claim that the cost of maintaining full hierarchical discipline has fallen low enough that the trade-off no longer favors the EB shortcut is presented as a modeling preference rather than a demonstrated result. No calibration or threshold analysis is given for when modern tools (VI, neural amortization, SBI) make the hierarchical approach feasible across the applications discussed.
Minor comments (1)
- [Abstract] The abstract reference to 'our own recent work with Datta' would be strengthened by an explicit citation.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address the two major comments below, clarifying the manuscript's intent as a commentary on foundational distinctions while agreeing that additional discussion can strengthen the presentation.
Point-by-point responses
- Referee: [Tweedie formula case study] The assertion that differences in posterior-shaped summaries and uncertainty quantification between EB and full hierarchical models are practically meaningful enough to outweigh computational convenience lacks supporting quantitative evidence. The Tweedie case study supplies no coverage rates, interval widths, decision loss, or simulation comparisons showing when these distinctions affect conclusions in typical applications.
  Authors: The Tweedie case study serves to illustrate the core conceptual distinction: Efron's f-modeling plugs an estimated score into a posterior-mean identity, but the resulting smoothed score need not arise from any prior, whereas the horseshoe Tweedie formula does correspond to a hierarchical model. The manuscript's focus is on the difference in inferential targets and the historical context of such critiques, rather than on a simulation-based performance comparison. We agree that noting potential practical implications would be helpful. We will revise to add a concise paragraph discussing settings (e.g., strong shrinkage or misspecified marginals) where the UQ differences could influence conclusions, without expanding into a full empirical study.
  Revision: partial
- Referee: [Conclusion and recommendations] The claim that the cost of maintaining full hierarchical discipline has fallen low enough that the trade-off no longer favors the EB shortcut is presented as a modeling preference rather than a demonstrated result. No calibration or threshold analysis is given for when modern tools (VI, neural amortization, SBI) make the hierarchical approach feasible across the applications discussed.
  Authors: The recommendation reflects the substantial literature on computational advances in variational inference, neural amortization, and simulation-based inference, which have reduced the cost of full hierarchical modeling. The manuscript frames this as a shift in the practical trade-off, consistent with earlier critiques by Lindley, Good, and others. While we do not supply a formal calibration or threshold analysis, as the work is a targeted commentary rather than a methods paper, we will revise the conclusion to include additional citations to recent applications demonstrating feasibility in high-dimensional and complex-data settings.
  Revision: partial
Circularity Check
No significant circularity: commentary relies on external standards and non-load-bearing self-reference
Full rationale
The manuscript is a critical commentary contrasting Blei-style empirical Bayes with full hierarchical models, using the established Tweedie formula as an illustrative case study. It invokes standard results (Efron's f-modeling, posterior-mean identities, horseshoe prior) and prior literature including a single self-citation to work with Datta, but this reference supports a general preference for hierarchical discipline rather than serving as the sole justification for any derived claim or fitted quantity. No equations, parameters, or predictions are defined in terms of themselves, fitted to subsets then relabeled as forecasts, or smuggled via self-citation chains. The central assertions about distinct inferential objects and computational trade-offs rest on conceptual distinctions and external benchmarks, rendering the derivation self-contained against the listed circularity patterns.
Axiom & Free-Parameter Ledger
Axioms (2)
- domain assumption: Empirical Bayes uses the same data twice and conflates levels of a hierarchy.
- domain assumption: A smoothed score function in the Tweedie formula need not arise from any prior.
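The second assumption can be illustrated numerically. In the normal location model with noise variance sigma2, any marginal m generated by a prior satisfies Var(theta | x) = sigma2 * (1 + sigma2 * d^2/dx^2 log m(x)) >= 0. The sketch below, with hypothetical data and bandwidths and using a Gaussian kernel density estimate as the smoother (not necessarily the estimator Efron's f-modeling uses), shows a smoothed marginal that violates this condition and therefore cannot arise from any prior.

```python
import numpy as np

def kde_log_marginal(x, data, bandwidth):
    """Log of a Gaussian kernel density estimate evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - data[None, :]) / bandwidth
    log_kernels = -0.5 * z**2 - np.log(bandwidth * np.sqrt(2.0 * np.pi))
    # log-mean-exp over data points, stabilised by subtracting the row maximum
    top = log_kernels.max(axis=1, keepdims=True)
    return top[:, 0] + np.log(np.exp(log_kernels - top).mean(axis=1))

def implied_posterior_variance(x, log_marginal, sigma2=1.0, h=1e-3):
    """sigma2 * (1 + sigma2 * (log m)''), with (log m)'' by central differences."""
    second = (log_marginal(x + h) - 2.0 * log_marginal(x) + log_marginal(x - h)) / h**2
    return sigma2 * (1.0 + sigma2 * second)

# Hypothetical, well-separated observations with unit noise variance.
data = np.array([-4.0, 0.0, 5.0])
grid = data.copy()

# Bandwidth below the noise sd: the implied "posterior variance" goes negative.
narrow = implied_posterior_variance(grid, lambda t: kde_log_marginal(t, data, 0.5))
# Bandwidth above the noise sd: the implied variance stays positive.
wide = implied_posterior_variance(grid, lambda t: kde_log_marginal(t, data, 1.5))
```

With the narrow bandwidth the implied variance is negative at every data point, so no prior can reproduce that score. With the wide bandwidth the check passes, which is consistent with the fact that a Gaussian kernel estimate whose bandwidth exceeds the noise standard deviation is itself the marginal of a normal-mixture prior.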
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Efron's f-modeling empirical Bayes plugs an estimated score function into a posterior-mean identity, but a smoothed score need not arise from any prior. The horseshoe Tweedie formula does."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Alquier, P. (2024). User-friendly introduction to PAC-Bayes bounds. Foundations and Trends in Machine Learning 17(2), 174–303. Athey, S., Imbens, G. W., and Wager, S. (2018). Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society, Series B 80(4), 597–623. Bhadra, A., Datta, ...
- [2] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21(1), C1–C68. Cranmer, K., Brehmer, J., and Louppe, G. (2020). The frontier of simul...
  (Source page header: Polson, Sokolov, and Zantedeschi, An Old Look at Empirical Bayes, p. 20.)
- [3] Institute of Mathematical Statistics. Louis, T. A. (1984). Estimating a population of parameter values using Bayes and empirical Bayes methods. Journal of the American Statistical Association 79(386), 393–398. Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96(6), 933–957. McAuliffe, J. D., Blei, ...