arxiv: 2605.21535 · v1 · pith:3F5EIYPJnew · submitted 2026-05-20 · 📊 stat.ME

An Old Look at Empirical Bayes

Pith reviewed 2026-05-22 01:51 UTC · model grok-4.3

classification 📊 stat.ME
keywords: empirical Bayes, hierarchical models, Tweedie formula, uncertainty quantification, posterior inference, simulation-based inference, variational inference

The pith

Empirical Bayes uses the same data twice and produces uncertainty measures that differ from those of a full hierarchical posterior conditioned on the realized observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper revives Lindley's old critique to examine Blei's recent proposals for empirical Bayes via symmetries, simulation-based implicit likelihoods, and calibration studies that combine experimental and observational data. It argues that these methods conflate hierarchy levels and target summaries whose uncertainty quantification differs from the posterior obtained by conditioning directly on the observed data. The authors use the Tweedie formula as a concrete case: Efron's f-modeling approach plugs in an estimated score that need not arise from any prior, whereas a horseshoe prior produces a score that does. With modern computation making full hierarchical models feasible, the paper concludes that the computational machinery of variational inference and simulation-based methods should be redirected to proper hierarchical Bayes rather than shortcuts.
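
For reference, the Tweedie identity at issue can be written out for the Gaussian case. The notation below (y, θ, σ², π, m) is assumed here for illustration and is not taken from the paper; it is the standard textbook form, not the authors' derivation.

```latex
% Assumed setup: y \mid \theta \sim N(\theta, \sigma^2), \quad \theta \sim \pi,
% with marginal density m(y) = \int \phi_\sigma(y - \theta)\, \pi(d\theta).
\begin{align*}
\mathbb{E}[\theta \mid y] &= y + \sigma^2 \frac{d}{dy} \log m(y), \\
\operatorname{Var}[\theta \mid y] &= \sigma^2 \left( 1 + \sigma^2 \frac{d^2}{dy^2} \log m(y) \right).
\end{align*}
```

An f-modeling approach estimates m (or its score) directly from the data and plugs the estimate into the first line. Nothing in that step forces the estimate to be a Gaussian convolution of some prior π, which is the gap the summary above refers to.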

Core claim

Blei's empirical Bayes machinery, including population empirical Bayes and its extensions, targets inferential objects distinct from the posterior conditional on the realized data; empirical Bayes therefore conflates levels of the hierarchy and yields posterior-shaped summaries whose uncertainty quantification differs from what a fully hierarchical model delivers. The cost of maintaining the full hierarchical discipline has fallen low enough that the computational trade-off no longer favors the shortcut. The case study is the Tweedie formula, where a smoothed score need not arise from any prior but the horseshoe Tweedie formula does.
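
The two inferential objects being contrasted can be sketched in generic notation. The symbols below (data y, parameters θ, hyperparameter η, hyperprior p(η)) are assumptions made for this illustration, and the plug-in estimate is shown in its marginal maximum likelihood form only; the specific variants discussed in the paper may differ.

```latex
% Assumed notation: data y, parameters \theta, hyperparameter \eta, hyperprior p(\eta).
\begin{align*}
\text{empirical Bayes:} \quad & p(\theta \mid y, \hat\eta),
  \qquad \hat\eta = \arg\max_{\eta} \int p(y \mid \theta)\, p(\theta \mid \eta)\, d\theta, \\
\text{hierarchical Bayes:} \quad & p(\theta \mid y) = \int p(\theta \mid y, \eta)\, p(\eta \mid y)\, d\eta.
\end{align*}
```

In the first line y enters twice, once to fix the estimate of η and again in the conditioning; the second line instead averages over the posterior uncertainty in η.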

What carries the argument

The distinction between empirical Bayes summaries obtained by reusing data across hierarchy levels and the true posterior conditional on the realized data, illustrated by whether an estimated score function in the Tweedie identity arises from an actual prior.

If this is right

  • Modern tools such as variational inference, neural amortization, and simulation-based inference should be redeployed to serve properly hierarchical models rather than empirical Bayes shortcuts.
  • Uncertainty measures from empirical Bayes will systematically differ from those obtained by conditioning on the observed data.
  • The horseshoe prior supplies a Tweedie formula in which the score truly arises from a hierarchical model, unlike generic smoothed-score approximations.
  • Calibration studies that combine experimental and observational data remain useful only when embedded inside an explicit hierarchical structure.
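
The point about smoothed scores can be illustrated with a minimal sketch, under loud assumptions: unit-variance Gaussian noise, and a Gaussian kernel density estimate standing in for the marginal estimate (this is not Efron's actual estimator, and `kde_tweedie` is a hypothetical helper name). For any genuine prior, the second-order Tweedie identity gives a posterior variance of 1 + (log m)''(y), which cannot be negative. The code checks whether a plug-in marginal respects that necessary condition; passing it does not prove a prior exists.

```python
import numpy as np

def kde_tweedie(y_eval, y_obs, h):
    """Plug-in Tweedie mean and implied variance from a Gaussian KDE.

    Assumes unit noise variance. Hypothetical helper for illustration only.
    """
    d = y_eval[:, None] - y_obs[None, :]              # pairwise differences
    logw = -0.5 * (d / h) ** 2                         # log kernel weights
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # normalize per evaluation point
    mean_d = (w * d).sum(axis=1)
    var_d = (w * d ** 2).sum(axis=1) - mean_d ** 2
    score = -mean_d / h ** 2                           # d/dy log m_hat(y)
    curvature = -1.0 / h ** 2 + var_d / h ** 4         # d^2/dy^2 log m_hat(y)
    return y_eval + score, 1.0 + curvature

# Synthetic data: theta ~ N(0, 1), y = theta + N(0, 1) noise.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=200)
y_obs = theta + rng.standard_normal(200)
grid = np.linspace(y_obs.min() - 3.0, y_obs.max() + 3.0, 201)

_, var_narrow = kde_tweedie(grid, y_obs, h=0.5)   # bandwidth below the noise scale
_, var_wide = kde_tweedie(grid, y_obs, h=1.5)     # bandwidth above the noise scale

print("min implied variance, h=0.5:", var_narrow.min())
print("min implied variance, h=1.5:", var_wide.min())
```

With a bandwidth below the noise standard deviation, the implied variance goes negative in the tails, so that smoothed score cannot come from any prior. With a bandwidth at or above the noise scale, the kernel estimate is itself a Gaussian convolution of a discrete mixture, so the check passes by construction.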

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners facing high-dimensional data might test whether the coverage gap between empirical Bayes and hierarchical intervals shrinks or grows with dimension.
  • The argument implies that applications previously dismissed as too slow for full Bayes, such as large-scale calibration or implicit-likelihood problems, should now be revisited with hierarchical formulations.
  • Connections to causal inference or meta-analysis may gain from treating empirical Bayes steps as approximations that require explicit hierarchical justification.

Load-bearing premise

Differences in uncertainty quantification and posterior-shaped summaries between empirical Bayes and full hierarchical models are large enough in practice to matter more than computational convenience.

What would settle it

A repeated-sampling simulation or real-data analysis in which credible intervals from a full hierarchical model achieve nominal coverage rates while the corresponding empirical Bayes intervals do not, or where decision losses differ materially between the two approaches.
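
A toy version of such a repeated-sampling check might look like the sketch below. Everything in it is an assumption made for illustration and none of it comes from the paper: a normal means model with unit noise, a N(0, A) prior with unknown A, a naive plug-in empirical Bayes interval using the marginal maximum likelihood estimate of A, and a hierarchical interval using a uniform hyperprior on the shrinkage factor B = 1/(1 + A). Corrected empirical Bayes intervals exist in the literature and are not implemented here.

```python
import numpy as np

def simulate_coverage(k=5, true_A=1.0, reps=400, draws=2000, seed=0):
    """Coverage of nominal 95% intervals: naive plug-in EB vs. hierarchical Bayes."""
    rng = np.random.default_rng(seed)
    B_grid = (np.arange(500) + 0.5) / 500      # grid over shrinkage factor B in (0, 1)
    z = 1.959963984540054                      # 97.5% standard normal quantile
    eb_hits = 0
    hb_hits = 0
    for _ in range(reps):
        theta = rng.normal(0.0, np.sqrt(true_A), size=k)
        y = theta + rng.standard_normal(k)
        S = np.sum(y ** 2)

        # Naive empirical Bayes: plug in the marginal MLE of A, ignore its uncertainty.
        A_hat = max(0.0, S / k - 1.0)
        B_hat = 1.0 / (1.0 + A_hat)
        centre = (1.0 - B_hat) * y
        half = z * np.sqrt(1.0 - B_hat)
        eb_hits += np.sum(np.abs(theta - centre) <= half)

        # Hierarchical Bayes: uniform hyperprior on B (assumed), posterior on a grid.
        # Marginally y_i ~ N(0, 1/B), so p(B | y) is proportional to B^(k/2) exp(-B S / 2).
        log_post = 0.5 * k * np.log(B_grid) - 0.5 * B_grid * S
        w = np.exp(log_post - log_post.max())
        w /= w.sum()
        B_draw = rng.choice(B_grid, size=draws, p=w)
        noise = rng.standard_normal((draws, k))
        th = (1.0 - B_draw)[:, None] * y + np.sqrt(1.0 - B_draw)[:, None] * noise
        lo, hi = np.quantile(th, [0.025, 0.975], axis=0)
        hb_hits += np.sum((theta >= lo) & (theta <= hi))
    n = reps * k
    return eb_hits / n, hb_hits / n

eb_cov, hb_cov = simulate_coverage()
print("naive EB coverage:     ", eb_cov)
print("hierarchical coverage: ", hb_cov)
```

With only a few groups, the plug-in estimate of A is often exactly zero, which collapses the empirical Bayes interval to a point and drives its coverage well below nominal. The size of the gap depends on k, the true A, and the hyperprior, so this sketch shows the mechanism rather than settling the practical question.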

Original abstract

Dennis Lindley once said that there is only one thing worse than a frequentist, and that is an empirical Bayesian. The quip has the air of caricature, but its technical content is serious: empirical Bayes uses the same data twice, conflates levels of a hierarchy, and produces posterior-shaped summaries whose uncertainty quantification differs from what a fully hierarchical model delivers. David Blei's 2026 IMS Medallion Lecture, "A Fresh Look at Empirical Bayes," revives the program under three new banners: empirical Bayes via probabilistic symmetries (rebranded "Bayesian empirical Bayes"), empirical Bayes with implicit likelihoods through simulation-based inference, and empirical Bayes for combining experimental and observational data through calibration studies. This is a continuation of Blei and Kucukelbir's earlier "population empirical Bayes" (PopEB, 2015). We argue, in the spirit of Lindley, I. J. Good, William DuMouchel, Thomas Louis, and our own recent work with Datta, that Blei's machinery targets inferential objects distinct from the posterior conditional on the realized data, and that the cost of maintaining the full hierarchical discipline has fallen low enough that the computational trade-off no longer favors the shortcut. The case study is the Tweedie formula. Efron's f-modeling empirical Bayes plugs an estimated score function into a posterior-mean identity, but a smoothed score need not arise from any prior. The horseshoe Tweedie formula does. We conclude by recommending that the impressive computational machinery of modern empirical Bayes (variational inference, neural amortization, simulation-based inference) be redeployed in service of properly hierarchical Bayes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper critiques David Blei's 2026 IMS Medallion Lecture on empirical Bayes, arguing that EB methods target inferential objects distinct from the posterior conditional on realized data in a fully hierarchical model. Using the Tweedie formula as a case study, it contrasts Efron's f-modeling approach, where a smoothed score need not arise from a prior, with the horseshoe Tweedie formula that does. The authors conclude that with reduced computational costs, full hierarchical Bayes should be preferred and modern EB machinery redeployed accordingly.

Significance. If the distinctions in inferential targets and uncertainty quantification hold, this work highlights foundational differences between empirical Bayes shortcuts and hierarchical modeling, potentially shifting practice toward full hierarchies now that computation has improved. It connects to historical critiques and provides a concrete Tweedie illustration of when smoothed scores correspond (or fail to correspond) to priors.

major comments (2)
  1. [Tweedie formula case study] The assertion that differences in posterior-shaped summaries and uncertainty quantification between EB and full hierarchical models are practically meaningful enough to outweigh computational convenience lacks supporting quantitative evidence. The Tweedie case study supplies no coverage rates, interval widths, decision loss, or simulation comparisons showing when these distinctions affect conclusions in typical applications.
  2. [Conclusion and recommendations] The claim that the cost of maintaining full hierarchical discipline has fallen low enough that the trade-off no longer favors the EB shortcut is presented as a modeling preference rather than a demonstrated result. No calibration or threshold analysis is given for when modern tools (VI, neural amortization, SBI) make the hierarchical approach feasible across the applications discussed.
minor comments (1)
  1. [Abstract] The abstract's reference to 'our own recent work with Datta' would be strengthened by an explicit citation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address the two major comments below, clarifying the manuscript's intent as a commentary on foundational distinctions while agreeing that additional discussion can strengthen the presentation.

Point-by-point responses
  1. Referee: [Tweedie formula case study] The assertion that differences in posterior-shaped summaries and uncertainty quantification between EB and full hierarchical models are practically meaningful enough to outweigh computational convenience lacks supporting quantitative evidence. The Tweedie case study supplies no coverage rates, interval widths, decision loss, or simulation comparisons showing when these distinctions affect conclusions in typical applications.

    Authors: The Tweedie case study serves to illustrate the core conceptual distinction: Efron's f-modeling plugs an estimated score into a posterior-mean identity, but the resulting smoothed score need not arise from any prior, whereas the horseshoe Tweedie formula does correspond to a hierarchical model. The manuscript's focus is on the difference in inferential targets and the historical context of such critiques, rather than on a simulation-based performance comparison. We agree that noting potential practical implications would be helpful. We will revise to add a concise paragraph discussing settings (e.g., strong shrinkage or misspecified marginals) where the differences in uncertainty quantification could influence conclusions, without expanding into a full empirical study.

    Revision: partial

  2. Referee: [Conclusion and recommendations] The claim that the cost of maintaining full hierarchical discipline has fallen low enough that the trade-off no longer favors the EB shortcut is presented as a modeling preference rather than a demonstrated result. No calibration or threshold analysis is given for when modern tools (VI, neural amortization, SBI) make the hierarchical approach feasible across the applications discussed.

    Authors: The recommendation reflects the substantial literature on computational advances in variational inference, neural amortization, and simulation-based inference, which have reduced the cost of full hierarchical modeling. The manuscript frames this as a shift in the practical trade-off, consistent with earlier critiques by Lindley, Good, and others. While we do not supply a formal calibration or threshold analysis, as the work is a targeted commentary rather than a methods paper, we will revise the conclusion to include additional citations to recent applications demonstrating feasibility in high-dimensional and complex-data settings.

    Revision: partial

Circularity Check

0 steps flagged

No significant circularity: commentary relies on external standards and non-load-bearing self-reference

Full rationale

The manuscript is a critical commentary contrasting Blei-style empirical Bayes with full hierarchical models, using the established Tweedie formula as an illustrative case study. It invokes standard results (Efron's f-modeling, posterior-mean identities, horseshoe prior) and prior literature including a single self-citation to work with Datta, but this reference supports a general preference for hierarchical discipline rather than serving as the sole justification for any derived claim or fitted quantity. No equations, parameters, or predictions are defined in terms of themselves, fitted to subsets then relabeled as forecasts, or smuggled via self-citation chains. The central assertions about distinct inferential objects and computational trade-offs rest on conceptual distinctions and external benchmarks, rendering the derivation self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central argument rests on domain assumptions from Bayesian statistics about data usage and hierarchy levels, with no new free parameters or invented entities introduced.

axioms (2)
  • domain assumption: Empirical Bayes uses the same data twice and conflates levels of a hierarchy.
    Invoked in the opening discussion of Lindley's quip and technical content.
  • domain assumption: A smoothed score function in the Tweedie formula need not arise from any prior.
    Stated in the case study contrasting Efron's f-modeling with the horseshoe version.

pith-pipeline@v0.9.0 · 5833 in / 1252 out tokens · 45115 ms · 2026-05-22T01:51:17.769946+00:00 · methodology

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

  1. [1]

    Alquier, P. (2024). User-friendly introduction to PAC-Bayes bounds. Foundations and Trends in Machine Learning 17(2), 174–303. Athey, S., Imbens, G. W., and Wager, S. (2018). Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society, Series B 80(4), 597–623. Bhadra, A., Datta, ...

  2. [2]

    Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21(1), C1–C68. Cranmer, K., Brehmer, J., and Louppe, G. (2020). The frontier of simul...

  3. [3]

    Louis, T

    Institute of Mathematical Statistics. Louis, T. A. (1984). Estimating a population of parameter values using Bayes and empirical Bayes methods. Journal of the American Statistical Association 79(386), 393–398. Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96(6), 933–957. McAuliffe, J. D., Blei, ...