
arxiv: 2606.23509 · v1 · pith:HR4XMK6Bnew · submitted 2026-06-22 · 📊 stat.ME · econ.EM · stat.ML

Variance or Standard Deviation? Shell Geometry and Global-Scale Priors in High-Dimensional Shrinkage

Pith reviewed 2026-06-26 07:21 UTC · model grok-4.3

classification 📊 stat.ME · econ.EM · stat.ML
keywords high-dimensional shrinkage · scale priors · variance vs standard deviation · asymptotic risk · radial-power benchmark · global-scale hyperpriors · shell geometry · prior selection

The pith

Priors flat on the standard deviation hold a one-unit asymptotic risk advantage near the origin over variance-flat priors under radial-power benchmarks in high-dimensional shrinkage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares how default priors for a common Gaussian scale affect shrinkage risk when the data dimension is high. Priors that are flat on the variance and priors that are flat on the standard deviation place different amounts of mass near the zero-scale boundary, and the geometry of high-dimensional shells turns that difference into a first-order effect on risk. Under a radial-power benchmark this produces a one-unit risk advantage for the SD-flat choice near the origin, a crossover in the critical regime, and second-order equivalence for strong signals. The near-zero exponent of the SD-scale density carries these properties forward to proper global-scale hyperpriors and bounded mixtures, and it also classifies the global component in heavier-tailed or sparse settings.
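The difference in mass near the zero-scale boundary can be seen with a small calculation. The sketch below is not taken from the paper: it assumes both flat priors are truncated to make them proper, with a hypothetical upper bound `T` on the standard deviation, and the function names are illustrative only.

```python
# Illustrative only: both flat priors are truncated so that they are proper.
# T (the upper bound on the standard deviation) is a hypothetical choice,
# not a quantity taken from the paper.

def mass_near_zero_sd_flat(eps: float, T: float) -> float:
    """P(sigma <= eps) when sigma ~ Uniform(0, T)."""
    return min(eps, T) / T


def mass_near_zero_var_flat(eps: float, T: float) -> float:
    """P(sigma <= eps) when sigma^2 ~ Uniform(0, T^2)."""
    return min(eps, T) ** 2 / T ** 2


if __name__ == "__main__":
    T = 10.0
    for eps in (1.0, 0.1, 0.01):
        sd = mass_near_zero_sd_flat(eps, T)
        var = mass_near_zero_var_flat(eps, T)
        # The ratio equals T / eps, so it grows as eps shrinks.
        print(f"eps={eps:<5} SD-flat={sd:.2e} variance-flat={var:.2e} ratio={sd / var:.1f}")
```

Under this truncation the SD-flat prior puts mass proportional to `eps` below a small threshold, while the variance-flat prior puts mass proportional to `eps**2` there, so the gap widens the closer one looks to zero.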

Core claim

Under a radial-power benchmark, the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.

What carries the argument

The near-zero exponent of the SD-scale density, which determines how much prior mass is allocated near the zero-scale boundary and thereby controls first-order shrinkage risk.
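A worked version of this idea, under assumptions the abstract does not spell out: the SD-scale density is taken to behave like a pure power $\pi(\sigma) \approx C\sigma^{a}$ near zero, and the model is taken to be $\theta \mid \sigma \sim N(0, \sigma^2 I_d)$. The symbols $a$, $C$, $\varepsilon$, and $m(\theta)$ are illustrative notation, not the paper's.

```latex
% Assumption: \pi(\sigma) \sim C \sigma^{a} as \sigma \to 0, with a > -1.
% Prior mass below a small threshold \varepsilon:
\[
  \Pr(\sigma \le \varepsilon) \;\approx\; \int_0^{\varepsilon} C\,\sigma^{a}\, d\sigma
  \;=\; \frac{C\,\varepsilon^{a+1}}{a+1}.
\]
% Flat on the SD: \pi(\sigma) \propto 1, so a = 0.
% Flat on the variance v = \sigma^2: \pi(\sigma) = \pi_v(\sigma^2)\,|dv/d\sigma| \propto 2\sigma, so a = 1.
%
% Scale mixture for a pure power \pi(\sigma) \propto \sigma^{a}, assuming d > a + 1
% (substitute u = \|\theta\|^2 / (2\sigma^2) to evaluate the integral):
\[
  m(\theta) \;\propto\; \int_0^{\infty} \sigma^{a-d}
  \exp\!\Big(-\frac{\|\theta\|^{2}}{2\sigma^{2}}\Big)\, d\sigma
  \;\propto\; \|\theta\|^{-(d-a-1)} .
\]
```

Under these assumptions the SD-flat choice ($a = 0$) induces the radial power $\|\theta\|^{-(d-1)}$ and the variance-flat choice ($a = 1$) induces $\|\theta\|^{-(d-2)}$, so the two differ by exactly one power of the radius. Whether this indexing coincides with the paper's benchmark parameter is not confirmed by the abstract.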

If this is right

  • Proper single global-scale hyperpriors inherit the risk limits through the near-zero exponent of their SD-scale density.
  • Bounded coordinate-multiplier mixtures inherit these limits in the same way.
  • For heavier-tailed or sparse priors the exponent continues to classify the common global-scale component.
  • Local-scale tails, model-size priors, or allocation priors can additionally affect overall risk.
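To illustrate what it could mean for a proper hyperprior to be classified by its near-zero exponent, the following sketch estimates the log-log slope of two SD-scale densities close to zero. The two densities (a half-Cauchy on the standard deviation, and an exponential prior on the variance re-expressed on the SD scale) are examples chosen here, not the paper's, and the definition of the exponent as a log-log slope is an assumption, since the abstract gives no defining equation.

```python
import math


def half_cauchy_sd(s: float) -> float:
    """Half-Cauchy(0, 1) density on the standard deviation."""
    return 2.0 / (math.pi * (1.0 + s * s))


def exp_on_variance_as_sd(s: float) -> float:
    """SD-scale density when the variance is Exponential(1).

    Change of variables v = s^2 gives p(s) = exp(-s^2) * 2s.
    """
    return 2.0 * s * math.exp(-s * s)


def near_zero_exponent(density, s1: float = 1e-4, s2: float = 1e-3) -> float:
    """Log-log slope of the density between two small SD values.

    Assumed working definition: if density(s) ~ C * s**a near zero,
    this slope approximates a.
    """
    return (math.log(density(s2)) - math.log(density(s1))) / (
        math.log(s2) - math.log(s1)
    )


if __name__ == "__main__":
    print("half-Cauchy on SD:", near_zero_exponent(half_cauchy_sd))
    print("Exponential on variance:", near_zero_exponent(exp_on_variance_as_sd))
```

In this sketch the half-Cauchy has an estimated exponent close to 0 (it is flat on the SD near zero), while the exponential prior on the variance has an exponent close to 1 (it is flat on the variance near zero), even though both are proper.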

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Default prior recommendations in empirical Bayes shrinkage should therefore favor SD-flatness when signals are expected to be weak or near zero.
  • The geometric distinction between variance and SD flatness may appear in other high-dimensional scale estimation problems that use shell-volume arguments.
  • Finite-sample simulations with controlled radial-power signals could test how quickly the one-unit advantage emerges.
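As a starting point for such simulations, the shell geometry itself is easy to check numerically. The sketch below is not the paper's risk calculation: it only draws the squared norm of a Gaussian vector with identity covariance and summarizes how tightly it concentrates. The dimension, signal energy, number of draws, and function names are all hypothetical choices.

```python
import math
import random


def squared_norm_draws(d: int, signal_energy: float, n_draws: int, seed: int = 0):
    """Draw ||x||^2 for x ~ N(theta, I_d) with ||theta||^2 = signal_energy.

    By rotational symmetry, theta is placed on the first coordinate axis.
    """
    rng = random.Random(seed)
    shift = math.sqrt(signal_energy)
    draws = []
    for _ in range(n_draws):
        total = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(d - 1):
            total += rng.gauss(0.0, 1.0) ** 2
        draws.append(total)
    return draws


def summarize(d: int, signal_energy: float, n_draws: int = 200):
    """Return the sample mean and sample standard deviation of ||x||^2."""
    draws = squared_norm_draws(d, signal_energy, n_draws)
    mean = sum(draws) / n_draws
    var = sum((v - mean) ** 2 for v in draws) / (n_draws - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    d = 500
    for energy in (0.0, 2.0 * math.sqrt(d)):
        mean, sd = summarize(d, energy)
        # Theory: mean = d + energy, variance = 2d + 4 * energy.
        print(f"d={d} energy={energy:.1f} mean={mean:.1f} sd={sd:.1f}")
```

The squared norm has mean `d + energy` and variance `2d + 4 * energy`, so the observations sit in a shell whose width is of order `sqrt(d)`. A signal energy of order `sqrt(d)` therefore shifts the shell by an amount comparable to its width, which is the same scaling used for the critical regime in Figure 2.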

Load-bearing premise

The near-zero behavior of the common scale prior has first-order consequences for shrinkage risk in the high-dimensional setting considered.

What would settle it

An explicit calculation of the asymptotic risk difference between the SD-flat and variance-flat benchmarks near the origin that yields a value other than one unit would falsify the central comparison.

Figures

Figures reproduced from arXiv: 2606.23509 by Wayne Yuan Gao, Zhiheng You.

Figure 1: Weak-Regime Excess Risk. Notes: Weak-regime excess risk $R(\theta_d, \delta_{d,c}) - \lambda_d$ at $d = 2000$, where $\lambda_d = \|\theta_d\|^2$, with horizontal asymptotic targets $c = 1$ and $c = 2$. We confirm this asymptotic risk gap numerically
Figure 2: Critical-Regime Risk Gap. Notes: Critical-regime risk gap at $d = 5000$, with signal energy $\lambda_d = \|\theta_d\|^2 = \beta\sqrt{d}$, together with the asymptotic limit $\Delta(\beta)$, which has a numerically identified zero at $\beta = \beta^* \approx 2.080$. Negative values favor the SD-scale benchmark; positive values favor the variance-flat benchmark. Consequently, $\Delta(\beta) > 0$ for all sufficiently large $\beta$. Since $\Delta(0) = -1$ and $\Delta$ is continuous, there exists…
Figure 3: Strong-signal universality. Notes: The exact finite-$d$ scaled risks for $c = 1$ and $c = 2$ are nearly identical, and the second-order approximation improves substantially on the first-order limit at moderate dimensions.
Figure 4: Transfer to representative proper single global-scale hyperpriors at
Figure 5: Weak-signal many-weak comparison for two representative architectures beyond a
Original abstract

We study how the choice of default prior for a common Gaussian scale affects high-dimensional shrinkage risk, highlighting the role played by high-dimensional geometry. Formally, we consider a high-dimensional setting in which the near-zero behavior of the common scale prior has first-order consequences for shrinkage risk, and show that priors that are flat on the variance and those flat on the standard deviation allocate markedly different mass near the zero-scale boundary, leading to distinct shrinkage behavior and informing principled default prior selection. Specifically, under a radial-power benchmark, we establish that the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies the effect of the near-zero behavior of common scale priors (variance-flat vs. SD-flat) on high-dimensional shrinkage risk. Under an explicit radial-power benchmark, it derives that the SD-flat prior yields a one-unit asymptotic risk advantage near the origin, a crossover in the critical regime, and second-order equivalence to the variance-flat prior for strong signals. These limits are inherited by proper global-scale hyperpriors and bounded coordinate-multiplier mixtures through the near-zero exponent of the SD-scale density; the exponent is also used to classify the global-scale component for heavier-tailed or sparse priors.

Significance. If the asymptotic derivations hold, the work supplies a geometrically grounded criterion for default prior selection in high-dimensional Bayesian shrinkage, showing that first-order risk differences arise directly from the near-zero exponent under the stated benchmark. This is a precise, falsifiable contribution to the literature on global-scale hyperpriors.

minor comments (2)
  1. The abstract and introduction would benefit from an explicit statement of the radial-power benchmark density (including the range of the power parameter) so that the one-unit advantage claim can be checked without consulting later sections.
  2. Notation for the SD-scale density and its near-zero exponent should be introduced once and used consistently; the current phrasing mixes “SD-flat benchmark” and “near-zero exponent of their SD-scale density” without a single defining equation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its contribution to default prior selection in high-dimensional shrinkage, and recommendation of minor revision. No specific major comments were raised.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims derive asymptotic risk comparisons (one-unit advantage near origin, crossover, second-order equivalence) directly from the near-zero exponent of the SD-scale density under an explicit radial-power benchmark and high-dimensional shell geometry. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivations are presented as following from volume arguments and the stated premise on near-zero behavior. The analysis is self-contained against the benchmark without internal reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that near-zero prior behavior dominates risk and on the modeling choice of a radial-power benchmark whose details are not visible in the abstract.

axioms (1)
  • Domain assumption: the near-zero behavior of the common scale prior has first-order consequences for shrinkage risk.
    Explicitly stated as the formal setting in the abstract.

pith-pipeline@v0.9.1-grok · 5713 in / 1112 out tokens · 22894 ms · 2026-06-26T07:21:39.479848+00:00 · methodology

