Variance or Standard Deviation? Shell Geometry and Global-Scale Priors in High-Dimensional Shrinkage

Wayne Yuan Gao; Zhiheng You

arxiv: 2606.23509 · v1 · pith:HR4XMK6Bnew · submitted 2026-06-22 · 📊 stat.ME · econ.EM · stat.ML

Pith reviewed 2026-06-26 07:21 UTC · model grok-4.3

classification 📊 stat.ME · econ.EM · stat.ML

keywords high-dimensional shrinkage · scale priors · variance vs standard deviation · asymptotic risk · radial-power benchmark · global-scale hyperpriors · shell geometry · prior selection

The pith

Priors flat on the standard deviation hold a one-unit asymptotic risk advantage near the origin over variance-flat priors under radial-power benchmarks in high-dimensional shrinkage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares how default priors for a common Gaussian scale affect shrinkage risk when the data dimension is high. Priors that are flat on the variance versus flat on the standard deviation place different amounts of mass near the zero boundary because of the geometry of high-dimensional shells. Under a radial-power benchmark this produces a one-unit risk advantage for the SD-flat choice near the origin, a crossover in the critical regime, and second-order equivalence for strong signals. The near-zero exponent of the SD-scale density carries these properties forward to proper global-scale hyperpriors and bounded mixtures, while also classifying the global component in heavier-tailed or sparse settings.

Core claim

Under a radial-power benchmark, the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.

What carries the argument

The near-zero exponent of the SD-scale density, which determines how much prior mass is allocated near the zero-scale boundary and thereby controls first-order shrinkage risk.
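
One way to make the near-zero exponent concrete is the change of variables between the variance and SD scales. The notation below is an assumption for illustration, since this summary does not give the paper's own definitions: τ is the common SD, v = τ² the variance, r = ‖θ‖, θ | τ ~ N(0, τ² I_d), and c is a radial-power index.

```latex
% Assumed notation (not taken from the source): \tau = common SD, v = \tau^2, r = \|\theta\|.
\begin{align*}
\text{variance-flat:}\quad & \pi(v)\,dv \propto dv
  \;\Longleftrightarrow\; \pi(\tau)\,d\tau \propto \tau\,d\tau, \\
\text{SD-flat:}\quad & \pi(\tau)\,d\tau \propto d\tau
  \;\Longleftrightarrow\; \pi(v)\,dv \propto v^{-1/2}\,dv, \\
\text{both cases:}\quad & \pi(\tau) \propto \tau^{\,c-1},
  \qquad c = 2 \ \text{(variance-flat)}, \quad c = 1 \ \text{(SD-flat)}.
\end{align*}
% Mixing \theta \mid \tau \sim N(0, \tau^2 I_d) over \tau^{c-1}\,d\tau (substitute \tau = r t):
\begin{align*}
\pi_c(\theta)
  &\propto \int_0^\infty \tau^{-d} \exp\!\Big(-\frac{r^2}{2\tau^2}\Big)\, \tau^{\,c-1}\, d\tau
   \;\propto\; r^{-(d-c)}, \qquad d > c, \\
\text{density of } r
  &\propto r^{\,d-1} \cdot r^{-(d-c)} \;=\; r^{\,c-1}.
\end{align*}
```

Under this reading, c − 1 is the near-zero exponent of the SD-scale density. The SD-flat choice spreads prior mass evenly over shell radii (c = 1), while the variance-flat choice puts proportionally less mass on small radii (c = 2), which is consistent with the targets c = 1 and c = 2 shown in Figure 1.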

If this is right

Proper single global-scale hyperpriors inherit the risk limits through the near-zero exponent of their SD-scale density.
Bounded coordinate-multiplier mixtures inherit these limits in the same way.
For heavier-tailed or sparse priors the exponent continues to classify the common global-scale component.
Local-scale tails, model-size priors, or allocation priors can additionally affect overall risk.
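
A rough way to see how a proper hyperprior could be classified by its near-zero exponent is to measure the log-log slope of its SD-scale density at small values of the scale. The sketch below is illustrative only: the function names, the three example hyperpriors, and the two evaluation points are assumptions, and the reading "slope near 0 behaves like SD-flat, slope near 1 behaves like variance-flat" is an interpretation of the summary above rather than the paper's stated definition.

```python
import math

def near_zero_exponent(sd_density, tau_lo=1e-4, tau_hi=1e-3):
    """Log-log slope of an SD-scale density between two small values of tau."""
    rise = math.log(sd_density(tau_hi)) - math.log(sd_density(tau_lo))
    return rise / (math.log(tau_hi) - math.log(tau_lo))

# Hypothetical example hyperpriors, each written as a density of the SD tau.
def half_cauchy(tau):
    # Standard half-Cauchy on tau: finite, nonzero density at tau = 0.
    return 2.0 / (math.pi * (1.0 + tau * tau))

def half_normal(tau):
    # Standard half-normal on tau: also finite and nonzero at tau = 0.
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * tau * tau)

def exp_on_variance(tau):
    # Unit exponential on v = tau^2, mapped to the SD scale via dv = 2 tau dtau.
    return 2.0 * tau * math.exp(-tau * tau)

exponents = {
    f.__name__: near_zero_exponent(f)
    for f in (half_cauchy, half_normal, exp_on_variance)
}
for name, slope in exponents.items():
    print(f"{name}: near-zero exponent ~ {slope:.3f}")
```

The first two densities are flat near zero on the SD scale, so their slope is close to 0; the third is flat near zero on the variance scale, which becomes a slope close to 1 after the change of variables.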

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Default prior recommendations in empirical Bayes shrinkage should therefore favor SD-flatness when signals are expected to be weak or near zero.
The geometric distinction between variance and SD flatness may appear in other high-dimensional scale estimation problems that use shell-volume arguments.
Finite-sample simulations with controlled radial-power signals could test how quickly the one-unit advantage emerges.

Load-bearing premise

The near-zero behavior of the common scale prior has first-order consequences for shrinkage risk in the high-dimensional setting considered.

What would settle it

An explicit calculation of the asymptotic risk difference between the SD-flat and variance-flat benchmarks near the origin that yields a value other than one unit would falsify the central comparison.
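
A calculation of this kind can be sketched numerically at the origin itself. The code below is a minimal check under stated assumptions, not the paper's procedure: it assumes the normal means model X ~ N(θ, I_d) with θ | τ ~ N(0, τ² I_d), an improper prior density proportional to τ^(c−1) on the common SD (c = 1 for SD-flat, c = 2 for variance-flat), and squared-error loss. The function name `origin_risk`, the quadrature grids, and the choice d = 200 (smaller than the d = 2000 in Figure 1, to keep pure-Python runtime short) are all hypothetical.

```python
import math

def origin_risk(d, c, n_tau=1200, tau_max=3.0, n_s=240):
    """Risk at theta = 0 of the posterior-mean rule under prior tau**(c - 1) on the SD.

    The rule is delta(x) = E[tau^2 / (1 + tau^2) | x] * x, which depends on x
    only through s = ||x||^2, and s follows a chi-square(d) law when theta = 0.
    """
    h = tau_max / n_tau
    taus = [(i + 0.5) * h for i in range(n_tau)]        # midpoint grid, avoids tau = 0
    log_u = [math.log1p(t * t) for t in taus]            # log(1 + tau^2)
    kappa = [1.0 / (1.0 + t * t) for t in taus]          # shrinkage factor 1 / (1 + tau^2)
    log_prior = [(c - 1) * math.log(t) for t in taus]    # log of tau^(c - 1)

    sd = math.sqrt(2.0 * d)                               # chi-square(d) standard deviation
    s_lo, s_hi = max(1e-8, d - 8.0 * sd), d + 10.0 * sd
    ds = (s_hi - s_lo) / n_s
    log_norm = (d / 2) * math.log(2.0) + math.lgamma(d / 2)

    risk = 0.0
    for j in range(n_s):
        s = s_lo + (j + 0.5) * ds
        # Log posterior of tau given s, up to a constant; x | tau ~ N(0, (1 + tau^2) I_d).
        logp = [lp - 0.5 * d * lu - 0.5 * s * k
                for lp, lu, k in zip(log_prior, log_u, kappa)]
        top = max(logp)                                   # subtract the max to avoid underflow
        w = [math.exp(v - top) for v in logp]
        keep = sum(wi * (1.0 - k) for wi, k in zip(w, kappa)) / sum(w)
        chi2_pdf = math.exp((d / 2 - 1) * math.log(s) - s / 2 - log_norm)
        risk += keep * keep * s * chi2_pdf * ds          # E[ ||delta(X)||^2 ] at theta = 0
    return risk

risks = {c: origin_risk(d=200, c=c) for c in (1, 2)}
for c, r in risks.items():
    print(f"c = {c}: risk at the origin ~ {r:.3f}")
print(f"gap (variance-flat minus SD-flat) ~ {risks[2] - risks[1]:.3f}")
```

If the reading above matches the paper's setup, the two printed risks should land close to 1 and 2, and a gap far from one unit would be the kind of evidence this section describes. The sketch covers only θ = 0; it says nothing about the critical or strong-signal regimes.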

Figures

Figures reproduced from arXiv: 2606.23509 by Wayne Yuan Gao, Zhiheng You.

**Figure 1.** Weak-Regime Excess Risk. Notes: Weak-regime excess risk R(θ_d, δ_{d,c}) − λ_d at d = 2000, where λ_d = ‖θ_d‖², with horizontal asymptotic targets c = 1 and c = 2.

**Figure 2.** Critical-Regime Risk Gap. Notes: Critical-regime risk gap at d = 5000, with signal energy λ_d = ‖θ_d‖² = β√d, together with the asymptotic limit Δ(β), which has a numerically identified zero at β = β* ≈ 2.080. Negative values favor the SD-scale benchmark; positive values favor the variance-flat benchmark. Consequently, Δ(β) > 0 for all sufficiently large β. Since Δ(0) = −1 and Δ is continuous, there exists…

**Figure 3.** Strong-Signal Universality. Notes: The exact finite-d scaled risks for c = 1 and c = 2 are nearly identical, and the second-order approximation improves substantially on the first-order limit at moderate dimensions.

**Figure 4.** Transfer to representative proper single global-scale hyperpriors at…

**Figure 5.** Weak-signal many-weak comparison for two representative architectures beyond a…

Original abstract

We study how the choice of default prior for a common Gaussian scale affects high-dimensional shrinkage risk, highlighting the role played by high-dimensional geometry. Formally, we consider a high-dimensional setting in which the near-zero behavior of the common scale prior has first-order consequences for shrinkage risk, and show that priors that are flat on the variance and those flat on the standard deviation allocate markedly different mass near the zero-scale boundary, leading to distinct shrinkage behavior and informing principled default prior selection. Specifically, under a radial-power benchmark, we establish that the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper pins down a one-unit asymptotic risk advantage for SD-flat over variance-flat priors near the origin under the radial-power benchmark, with crossover and equivalence for larger signals.

The main new piece is the explicit risk comparison under the radial-power benchmark: SD-flat priors get a one-unit edge near zero because they put different mass near the scale boundary, they cross over in the critical regime, and they match variance-flat priors at second order for strong signals. The geometric argument about shell volume and the near-zero exponent is laid out cleanly, and the paper shows how this carries over to proper hyperpriors and bounded mixtures through that same exponent. It also notes that the exponent still classifies the global component even when local tails or sparsity are added.

The derivations appear consistent with the stated conditions and avoid obvious circularity. The focus on first-order consequences from the near-zero behavior is the paper's own premise rather than a hidden flaw.

The soft spot is the dependence on the radial-power benchmark itself; outside that setup the claimed first-order effect does not necessarily hold, so the practical scope is narrower than the title might suggest. The results are asymptotic, and the abstract mentions no finite-sample checks, although the figure captions report numerical comparisons at d = 2000 and d = 5000. The extension to heavier-tailed cases is noted but not developed in detail.

This is for people already working on Bayesian high-dimensional shrinkage who need concrete guidance on global-scale prior choice. A reader who cares about default priors in that literature will find the comparisons useful. The work is formally grounded enough and the question is well-posed, so it deserves a serious referee even if revisions will be needed on scope and simulations.

Referee Report

0 major / 2 minor

Summary. The paper studies the effect of the near-zero behavior of common scale priors (variance-flat vs. SD-flat) on high-dimensional shrinkage risk. Under an explicit radial-power benchmark, it derives that the SD-flat prior yields a one-unit asymptotic risk advantage near the origin, a crossover in the critical regime, and second-order equivalence to the variance-flat prior for strong signals. These limits are inherited by proper global-scale hyperpriors and bounded coordinate-multiplier mixtures through the near-zero exponent of the SD-scale density; the exponent is also used to classify the global-scale component for heavier-tailed or sparse priors.

Significance. If the asymptotic derivations hold, the work supplies a geometrically grounded criterion for default prior selection in high-dimensional Bayesian shrinkage, showing that first-order risk differences arise directly from the near-zero exponent under the stated benchmark. This is a precise, falsifiable contribution to the literature on global-scale hyperpriors.

minor comments (2)

The abstract and introduction would benefit from an explicit statement of the radial-power benchmark density (including the range of the power parameter) so that the one-unit advantage claim can be checked without consulting later sections.
Notation for the SD-scale density and its near-zero exponent should be introduced once and used consistently; the current phrasing mixes “SD-flat benchmark” and “near-zero exponent of their SD-scale density” without a single defining equation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its contribution to default prior selection in high-dimensional shrinkage, and recommendation of minor revision. No specific major comments were raised.

Circularity Check

0 steps flagged

No significant circularity

The paper's central claims derive asymptotic risk comparisons (one-unit advantage near origin, crossover, second-order equivalence) directly from the near-zero exponent of the SD-scale density under an explicit radial-power benchmark and high-dimensional shell geometry. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivations are presented as following from volume arguments and the stated premise on near-zero behavior. The analysis is self-contained against the benchmark without internal reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that near-zero prior behavior dominates risk and on the modeling choice of a radial-power benchmark whose details are not visible in the abstract.

axioms (1)

domain assumption: The near-zero behavior of the common scale prior has first-order consequences for shrinkage risk.
Explicitly stated as the formal setting in the abstract.

pith-pipeline@v0.9.1-grok · 5713 in / 1112 out tokens · 22894 ms · 2026-06-26T07:21:39.479848+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 1 linked inside Pith

[1]

Ball, K. (1997): An elementary introduction to modern convex geometry, in Flavors of Geometry, Cambridge: Cambridge University Press, vol. 31 of Mathematical Sciences Research Institute Publications, 1--58

1997
[2]

Berger, J. O., W. E. Strawderman, and D. Tang (2005): Posterior propriety and admissibility of hyperpriors in normal hierarchical models, Annals of Statistics, 33, 606--646

2005
[3]

Bhadra, A., J. Datta, N. G. Polson, and B. Willard (2016): Default Bayesian analysis with global-local shrinkage priors, Biometrika, 103, 955--969

2016
[4]

Bhattacharya, A., D. Pati, N. S. Pillai, and D. B. Dunson (2015): Dirichlet--Laplace priors for optimal shrinkage, Journal of the American Statistical Association, 110, 1479--1490

2015
[5]

Brown, L. D. (1971): Admissible estimators, recurrent diffusions, and insoluble boundary value problems, Annals of Mathematical Statistics, 42, 855--903

1971
[6]

Brown, L. D. and L. H. Zhao (2012): A geometrical explanation of Stein shrinkage, Statistical Science, 27, 40--52

2012
[7]

Carvalho, C. M., N. G. Polson, and J. G. Scott (2010): The horseshoe estimator for sparse signals, Biometrika, 97, 465--480

2010
[8]

Castillo, I. and B. Szabó (2020): Spike and slab empirical Bayes sparse credible sets, Bernoulli, 26, 127--158

2020
[9]

Chernozhukov, V., C. Hansen, and Y. Liao (2017): A lava attack on the recovery of sums of dense and sparse signals, Annals of Statistics, 45, 39--76

2017
[10]

Donoho, D. L. and J. Tanner (2009): Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367, 4273--4293

2009
[11]

Gelman, A. (2006): Prior distributions for variance parameters in hierarchical models, Bayesian Analysis, 1, 515--533

2006
[12]

Giannone, D., M. Lenza, and G. E. Primiceri (2021): Economic predictions with big data: the illusion of sparsity, Econometrica, 89, 2409--2437

2021
[13]

Ingster, Y. I. and I. A. Suslina (2000): Minimax nonparametric hypothesis testing for ellipsoids and Besov bodies, ESAIM: Probability and Statistics, 4, 53--135

2000
[14]

Johnstone, I. M. and B. W. Silverman (2004): Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences, Annals of Statistics, 32, 1594--1649

2004
[15]

Kolesár, M., U. K. Müller, and S. T. Roelsgaard (2025): The fragility of sparsity, Working paper, March 2025. arXiv:2311.02299

Pith/arXiv arXiv 2025
[16]

Laurent, B. and P. Massart (2000): Adaptive estimation of a quadratic functional by model selection, Annals of Statistics, 28, 1302--1338

2000
[17]

Ledoux, M. (2001): The Concentration of Measure Phenomenon, vol. 89 of Mathematical Surveys and Monographs, Providence, RI: American Mathematical Society

2001
[18]

Maruyama, Y. and A. Takemura (2008): Admissibility and minimaxity of generalized Bayes estimators for spherically symmetric family, Journal of Multivariate Analysis, 99, 50--73

2008
[19]

Moran, G. E., V. Ročková, and E. I. George (2019): Variance prior forms for high-dimensional Bayesian variable selection, Bayesian Analysis, 14, 1091--1119

2019
[20]

Piironen, J. and A. Vehtari (2017a): On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, vol. 54 of Proceedings of Machine Learning Research, 905--913

2017
[21]

Piironen, J. and A. Vehtari (2017b): Sparsity information and regularization in the horseshoe and other shrinkage priors, Electronic Journal of Statistics, 11, 5018--5051

2017
[22]

Polson, N. G. and J. G. Scott (2012): On the Half-Cauchy Prior for a Global Scale Parameter, Bayesian Analysis, 7, 887--902

2012
[23]

Ročková, V. (2018): Bayesian estimation of sparse signals with a continuous spike-and-slab prior, Annals of Statistics, 46, 401--437

2018
[24]

Ročková, V. and E. I. George (2018): The Spike-and-Slab LASSO, Journal of the American Statistical Association, 113, 431--444

2018
[25]

Scott, J. G. and J. O. Berger (2010): Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem, Annals of Statistics, 38, 2587--2619

2010
[26]

Stein, C. (1956): Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 197--206

1956
[27]

Stein, C. M. (1981): Estimation of the mean of a multivariate normal distribution, Annals of Statistics, 9, 1135--1151

1981
[28]

Strawderman, W. E. (1971): Proper Bayes minimax estimators of the multivariate normal mean, Annals of Mathematical Statistics, 42, 385--388

1971
[29]

van der Pas, S. L., B. J. Kleijn, and A. W. van der Vaart (2014): The horseshoe estimator: Posterior concentration around nearly black vectors, Electronic Journal of Statistics, 8, 2585--2618

2014
[30]

Vershynin, R. (2018): High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge: Cambridge University Press

2018
[31]

Zhang, Y. D., B. P. Naughton, H. D. Bondell, and B. J. Reich (2022): Bayesian regression using a prior on the model fit: The R2-D2 shrinkage prior, Journal of the American Statistical Association, 117, 862--874

2022
