pith.

arxiv: 2606.22850 · v1 · pith:6OQAMMHO · submitted 2026-06-22 · 📊 stat.ME

To select or not to select: predictively consistent priors instead of model selection

Pith reviewed 2026-06-26 07:50 UTC · model grok-4.3

classification 📊 stat.ME
keywords: predictively consistent priors, model selection, Bayesian modelling, prior predictive distribution, out-of-sample prediction, variable selection, linear regression, logistic regression

The pith

Predictively consistent priors let complex models match or beat selected simpler ones in out-of-sample prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies when model selection becomes unnecessary or harmful for prediction in finite samples. It defines predictively consistent priors as those that keep prior predictive distributions stable and sensible even as models grow more complex, for example by adding covariates. In linear and logistic regression, forward selection, and nonlinear cases, models using these priors typically perform as well as or better than simpler models chosen by selection. The work concludes that selection often compensates for poor joint prior implications rather than reflecting a fundamental need for parsimony.

Core claim

Predictively consistent priors keep prior predictive implications stable as model complexity increases. Flexible models equipped with such priors typically match or outperform selected simpler models in out-of-sample predictive performance across examples of adding covariates in linear and logistic regression, forward variable selection, and nonlinear modelling. When selection still improves performance, this can indicate that the original prior had poor joint implications, such as excessive prior mass on implausible predictive values. The authors therefore propose replacing sparsity or parsimony at the level of model components with the requirement that priors remain sensible in predictive space as models become more complex.

What carries the argument

Predictively consistent priors, defined as priors whose prior predictive distributions remain stable and sensible as model complexity increases.
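Here is a worked instance of this definition. It is a sketch, not the paper's formal statement. It writes out the prior predictive distribution of a model $\mathcal{M}_k$ and the implied Bayesian-$R^2$ summary used in the paper's linear-regression examples (Figures 4 and 16). It assumes fixed $\sigma^2$ and standardised, orthogonal predictors with $\frac{1}{n}X^TX = I_p$.

```latex
\begin{aligned}
p(\tilde{y} \mid \mathcal{M}_k) &= \int p(\tilde{y} \mid \theta, \mathcal{M}_k)\, p(\theta \mid \mathcal{M}_k)\, d\theta, \\
R^2(\beta) &= \frac{\operatorname{Var}(X^T\beta \mid \beta)}{\operatorname{Var}(X^T\beta \mid \beta) + \sigma^2}
            = \frac{\|\beta\|^2}{\|\beta\|^2 + \sigma^2}, \\
\beta_j \overset{\text{iid}}{\sim} \operatorname{normal}(0, 1)
  &\;\Rightarrow\; \|\beta\|^2 \sim \chi^2(p), \qquad \mathbb{E}\,\|\beta\|^2 = p.
\end{aligned}
```

Plug $\mathbb{E}\|\beta\|^2 = p$ and $\sigma^2 = 1$ into the second line. This gives $R^2 \approx p/(p+1)$, about $0.97$ for $p = 30$, so the implied prior predictive drifts towards $R^2 \approx 1$ as covariates are added. An R2D2 prior instead places the prior on $R^2$ directly, so this summary stays roughly fixed as $p$ grows.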

If this is right

  • Model selection can be omitted without harming predictive performance when priors satisfy the consistency condition.
  • Cases where selection improves results point to prior predictive distributions that are already implausible before seeing data.
  • Bayesian workflows can shift effort from comparing discrete models to designing priors that behave sensibly in predictive space.
  • The same logic applies when adding covariates, performing variable selection, or moving to nonlinear structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Workflows that routinely fit one large model with such a prior may replace pipelines that enumerate and score many candidate models.
  • The same prior-construction principle could be tested in settings beyond regression, such as time-series or spatial models where complexity also increases with added structure.
  • If the stability condition proves hard to satisfy in some model classes, hybrid approaches that combine consistent priors with light selection might still be useful.

Load-bearing premise

Priors can be constructed so that their predictive implications stay stable and reasonable when model complexity grows.

What would settle it

A controlled experiment in which a flexible model with a predictively consistent prior is outperformed by a selected simpler model on out-of-sample predictive metrics across several independent datasets.

Figures

Figures reproduced from arXiv: 2606.22850 by Aki Vehtari, Anna Elisabeth Riha, David Kohns, Leevi Lindgren, Paul-Christian Bürkner.

Figure 1: Parameter spaces of true-structure models $\Theta_k$ and $\Theta_{k+1}$ containing $\theta_0$, with $\Theta_k$ being the parameter space of the minimal true-structure model since $\theta_0 \notin \Theta_{k-1}$. The KL-minimising parameter is $\theta_{\mathrm{KL}} = \arg\min_{\theta \in \Theta} \mathrm{KL}\big(p_t(y) \,\|\, p(y \mid \theta)\big)$, and $\theta_{\mathrm{KL}} = \theta_0$ if the model is a true-structure model. … write $\mathcal{M}_k(y) := p_{\mathcal{M}_k}(\cdot \mid y, X)$ as the posterior predictive, which is evaluated at new $\tilde{y}$ when assessing test performance. Two mo…
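The KL-minimising parameter in the Figure 1 caption can be illustrated in the simplest possible case. This is a minimal sketch under assumptions of its own: a hypothetical normal location model $p(y \mid \theta) = \mathrm{normal}(\theta, 1)$ and a true distribution $p_t = \mathrm{normal}(\theta_0, 1)$, so the model is a true-structure model. The helper names `kl_normal` and `theta_kl` are illustrative only, and the grid search stands in for a proper optimiser.

```python
import math


def kl_normal(m_true: float, s_true: float, m_model: float, s_model: float) -> float:
    """Closed-form KL( normal(m_true, s_true^2) || normal(m_model, s_model^2) )."""
    return (math.log(s_model / s_true)
            + (s_true ** 2 + (m_true - m_model) ** 2) / (2 * s_model ** 2)
            - 0.5)


def theta_kl(m_true: float, s_true: float, s_model: float = 1.0) -> float:
    """Grid-search arg min over theta in [-5, 5] of KL(p_t || normal(theta, s_model^2))."""
    grid = [i / 1000 for i in range(-5000, 5001)]  # step 0.001
    return min(grid, key=lambda th: kl_normal(m_true, s_true, th, s_model))


# Hypothetical true-structure case: p_t = normal(theta_0, 1) with theta_0 = 1.3 and
# model family normal(theta, 1). The KL minimiser recovers theta_0.
print(theta_kl(1.3, 1.0))  # prints 1.3
```

In this location family the minimiser equals the true mean even if `s_true` differs from the model's fixed scale. In that case it is a pseudo-true value rather than $\theta_0$ of a true-structure model.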
Figure 2: Illustrative example: Part 1. Test performance for a model with one covariate ($\mathcal{M}_1$) with prior $\beta \sim \mathrm{normal}(0, 1)$ and the intercept-only model ($\mathcal{M}_0$), relative to the oracle model (Equation (6)), averaged across 500 repetitions and rescaled to $\mathrm{elpd}_{\mathrm{loo}}$ scale. We indicate a difference of 4 to oracle performance with a dotted line. … factors (Kass and Raftery, 1995), and (2) $\mathrm{elpd}_{\mathrm{loo}}$ estimated with PSIS-LOO-CV. We a…
Figure 3: Illustrative example: Part 1. Schematic illustration of model performance relative to the oracle model for increasing true effect size in the finite-data regime, comparing the full (true-structure) model ($\mathcal{M}_1$), the intercept-only model ($\mathcal{M}_0$), Bayesian model averaging and stacking, as well as selecting a model with Bayes factor or PSIS $\mathrm{elpd}_{\mathrm{loo}}$. This is a simplified summary based on the results of simulated ex…
Figure 4: Illustrative example: Part 3. Left: realised prior Bayesian-$R^2$ for $p \in \{5, 15, 30\}$ covariates with $\sigma^2 = 1$, $\frac{1}{n} X^T X = I_p$, and using $\mathrm{Var}(X^T\beta \mid \beta) = \|\beta\|^2$ with $\|\beta\|^2 \sim \chi^2(p)$ for normal priors. As $p$ increases, the R2D2 prior keeps $R^2$ stable, while $R^2$ concentrates near 1 with independent normal priors. Right: $\mathrm{elpd}_{\mathrm{test}}$ (with $n = 100$, $\rho = 0.5$, $R^2 = 0.5$, $n_{\mathrm{test}} = 2000$, averaged over 500 repetitions), on…
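The left panel of Figure 4 can be approximated by Monte Carlo. This sketch draws coefficient vectors from the prior and computes the realised prior $R^2 = \|\beta\|^2/(\|\beta\|^2 + \sigma^2)$ under the caption's assumptions ($\sigma^2 = 1$, standardised orthogonal predictors). The R2D2-style prior follows the normal-kernel form shown in the Figure 5 caption, $\beta_j \sim \mathrm{normal}(0, \tau^2 \phi_j \sigma^2)$. It uses a beta prior on $R^2$ with mean 1/3 and precision 3, as in the Figure 16 caption, and $\tau^2 = R^2/(1 - R^2)$. The Dirichlet concentration of 1 is an assumption, and the function names are hypothetical.

```python
import math
import random
import statistics


def r2_normal_prior(p: int, sigma2: float = 1.0) -> float:
    """One draw of the realised prior R^2 under independent beta_j ~ normal(0, 1)."""
    norm2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(p))  # ||beta||^2 ~ chi^2(p)
    return norm2 / (norm2 + sigma2)


def r2_r2d2_prior(p: int, mean: float = 1 / 3, prec: float = 3.0,
                  conc: float = 1.0, sigma2: float = 1.0) -> float:
    """One draw of the realised prior R^2 under an R2D2-style prior (assumed form)."""
    r2 = random.betavariate(mean * prec, (1 - mean) * prec)  # R^2 ~ beta(mean, precision)
    tau2 = r2 / (1 - r2)                                      # total prior variance from R^2
    g = [random.gammavariate(conc, 1.0) for _ in range(p)]    # Dirichlet(conc, ..., conc)
    total = sum(g)                                            # via normalised gammas
    # beta_j ~ normal(0, sigma2 * tau2 * phi_j), standardised predictors (sigma_xj = 1)
    norm2 = sum(random.gauss(0.0, math.sqrt(sigma2 * tau2 * gj / total)) ** 2 for gj in g)
    return norm2 / (norm2 + sigma2)


random.seed(1)
for p in (5, 15, 30):
    med_normal = statistics.median(r2_normal_prior(p) for _ in range(4000))
    med_r2d2 = statistics.median(r2_r2d2_prior(p) for _ in range(4000))
    print(f"p={p:2d}  median prior R^2: normal={med_normal:.2f}  R2D2-style={med_r2d2:.2f}")
```

The normal-prior median should climb towards 1 as $p$ grows. The R2D2-style median should stay roughly constant, because the prior is placed on $R^2$ and then split across coefficients rather than stacked up coefficient by coefficient.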
Figure 5: Experiment 1: Prior predictive distributions of probabilities as well as prior predictive Bayesian-$R^2$ for the logistic regression under the standard normal prior and the pseudo-$R^2$ prior, generating $\tilde{R}^2 \sim \mathrm{beta}(\mu_{R^2}, \varphi_{R^2})$ and $\beta_j \sim \mathrm{normal}(0, \tau^2 \phi_j \tilde{\sigma}^2 / \sigma^2_{x_j})$, where $\sigma_{x_j} = 1$.
Figure 6: Experiment 1: Adding covariates in logistic regression. We compare out-of-sample predictive performance ($\mathrm{elpd}_{\mathrm{test}}$, averaged over 500 repetitions) with normal priors or the pseudo-$R^2$ prior with true Bayesian-$R^2 \in \{0, 0.5, 0.8\}$ (columns) and $\rho = 0.5$. A difference of 4 on $\mathrm{elpd}_{\mathrm{loo}}$ scale (dotted line) to the best-performing model (dashed line) indicates substantially different predictive performance. The pse…
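The comparison rule in the Figure 6 caption can be sketched as follows. A difference of more than 4 on the $\mathrm{elpd}_{\mathrm{loo}}$ scale to the best model flags substantially worse predictive performance. The sketch assumes that "rescaled to $\mathrm{elpd}_{\mathrm{loo}}$ scale" means the mean test log predictive density multiplied by the training size $n$. It also uses fixed normal predictive distributions and made-up test values in place of posterior predictive densities averaged over posterior draws. All names are hypothetical.

```python
import math


def normal_logpdf(y: float, mu: float, sd: float) -> float:
    """Log density of normal(mu, sd^2) at y."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mu) ** 2 / (2 * sd ** 2)


def elpd_on_loo_scale(test_logdens: list[float], n_train: int) -> float:
    """Mean pointwise test log predictive density, rescaled to n_train observations.

    Assumption: 'rescaled to elpd_loo scale' is read as n_train * mean(test log densities).
    """
    return n_train * sum(test_logdens) / len(test_logdens)


def substantially_worse(elpd_model: float, elpd_best: float, threshold: float = 4.0) -> bool:
    """True if the model's elpd is more than `threshold` below the best model's elpd."""
    return elpd_best - elpd_model > threshold


# Hypothetical test data and two hypothetical predictive distributions.
y_test = [0.1, -0.4, 0.9, 0.3]
elpd_a = elpd_on_loo_scale([normal_logpdf(y, 0.0, 1.0) for y in y_test], n_train=100)
elpd_b = elpd_on_loo_scale([normal_logpdf(y, 0.0, 3.0) for y in y_test], n_train=100)
best = max(elpd_a, elpd_b)
print(substantially_worse(elpd_a, best), substantially_worse(elpd_b, best))  # prints: False True
```

Rescaling a large test set ($n_{\mathrm{test}} = 2000$) to the training size keeps the threshold of 4 comparable to differences one would see in $\mathrm{elpd}_{\mathrm{loo}}$ computed on the $n$ training points.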
Figure 7: Experiment 2: Forward selection. Predictive performance on independent test data ($\mathrm{elpd}_{\mathrm{test}}$) relative to the reference model, rescaled to $\mathrm{elpd}_{\mathrm{loo}}$ scale (with $n_{\mathrm{test}} = 2000$, averaged over 100 repetitions). Data are generated with true $R^2 = 0.5$ and block correlation $\rho \in \{0, 0.5, 0.9\}$ (columns) and $n \in \{100, 200\}$ (rows). A dotted line marks a difference of 4 relative to the best-performing model and three verti…
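The forward search compared in Experiment 2 can be sketched generically. Starting from the intercept-only model, it greedily adds the covariate that most improves a score, and a submodel is then picked from the resulting path. This is a minimal sketch. The scoring function is a hypothetical stand-in for an $\mathrm{elpd}_{\mathrm{loo}}$ estimate, the covariate names and relevances are made up, and the sketch does not reproduce projpred's projection-based search.

```python
from typing import Callable, FrozenSet, List, Tuple


def forward_search(candidates: List[str],
                   score: Callable[[FrozenSet[str]], float],
                   max_size: int) -> List[Tuple[FrozenSet[str], float]]:
    """Greedy forward search over covariate sets, starting from the intercept-only model.

    `score` is a hypothetical user-supplied function (higher is better), e.g. an
    elpd_loo estimate for the model fitted with the given covariates.
    """
    selected: FrozenSet[str] = frozenset()
    path = [(selected, score(selected))]
    remaining = set(candidates)
    while remaining and len(selected) < max_size:
        # Add the single covariate that most improves the score (sorted for determinism).
        best = max(sorted(remaining), key=lambda c: score(selected | {c}))
        selected = selected | {best}
        remaining.discard(best)
        path.append((selected, score(selected)))
    return path


# Toy score: made-up relevances minus a flat per-covariate penalty.
relevance = {"x1": 3.0, "x2": 1.5, "x3": 0.2}


def toy_score(covariates: FrozenSet[str]) -> float:
    return sum(relevance[c] for c in covariates) - 0.5 * len(covariates)


path = forward_search(list(relevance), toy_score, max_size=3)
chosen, _ = max(path, key=lambda step: step[1])  # "max score" selection rule
print(sorted(chosen))  # prints ['x1', 'x2']
```

The final line applies the "max $\mathrm{elpd}_{\mathrm{loo}}$" rule mentioned in the Figure 21 caption. This selection step is where noise in the score can lead to picking a submodel that predicts worse than the full model.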
Figure 8: Experiment 3: Increasing the complexity of nonlinear models. Top: prior predictive variance ($\log_{10}$-transformed) for one repetition. Bottom: $\mathrm{elpd}_{\mathrm{test}}$ (with $n = 50$, $n_{\mathrm{test}} = 2000$, averaged over 500 repetitions), rescaled to $\mathrm{elpd}_{\mathrm{loo}}$ scale. For raw polynomials, prior predictive variances explode and $\mathrm{elpd}_{\mathrm{test}}$ drops at higher degrees. Only thin plate splines and HSGP remain stable as the degree/number of basis fu…
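The explosion of prior predictive variance for raw polynomials follows directly from the basis values. With independent $\beta_j \sim \mathrm{normal}(0, 1)$, the prior variance of the mean function $f(x) = \sum_j \beta_j b_j(x)$ is $\sum_j b_j(x)^2$, excluding observation noise. This sketch makes two assumptions: a hypothetical covariate range $[0, 5]$, and Legendre polynomials on the rescaled range $[-1, 1]$ as the orthogonal basis. The orthogonal polynomials used in the paper may differ.

```python
import math


def raw_basis(x: float, k: int) -> list[float]:
    """Raw polynomial basis x, x^2, ..., x^k (intercept excluded)."""
    return [x ** j for j in range(1, k + 1)]


def legendre_basis(u: float, k: int) -> list[float]:
    """Legendre polynomials P_1, ..., P_k at u in [-1, 1] (Bonnet's recurrence)."""
    p_prev, p_curr = 1.0, u
    out = [p_curr]
    for n in range(1, k):
        # (n + 1) P_{n+1}(u) = (2n + 1) u P_n(u) - n P_{n-1}(u)
        p_prev, p_curr = p_curr, ((2 * n + 1) * u * p_curr - n * p_prev) / (n + 1)
        out.append(p_curr)
    return out[:k]


def max_prior_pred_var(basis, points: list[float], k: int) -> float:
    """max over points of Var[f(x)] = sum_j b_j(x)^2 when beta_j ~ normal(0, 1) i.i.d."""
    return max(sum(b * b for b in basis(x, k)) for x in points)


xs = [5 * i / 200 for i in range(201)]  # hypothetical covariate grid on [0, 5]
us = [2 * x / 5 - 1 for x in xs]        # the same points rescaled to [-1, 1]
for k in (2, 5, 10, 20):
    raw = max_prior_pred_var(raw_basis, xs, k)
    leg = max_prior_pred_var(legendre_basis, us, k)
    print(f"k={k:2d}  log10 max prior variance: raw={math.log10(raw):5.1f}  "
          f"Legendre={math.log10(leg):4.2f}")
```

Because $|P_j(u)| \le 1$ on $[-1, 1]$, the Legendre variance grows only linearly (its maximum is exactly $k$). The raw basis grows like $25^k$ at $x = 5$. This matches the qualitative pattern in Figures 8 and 26.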
Figure 9: Experiment 4: Posterior of the treatment effect for increasing $p$. Grey lines show the treatment-only model $\mathcal{M}_{\mathrm{base}}$; coloured lines show $\mathcal{M}_{\mathrm{full}}$ for different $p$. When the signal is relatively small ($R^2 = 0.2$, left panel of figures), the joint R2D2 model produces increasing bias with $p$. When the signal is large enough ($R^2 = 0.8$, right panel of figures), the joint R2D2 model concentrates on $\alpha^*$ with $p$. In eithe…
Figure 10: Illustrative example: Part 1. We compare predictive performance with elpd on independent test data ($\mathrm{elpd}_{\mathrm{test}}$) for the full model with one covariate ($\mathcal{M}_1$), the intercept-only model ($\mathcal{M}_0$), and four model selection and averaging strategies relative to the oracle model (Equation (6)) across different training data sizes (rows) and prior choices (columns). The x-axis shows true effect sizes for $\beta$. Results are averages acro…
Figure 11: …shows similar patterns as discussed for the normal DGP in …
Figure 12: Illustrative example: Part 1. Out-of-sample performance for the intercept-only model $\mathcal{M}_0$, the full model $\mathcal{M}_1$, and selection with model probability > 90%. The other methods considered in Section 3 are plotted as gray lines in the background. The dotted lines indicate $\mathcal{M}_1$'s worst-case performance with the R2D2 prior.
Figure 13: Illustrative example: Part 1. Out-of-sample performance for the intercept-only model $\mathcal{M}_0$, the full model $\mathcal{M}_1$, and selection using stacking weights > 0.5 or 0.9. The other methods in Section 3 are plotted as gray lines in the background. The dotted lines indicate $\mathcal{M}_1$'s worst-case performance with the R2D2 prior.
Figure 14: Illustrative example: Part 1. Out-of-sample performance for the intercept-only model $\mathcal{M}_0$, the full model $\mathcal{M}_1$, and selection using pseudo-BMA weights > 0.5 or 0.9. The other methods considered in Section 3 are plotted as gray lines in the background. The dotted lines indicate $\mathcal{M}_1$'s worst-case performance with the R2D2 prior.
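The thresholded selection rules in Figures 12 to 15 all turn per-model scores into weights and then select a model only if its weight exceeds a cutoff. Two such weightings are a softmax of log scores. Log marginal likelihoods with equal prior model probabilities give posterior model probabilities (Figure 12). $\mathrm{elpd}_{\mathrm{loo}}$ estimates give pseudo-BMA weights without the Bayesian bootstrap (Figure 14). Stacking and LOO-BB weights require more machinery and are not shown. In this sketch the fallback model used when no weight passes the cutoff is a hypothetical choice, and the scores are made up.

```python
import math


def softmax_weights(log_scores: list[float]) -> list[float]:
    """Normalise log scores into weights with the log-sum-exp trick.

    With log marginal likelihoods and equal prior model probabilities this gives posterior
    model probabilities; with elpd_loo estimates it gives (non-bootstrapped) pseudo-BMA weights.
    """
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_threshold(weights: list[float], threshold: float, default: int) -> int:
    """Pick the first model whose weight exceeds `threshold`; otherwise return `default`.

    Hypothetical rule for illustration; the fallback is an assumption, not the paper's rule.
    """
    for i, w in enumerate(weights):
        if w > threshold:
            return i
    return default


w = softmax_weights([-139.1, -140.2])  # hypothetical elpd_loo for M0, M1
print([round(x, 2) for x in w])  # prints [0.75, 0.25]
print(select_by_threshold(w, 0.5, default=1), select_by_threshold(w, 0.9, default=1))  # prints 0 1
```

With a cutoff of 0.5 the simpler model is selected on a modest elpd difference. With 0.9 it is not, and the full model is kept. The choice of cutoff alone can therefore change which model is used for prediction.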
Figure 15: Illustrative example: Part 1. Out-of-sample performance for the intercept-only model $\mathcal{M}_0$, the full model $\mathcal{M}_1$, and selection using LOO-BB weights > 0.5 or 0.9. The other methods considered in Section 3 are plotted as gray lines in the background. The dotted lines indicate $\mathcal{M}_1$'s worst-case performance with the R2D2 prior.
Figure 16: Illustrative example: Part 3. Implied prior Bayesian-$R^2$ for an increasing number of covariates $p \in \{5, 15, 30\}$. We assume $\sigma^2 = 1$ and standardised and orthogonal predictors, that is, $\frac{1}{n} X^T X = I$. The first row shows Bayesian-$R^2$ based on (1) the Beta prior with mean $\mu_{R^2} = 1/3$ and scale $\varphi_{R^2} = 3$ in the R2D2 specification and (2) constants based on $R^2 = p\sigma_\beta / (p\sigma_\beta + \sigma^2)$ for the independent normal priors $\beta \sim$ …
Figure 17: Illustrative example: Part 3. Implied prior and posterior $R^2$ for models with independent normal prior or R2D2 prior (columns) and $p \in \{5, 15, 30\}$ predictors (rows). The true $R^2 = 0.2$ is indicated by a vertical dashed line. With an increasing number of predictors, independent normal priors concentrate a priori at $R^2$ values close to one, which also pulls the posterior $R^2$ values away from the true value. Resul…
Figure 18: Illustrative example: Part 3. We compare out-of-sample predictive performance ($\mathrm{elpd}_{\mathrm{test}}$) with normal priors or the R2D2 prior with true $R^2 \in \{0, 0.5, 0.8\}$ (columns). The subfigures show results for a DGP with $p \in \{30, 100\}$. Results are averaged over 500 repetitions, each based on 100 observations with uncorrelated predictors ($\rho = 0$) and independent test data with $n_{\mathrm{test}} = 2000$. Results are on $\mathrm{elpd}_{\mathrm{loo}}$ scale…
Figure 19: Illustrative example: Part 3. We compare out-of-sample predictive performance ($\mathrm{elpd}_{\mathrm{test}}$) with normal priors or the R2D2 prior with true $R^2 \in \{0, 0.5, 0.8\}$ (columns). The subfigures show results for a DGP with $p \in \{30, 100\}$. Results are averaged over 500 repetitions, each based on 100 observations with correlated predictors with $\rho = 0.5$ and independent test data with $n_{\mathrm{test}} = 2000$. Results are on $\mathrm{elpd}_{\mathrm{loo}}$ sc…
Figure 20: Illustrative example: Part 3. We compare out-of-sample predictive performance ($\mathrm{elpd}_{\mathrm{test}}$) with normal priors or an R2D2 prior with true $R^2 \in \{0, 0.5, 0.8\}$ (columns). The subfigures show results for a DGP with $p \in \{30, 100\}$. Results are averaged over 500 repetitions, each based on 100 observations with correlated predictors with $\rho = 0.9$ and independent test data with $n_{\mathrm{test}} = 2000$. Results are on $\mathrm{elpd}_{\mathrm{loo}}$…
Figure 21: Illustrative example: Part 3. Adding covariates in linear regression. We compare selected model sizes across 100 repetitions when using maximum $\mathrm{elpd}_{\mathrm{loo}}$ as a selection rule. With independent normal priors, model sizes tend to be smaller than with the R2D2 prior and more concentrated across repetitions. With the R2D2 prior, the results are more varied across the range of model sizes. We can also select smaller models,…
Figure 22: Experiment 2: Forward selection. $\mathrm{elpd}_{\mathrm{test}}$ (average: thicker line, repetitions: thinner lines) with $n_{\mathrm{test}} = 2000$ for forward search with (1) independent normal priors, (2) R2D2 prior and (3) projection predictive selection via projpred (columns), excluding the intercept-only model. Training data are generated using $p = 50$, true $R^2 = 0.5$, $n \in \{100, 200\}$ (rows) and correlation $\rho \in \{0, 0.9\}$ (subfigures (a) an…
Figure 23: Experiment 2: Forward selection. Predictive performance on independent test data ($\mathrm{elpd}_{\mathrm{test}}$) with $n_{\mathrm{test}} = 2000$ on $\mathrm{elpd}_{\mathrm{loo}}$ scale, relative to the reference model and averaged over 100 repetitions. We compare forward search with (1) independent normal priors, (2) R2D2 prior and (3) projection predictive inference via projpred. We show results for $n \in \{100, 200\}$ (rows) and $p = 50$ equally weakly relevant covar…
Figure 24: Experiment 2: Forward selection. $\mathrm{elpd}_{\mathrm{test}}$ (average: thicker line, repetitions: thinner lines) with $n_{\mathrm{test}} = 2000$ on $\mathrm{elpd}_{\mathrm{loo}}$ scale, relative to the reference model. We compare forward search with (1) independent normal priors, (2) R2D2 prior and (3) projection predictive inference via projpred (columns). We show results with $n \in \{100, 200\}$ (rows) and $p = 50$ equally weakly relevant covariates with correlatio…
Figure 25: Experiment 3: Increasing the complexity of nonlinear models. Observed data ($n = 100$) and posterior results for six different modelling approaches (columns) with increasing degree/number of basis functions $k \in \{5, 10, 12, 15, 20\}$ (rows). We compare a raw polynomial (1) without and (2) with improved initial value, (3) an orthogonal polynomial, as well as thin plate splines (4) with fixed degrees of freedom …
Figure 26: Experiment 3: Increasing the complexity of nonlinear models. We compare basis values of the first five bases for a raw polynomial (Poly (raw)), an orthogonal polynomial (Poly (orthogonal)), as well as thin plate splines (TPS) and HSGPs. The range of the basis values for the raw polynomial is much larger and increases much faster with more bases than the basis values for the other modelling approaches (com…
Figure 27: Experiment 3: Increasing the complexity of nonlinear models. For raw and orthogonal polynomials, $\mathrm{elpd}_{\mathrm{test}}$ drops at higher degrees. For $n = 20$, non-penalised TPS (with fixed degrees of freedom) also deteriorate for higher degrees. Only penalised TPS and HSGP remain stable as $k$ increases.
Figure 28: Experiment 3: Increasing the complexity of nonlinear models. Number of divergent transitions for each of the modelling approaches, averaged across repetitions for each condition. We run four chains with 1000 iterations. The raw polynomials have large numbers of divergent transitions from degree 11 and higher. Better initialisation or orthogonalisation seems to resolve issues with divergent transitions, but…
Original abstract

Bayesian modelling workflows often consider multiple candidate models of varying complexity. Model selection is commonly used to navigate potential trade-offs between model complexity and generalisability to new data. We study when model selection is unnecessary or can even be harmful for predictive performance in finite data regimes and find that the need for selecting simpler models can depend on prior choice. We formalise predictively consistent priors, which keep prior predictive implications stable as model complexity increases. Across examples and numerical experiments, including adding covariates in linear and logistic regression, forward variable selection, and nonlinear modelling, flexible models with predictively consistent priors typically match or outperform selected simpler models in out-of-sample predictive performance. When selection helps, it can indicate poor joint prior implications, such as excessive prior mass on implausible predictive values. Based on our findings, we propose replacing the notion of sparsity or parsimony at the level of model components with specifying priors that remain sensible in predictive space as models become more complex.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that model selection can be unnecessary or harmful for out-of-sample predictive performance when using predictively consistent priors, which are priors designed to keep prior predictive distributions stable as model complexity increases (e.g., by adding covariates). Across linear and logistic regression, forward variable selection, and nonlinear modeling examples and numerical experiments, flexible models equipped with such priors typically match or outperform selected simpler models. The authors further argue that when selection does improve performance, this can indicate poor joint prior implications, such as excessive prior mass on implausible predictive values, and recommend replacing notions of sparsity or parsimony with priors that remain sensible in predictive space as complexity grows.

Significance. If the central empirical claims hold under more detailed scrutiny, the work offers a substantive alternative perspective in Bayesian statistical methodology: shifting emphasis from post-hoc model selection to upfront prior specification that preserves predictive properties. The numerical demonstrations across multiple model classes provide concrete, falsifiable evidence that could influence prior elicitation practices and reduce reliance on selection procedures in finite-data regimes. The framing of predictively consistent priors as a generalizable concept (rather than ad-hoc per example) is a strength if the construction method generalizes beyond the reported cases.

major comments (2)
  1. [§4] §4 (numerical experiments): The central claim rests on out-of-sample performance comparisons, yet the manuscript provides insufficient detail on the exact predictive metrics employed (e.g., whether log predictive density, CRPS, or MSE), the precise construction rules for the predictively consistent priors in each setting, and any controls for data exclusion or cross-validation scheme. These omissions are load-bearing because they prevent assessment of whether the reported superiority is robust or sensitive to implementation choices.
  2. [Formalisation] Formalisation section: The definition of predictively consistent priors is presented primarily through examples rather than a general, model-class-independent construction that guarantees stability of the prior predictive as complexity increases; without this, the weakest assumption (that such priors exist and can be specified for arbitrary flexible models) remains demonstrated only case-by-case rather than established as a general principle.
minor comments (2)
  1. [Abstract] The abstract and introduction introduce the term 'predictively consistent priors' without an immediate formal definition or reference to the section where it is defined, which would improve readability for readers unfamiliar with the concept.
  2. [Formalisation] Notation for prior predictive distributions could be clarified with an explicit equation early in the formalisation to distinguish it from posterior predictive quantities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight areas where additional clarity will strengthen the manuscript. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§4] §4 (numerical experiments): The central claim rests on out-of-sample performance comparisons, yet the manuscript provides insufficient detail on the exact predictive metrics employed (e.g., whether log predictive density, CRPS, or MSE), the precise construction rules for the predictively consistent priors in each setting, and any controls for data exclusion or cross-validation scheme. These omissions are load-bearing because they prevent assessment of whether the reported superiority is robust or sensitive to implementation choices.

    Authors: We agree that greater detail is needed for reproducibility. In the revision we will expand §4 to specify the predictive metrics (primarily log predictive density, with MSE and CRPS where used), the exact construction rules applied to obtain predictively consistent priors in each regression and nonlinear example, and the cross-validation scheme including data partitioning and any exclusion rules. revision: yes

  2. Referee: [Formalisation] Formalisation section: The definition of predictively consistent priors is presented primarily through examples rather than a general, model-class-independent construction that guarantees stability of the prior predictive as complexity increases; without this, the weakest assumption (that such priors exist and can be specified for arbitrary flexible models) remains demonstrated only case-by-case rather than established as a general principle.

    Authors: The formalisation section defines predictively consistent priors via the requirement that the prior predictive distribution remains stable (in a suitable sense) under increases in model complexity. While concrete constructions are illustrated case-by-case, the underlying principle is stated generally. We will revise the section to separate the general definition more clearly from the examples and to discuss the extent to which the principle can be applied to other model classes without providing a single algorithmic template that covers all cases. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper formalizes predictively consistent priors as a definition that stabilizes prior predictive distributions under increasing model complexity, then demonstrates via numerical experiments (linear/logistic regression, variable selection, nonlinear cases) that flexible models using these priors match or exceed the out-of-sample performance of selected simpler models. No load-bearing derivation, equation, or self-citation reduces the performance claims to fitted inputs or definitional equivalence; the argument is empirical and self-contained against external benchmarks. No steps meet the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the newly introduced concept of predictively consistent priors together with standard Bayesian assumptions about prior predictive distributions and the representativeness of the reported regression and selection examples.

axioms (2)
  • domain assumption Bayesian modelling workflows often consider multiple candidate models of varying complexity
    Opening premise of the abstract describing common practice.
  • domain assumption Model selection is commonly used to navigate potential trade-offs between model complexity and generalisability
    Stated as the standard approach being questioned.
invented entities (1)
  • predictively consistent priors no independent evidence
    purpose: Priors that keep prior predictive implications stable as model complexity increases
    Core new concept introduced to replace the need for model selection; no independent evidence outside the paper's examples is provided in the abstract.

pith-pipeline@v0.9.1-grok · 5710 in / 1399 out tokens · 26528 ms · 2026-06-26T07:50:52.744748+00:00 · methodology


Reference graph

Works this paper leans on

232 extracted references · 170 canonical work pages · 4 internal anchors

  1. [1]

    Vershynin, Roman , year =. High-

  2. [2]

    Mathematics of the USSR-Sbornik , author =

    Distribution of eigenvalues for some sets of random matrices , volume =. Mathematics of the USSR-Sbornik , author =. 1967 , pages =

  3. [3]

    , year =

    Bai, Zhidong and Silverstein, Jack W. , year =. Spectral

  4. [4]

    Richard and Murray, Jared S

    Hahn, P. Richard and Murray, Jared S. and Carvalho, Carlos M. , month = sep, year =. Bayesian. Bayesian Analysis , publisher =. doi:10.1214/19-BA1195 , abstract =

  5. [5]

    Social Science & Medicine , author =

    Understanding and misunderstanding randomized controlled trials , volume =. Social Science & Medicine , author =. 2018 , keywords =. doi:10.1016/j.socscimed.2017.12.005 , abstract =

  6. [6]

    Wasserstein

    Kuhn, Daniel and Esfahani, Peyman Mohajerin and Nguyen, Viet Anh and Shafieezadeh-Abadeh, Soroosh , month = oct, year =. Wasserstein. Operations. doi:10.1287/educ.2019.0198 , urldate =

  7. [7]

    Kohns, David and Kallioinen, Noa and McLatchie, Yann and Vehtari, Aki , month = jan, year =. The. Bayesian Analysis , publisher =. doi:10.1214/25-BA1512 , abstract =

  8. [8]

    Sharpening

    Jefferys, William H and Berger, James O , year =. Sharpening

  9. [9]

    Journal of Economic Literature , author =

    Potential. Journal of Economic Literature , author =. 2020 , keywords =. doi:10.1257/jel.20191597 , abstract =

  10. [10]

    Distributional

    Husain, Hisham , month = jun, year =. Distributional. 34th. doi:10.48550/arXiv.2006.04349 , abstract =

  11. [11]

    Hünermund, Paul and Louw, Beyers , month = jan, year =. On the. Organizational Research Methods , publisher =. doi:10.1177/10944281231219274 , abstract =

  12. [12]

    Ghojogh, Benyamin and Crowley, Mark , month = may, year =. The. doi:10.48550/arXiv.1905.12787 , abstract =

  13. [13]

    , year =

    George, Edward I. , year =. The. Journal of the American Statistical Association , publisher =. doi:10.2307/2669776 , number =

  14. [14]

    The garden of forking paths:

    Gelman, Andrew and Loken, Eric , year =. The garden of forking paths:

  15. [15]

    , year =

    Geisser, Seymour and Eddy, William F. , year =. A. Journal of the American Statistical Association , publisher =. doi:10.2307/2286745 , abstract =

  16. [16]

    International Statistical Review , author =

    Priors in. International Statistical Review , author =. 2022 , note =. doi:10.1111/insr.12502 , abstract =

  17. [17]

    Biometrika , author =

    On the marginal likelihood and cross-validation , volume =. Biometrika , author =. 2020 , pages =. doi:10.1093/biomet/asz077 , abstract =

  18. [18]

    Flam-Shepherd, Daniel and Requeima, James and Duvenaud, David , month = jan, year =. Mapping. 31st

  19. [19]

    Characterizing and

    Flam-Shepherd, Daniel and Requeima, James and Duvenaud, David , pages =. Characterizing and. Third workshop on

  20. [20]

    free range statistics , author =

    Stepwise selection of variables in regression is. free range statistics , author =. 2024 , note =

  21. [21]

    GigaScience , author =

    How to select predictive models for decision-making or causal inference , volume =. GigaScience , author =. 2025 , pages =. doi:10.1093/gigascience/giaf016 , abstract =

  22. [22]

    Dawid, A. P. and Stone, M. and Zidek, J. V. , year =. Marginalization. Journal of the Royal Statistical Society. Series B (Methodological) , publisher =

  23. [23]

    Cooper, Alex and Simpson, Dan and Kennedy, Lauren and Forbes, Catherine and Vehtari, Aki , month = jun, year =. Cross-. Bayesian Analysis , publisher =. doi:10.1214/23-BA1409 , abstract =

  24. [24]

    and Polson, Nicholas G

    Carvalho, Carlos M. and Polson, Nicholas G. and Scott, James G. , year =. The horseshoe estimator for sparse signals , volume =. Biometrika , publisher =

  25. [25]

    Defining a

    Campbell, Harlan and Gustafson, Paul , month = sep, year =. Defining a. Bayesian Analysis , publisher =. doi:10.1214/23-BA1397 , abstract =

  26. [26]

    Box, George E. P. , year =. Sampling and. Journal of the Royal Statistical Society. Series A (General) , publisher =. doi:10.2307/2982063 , abstract =

  27. [27]

    Box, George E. P. , year =. Science and. Journal of the American Statistical Association , publisher =. doi:10.2307/2286841 , abstract =

  28. [28]

    Bayesian Analysis , author =

    Group. Bayesian Analysis , author =. 2023 , keywords =. doi:10.1214/23-BA1371 , abstract =

  29. [29]

    Local scale invariance and robustness of proper scoring rules , volume =

    Bolin, David and Wallin, Jonas , month = feb, year =. Local scale invariance and robustness of proper scoring rules , volume =. Statistical Science , publisher =. doi:10.1214/22-STS864 , abstract =

  30. [30]

    Journal of the Economic Science Association , author =

    Some guidance for the choice of priors for. Journal of the Economic Science Association , author =. 2025 , keywords =. doi:10.1017/esa.2025.6 , abstract =

  31. [31]

    Journal of the American Statistical Association , author =

    Dirichlet-. Journal of the American Statistical Association , author =. 2015 , pages =. doi:10.1080/01621459.2014.960967 , abstract =

  32. [32]

    and Willard, Brandon , year =

    Bhadra, Anindya and Datta, Jyotishka and Polson, Nicholas G. and Willard, Brandon , year =. Default. Biometrika , publisher =

  33. [33]

    Atkinson, A. C. and Cox, D. R. , year =. Planning. Journal of the Royal Statistical Society. Series B (Methodological) , publisher =

  34. [34]

    Atkinson, A. C. , year =. Posterior. Biometrika , publisher =. doi:10.2307/2335274 , abstract =

  35. [35]

    Generalized

    Aguilar, Javier Enrique and Bürkner, Paul-Christian , month = jan, year =. Generalized. Bayesian Analysis , publisher =. doi:10.1214/25-BA1524 , abstract =

  36. [36]

    and Ibrahim, Joseph G

    Laud, Purushottam W. and Ibrahim, Joseph G. , year =. Predictive. Journal of the Royal Statistical Society. Series B (Methodological) , publisher =

  37. [37]

    Advances in

    McLatchie, Yann and Rögnvaldsson, Sölvi and Weber, Frank and Vehtari, Aki , month = jan, year =. Advances in. Statistical Science , publisher =. doi:10.1214/24-STS949 , abstract =

  38. [38]

    Mikkola, Petrus and Martin, Osvaldo A. and Chandramouli, Suyog and Hartmann, Marcelo and Pla, Oriol Abril and Thomas, Owen and Pesonen, Henri and Corander, Jukka and Vehtari, Aki and Kaski, Samuel and Bürkner, Paul-Christian and Klami, Arto , month = dec, year =. Prior. Bayesian Analysis , publisher =. doi:10.1214/23-BA1381 , abstract =

  39. [39]

    , month = aug, year =

    Minka, Thomas P. , month = aug, year =. Expectation propagation for approximate. Proceedings of the

  40. [40]

    and Cunningham, John P

    Moran, Gemma E. and Cunningham, John P. and Blei, David M. , month = dec, year =. The. Bayesian Analysis , publisher =. doi:10.1214/22-BA1313 , abstract =

  41. [41]

    O'Hagan, Anthony and Forster, Jonathan J. Kendall's

  42. [42]

    Pericchi, L. R. An. Biometrika. doi:10.2307/2336567

  43. [43]

    Piironen, Juho and Paasiniemi, Markus and Catalina, Alejandro and Weber, Frank and Vehtari, Aki. projpred:

  44. [44]

    Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 2022. doi:10.1007/s11222-022-10090-6

  45. [45]

    A. Journal of the Royal Statistical Society. Series B (Methodological), 1984.

  46. [46]

    Workflow techniques for the robust use of Bayes factors. Psychological Methods, 2023. doi:10.1037/met0000472

  47. [47]

    Silva, Luca Alessandro and Zanella, Giacomo. Robust. Journal of the American Statistical Association. doi:10.1080/01621459.2023.2257893

  48. [48]

    Sivula, Tuomas and Magnusson, Måns and Matamoros, Asael Alonzo and Vehtari, Aki. Uncertainty in. Bayesian Analysis. doi:10.1214/25-BA1569

  49. [49]

    Sivula, Tuomas and Magnusson, Måns and Vehtari, Aki. Unbiased estimator for the variance of the leave-one-out cross-validation estimator for a. Communications in Statistics - Theory and Methods. doi:10.1080/03610926.2021.2021240

  50. [50]

    Evaluating probabilistic forecasts of extremes using continuous ranked probability score distributions. International Journal of Forecasting, 2023. doi:10.1016/j.ijforecast.2022.07.003

  51. [51]

    Vehtari, Aki and Gabry, Jonah and Magnusson, Mans and Yao, Yuling and Bürkner, Paul-Christian and Paananen, Topi and Gelman, Andrew. loo:

  52. [52]

    Bayesian. Journal of Machine Learning Research, 2016.

  53. [53]

    Pareto. Journal of Machine Learning Research, 2024.

  54. [54]

    Wilson, Andrew Gordon. Position:. Proceedings of the 42nd

  55. [55]

    Wilson, Greg and Bryan, Jennifer and Cranston, Karen and Kitzes, Justin and Nederbragt, Lex and Teal, Tracy K. Good enough practices in scientific computing. PLOS Computational Biology. doi:10.1371/journal.pcbi.1005510

  56. [56]

    Bayesian. Bayesian Analysis, 2021. doi:10.1214/21-BA1287

  57. [57]

    Bayesian inference with the l1-ball prior: solving combinatorial problems with exact zeros. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2023. doi:10.1093/jrsssb/qkad076

  58. [58]

    Yanchenko, Eric and Bondell, Howard D. and Reich, Brian J. The. The American Statistician. doi:10.1080/00031305.2024.2352010

  59. [59]

    Yao, Yuling and Vehtari, Aki and Simpson, Daniel and Gelman, Andrew. Using. Bayesian Analysis. doi:10.1214/17-BA1091

  60. [60]

    Pathfinder:. Journal of Machine Learning Research, 2022.

  61. [61]

    Bayesian approach for neural networks—review and case studies. Neural Networks, 2001. doi:10.1016/S0893-6080(00)00098-8

  62. [62]

    Hjort, Nils Lid. Estimation in moderately misspecified models. doi:10.48550/arXiv.2603.24632

  63. [63]

    Hjort, Nils Lid. The. Journal of the American Statistical Association. doi:10.2307/2290869

  64. [64]

    Cinelli, Carlos and Forney, Andrew and Pearl, Judea. A. Sociological Methods & Research. doi:10.1177/00491241221099552

  65. [65]

    The. Statistics and Computing, 2026. doi:10.1007/s11222-025-10812-6

  66. [66]

    Chipman, Hugh A. and George, Edward I. and McCulloch, Robert E. BART:. The Annals of Applied Statistics.

  67. [67]

    Ghosal, Subhashis and van der Vaart, Aad. Fundamentals of. doi:10.1017/9781139029834

  68. [68]

    Bayesian. doi:10.1017/CBO9780511802478

  69. [69]

    Raftery, Adrian E. and Madigan, David and Hoeting, Jennifer A. Bayesian. Journal of the American Statistical Association. doi:10.1080/01621459.1997.10473615

  70. [70]

    Penalized logistic regression for detecting gene interactions. Biostatistics, 2008. doi:10.1093/biostatistics/kxm010

  71. [71]

    What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Biopsychosocial Science and Medicine, 2004.

  72. [72]

    Neal, Radford M. Bayesian

  73. [73]

    A coefficient of determination (. Biometrical Journal, 2019. doi:10.1002/bimj.201800270

  74. [74]

    Key, Jane T and Pericchi, Luis R and Smith, Adrian F M. Bayesian. Bayesian. doi:10.1093/oso/9780198504856.003.0015

  75. [75]

    Spatial regression modeling via the. Environmetrics, 2024. doi:10.1002/env.2829

  76. [76]

    Zhang, Yan Dora and Naughton, Brian P. and Bondell, Howard D. and Reich, Brian J. Bayesian. Journal of the American Statistical Association. doi:10.1080/01621459.2020.1825449

  77. [77]

    Scholz, Maximilian and Bürkner, Paul-Christian. Prediction can be safely used as a proxy for explanation in causally consistent. Journal of Statistical Computation and Simulation. doi:10.1080/00949655.2024.2449534

  78. [78]

    Polson, Nicholas G. and Scott, James G. Shrink. Bayesian, 2011.

  79. [79]

    The. Journal of the Royal Statistical Society: Series B (Methodological), 1968. doi:10.1111/j.2517-6161.1968.tb01505.x

  80. [80]

    Bernardo, José M. and Smith, Adrian F. M. Bayesian. doi:10.1002/9780470316870.oth1
