Robust Bayesian Predictive Model Selection using Bregman Divergence

Dipak K. Dey; Jongwoo Choi; Neil A. Spencer

arxiv: 2606.10409 · v1 · pith:LWPKGERBnew · submitted 2026-06-09 · 📊 stat.ME

Pith reviewed 2026-06-27 12:41 UTC · model grok-4.3

keywords: Bayesian model selection; Bregman divergence; generalized ELPD; leave-one-out cross-validation; model misspecification; robust predictive comparison; beta-divergence; generalized posterior

The pith

Replacing the log score with a Bregman scoring rule in leave-one-out cross-validation yields a predictive model selector that, under misspecification, asymptotically picks the model whose predictive distribution is closest to the data-generating process in the chosen divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a generalized expected log predictive density that substitutes a Bregman scoring rule for the usual log score when updating parameters via a generalized posterior and when scoring out-of-sample predictive utility. Candidate models are then ranked directly by this proper-score utility, with the beta-divergence family singled out because its tuning parameter reduces the influence of low-density observations. Under misspecification the ranking is shown to converge to the model whose predictive distribution minimizes the chosen Bregman divergence to the data-generating process. The change in scoring rule is motivated by the known sensitivity of the log score to outliers and tail behavior in standard LOO cross-validation.

Core claim

A score-matched generalized ELPD framework replaces the log score by a Bregman scoring rule both to form the generalized posterior and to evaluate leave-one-out predictive utility; under model misspecification this procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence.

What carries the argument

The Bregman scoring rule and its associated generalized posterior, which together define the generalized ELPD used for predictive utility ranking.
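For orientation, one standard way to write these objects is sketched below. This page does not reproduce the paper's equations, so the notation (loss $S$, learning rate $w$, prior $\pi$) is an assumption, following the general Bayesian update of Bissiri, Holmes and Walker (2016) and the density power score of Basu et al. (1998). The paper's own definitions may differ in scaling or sign.

```latex
% Generalized posterior built from a scoring-rule loss S (lower is better);
% w > 0 is a learning rate (assumed notation).
\pi_S(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,
  \exp\!\Big\{-w \sum_{i=1}^{n} S\big(x_i, p_\theta\big)\Big\}

% Leave-one-out predictive utility evaluated under the same score.
\widehat{\mathrm{ELPD}}_{S}
  = -\sum_{i=1}^{n} S\big(x_i,\; p_S(\cdot \mid x_{-i})\big),
\qquad
p_S(\tilde x \mid x_{-i})
  = \int p_\theta(\tilde x)\, \pi_S(\theta \mid x_{-i})\, d\theta

% beta-divergence (density power) score, beta > 0.
S_\beta(x, p) = -\frac{1}{\beta}\, p(x)^{\beta}
  + \frac{1}{1+\beta} \int p(y)^{1+\beta}\, dy
```

With $S(x, p) = -\log p(x)$ and $w = 1$ the first two lines reduce to the ordinary Bayesian posterior and the usual LOO ELPD. The score $S_\beta$ recovers the log score, up to an additive constant, as $\beta \to 0$.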

If this is right

Model rankings become tunable for outlier sensitivity by choice of the beta parameter in the beta-divergence family.
In microbial and forensic data examples the selected model can differ from the one chosen by ordinary ELPD because low-density observations exert less influence.
The framework supplies a direct proper-score generalization of standard leave-one-out cross-validation.
Asymptotic consistency targets the predictive distribution that minimizes the chosen divergence rather than the Kullback-Leibler divergence.
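The reduced influence of low-density observations can be illustrated with a small sketch. It compares the log score with a density power (β-divergence) utility for a single N(0, 1) predictive density. The function names, the value β = 0.5, and the use of a fixed density (no posterior, no leave-one-out step) are assumptions made for illustration and are not taken from the paper.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_utility(x, mu=0.0, sigma=1.0):
    """Log score as a utility: log p(x), computed directly to avoid underflow."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def beta_utility(x, beta, mu=0.0, sigma=1.0):
    """Negative density power score for a normal predictive density.

    For N(mu, sigma^2) the integral of p^(1+beta) has the closed form
    (2*pi*sigma^2)^(-beta/2) / sqrt(1 + beta).
    """
    integral = (2.0 * math.pi * sigma ** 2) ** (-beta / 2.0) / math.sqrt(1.0 + beta)
    return normal_pdf(x, mu, sigma) ** beta / beta - integral / (1.0 + beta)

# Hypothetical observations: central, moderately unusual, and a gross outlier.
for x in (0.0, 3.0, 20.0):
    print(x, round(log_utility(x), 3), round(beta_utility(x, beta=0.5), 3))
```

The log utility decreases without bound as the observation moves into the tail, while the β utility is bounded below by minus the integral term divided by 1 + β. A single extreme point can therefore move a summed log score by hundreds of units but a summed β utility by only about one.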

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same generalized-posterior construction could be applied with other proper scoring rules to achieve robustness properties not limited to the Bregman family.
In settings with heavy tails or contamination the method offers a concrete way to trade bias for reduced variance in model selection.
The divergence-minimizing property suggests that model averaging weights derived from the generalized ELPD would also converge to weights concentrated on the closest predictive distributions.

Load-bearing premise

The Bregman scoring rule and generalized posterior produce an out-of-sample utility ranking that is asymptotically consistent for the divergence-minimizing model, without explicit conditions stated on the model class or data-generating process.

What would settle it

A Monte Carlo experiment in which the procedure repeatedly selects a model whose predictive distribution does not minimize the target Bregman divergence to the known data-generating process would falsify the asymptotic selection claim.
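A rough sketch of such a check, under simplifying assumptions, is given below. It uses the contaminated normal setting from Figure 1 with fixed candidate densities M1 and M2 instead of posterior predictive distributions, and it skips the leave-one-out step. The contamination fraction `EPS = 0.1`, the value `BETA = 0.5`, the reading of t2(3, 1) as a Student-t with 2 degrees of freedom, location 3 and scale 1, and all function names are assumptions that do not come from the paper. The population target is computed by numerical integration, using the fact that maximizing the expected β utility is equivalent to minimizing the β-divergence from the data-generating density.

```python
import math
import random

EPS = 0.1    # contamination fraction: assumed, not stated on this page
BETA = 0.5   # beta-divergence tuning parameter: assumed

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, df, loc, scale):
    z = (x - loc) / scale
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + z * z / df) ** (-(df + 1) / 2) / scale

def dgp_pdf(x):
    """Contaminated normal: (1 - eps) N(0, 1) + eps N(0, 10^2)."""
    return (1 - EPS) * normal_pdf(x, 0.0, 1.0) + EPS * normal_pdf(x, 0.0, 10.0)

# Fixed candidate densities (a simplification of posterior predictives).
MODELS = {
    "M1": lambda x: normal_pdf(x, 0.0, 1.0),
    "M2": lambda x: student_t_pdf(x, 2.0, 3.0, 1.0),
}

def integrate(f, lo=-100.0, hi=100.0, step=0.01):
    """Trapezoid rule on a fixed grid."""
    n = int(round((hi - lo) / step))
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * step)
    return total * step

# Integral of p^(1+beta) for each candidate, computed once.
POWER_INTEGRAL = {
    name: integrate(lambda x, p=p: p(x) ** (1 + BETA)) for name, p in MODELS.items()
}

def beta_utility(x, name):
    return MODELS[name](x) ** BETA / BETA - POWER_INTEGRAL[name] / (1 + BETA)

def population_utility(name):
    """Expected beta utility under the known data-generating density."""
    p = MODELS[name]
    expected = integrate(lambda x: dgp_pdf(x) * p(x) ** BETA)
    return expected / BETA - POWER_INTEGRAL[name] / (1 + BETA)

def sample_dgp(rng, n):
    return [rng.gauss(0.0, 10.0 if rng.random() < EPS else 1.0) for _ in range(n)]

def selection_frequency(n=1000, reps=50, seed=1):
    """Fraction of replications whose empirical winner matches the population target."""
    rng = random.Random(seed)
    target = max(MODELS, key=population_utility)
    hits = 0
    for _ in range(reps):
        xs = sample_dgp(rng, n)
        chosen = max(MODELS, key=lambda m: sum(beta_utility(x, m) for x in xs))
        hits += chosen == target
    return target, hits / reps

target, freq = selection_frequency()
print("population target:", target, "selection frequency:", freq)
```

A selection frequency that stays well below one as `n` grows would count against the asymptotic claim in this simplified setting. A full test of the paper's procedure would also need the generalized posterior and the leave-one-out predictive densities, which this sketch omits.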

Figures

Figures reproduced from arXiv: 2606.10409 by Dipak K. Dey, Jongwoo Choi, Neil A. Spencer.

**Figure 1.** Histogram of n = 1000 simulated observations from q(x) = (1−ϵ) N(x; 0, 1) + ϵ N(x; 0, 10²), with overlays of M1: N(0, 1) (center-correct, light-tailed) and M2: t₂(3, 1) (miscentered, heavy-tailed). Despite their broad success, ELPD-based criteria can behave undesirably in the presence of outliers or heavy-tailed observations. To make this concrete, consider the contaminated normal DGP q(x) = (1 − ϵ) N(x;…

**Figure 2.** Robust predictive model selection in the contaminated normal simulation.

**Figure 3.** Robust predictive model selection for the…

**Figure 4.** The binarized contact grid and RAC locations for the five shoe treads. Orange tiles…

**Figure 5.** Robust predictive model selection for the synthetic JESA dataset.

Abstract

Predictive Bayesian model comparison often relies on leave-one-out (LOO) cross-validation criteria such as the expected log predictive density (ELPD). However, model rankings can be overly sensitive to outliers and tail mismatch because ELPD is based on the log score. We propose a score-matched generalized ELPD framework that replaces the log score by a Bregman scoring rule to update model parameters through a generalized posterior and to evaluate LOO predictive utility. Candidate posterior predictive distributions are ranked by out-of-sample utility under the chosen scoring rule, yielding a direct proper-score generalization of standard ELPD. We focus especially on the $\beta$-divergence family, where $\beta$ controls the sensitivity of predictive comparison to low-density observations. Under model misspecification, the procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence. A simulation study and applications to microbial and forensic data show that the generalized ELPD can change the selected model through reduced sensitivity to low-density observations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Bregman generalization of ELPD adds tunable robustness but the asymptotic claim lacks supporting conditions.

The paper replaces the usual log score in leave-one-out cross-validation with a Bregman divergence to make Bayesian predictive model selection more robust to outliers. The beta family is the main example, where the tuning parameter controls how much low-density observations affect the ranking.

What is new is the construction of a generalized posterior using the Bregman scoring rule and then evaluating out-of-sample utility with the same rule. This gives a direct generalization of ELPD that still uses proper scores. The simulations and two real-data applications indicate that the rankings can shift compared to standard ELPD.

The work is straightforward and addresses a known limitation of the log score in the presence of tail discrepancies. It builds on existing proper scoring rule ideas without overcomplicating the setup.

The soft spot is the asymptotic consistency claim. The abstract states that the procedure asymptotically selects the model minimizing the chosen divergence to the data-generating process, but supplies no regularity conditions, no proof outline, and no discussion of when the generalized posterior concentrates or when the LOO estimate converges to the population quantity. That makes the central theoretical result hard to evaluate from the given information.

This paper is for methodologists and applied statisticians interested in robust alternatives to standard Bayesian model comparison tools. A reader already familiar with ELPD and scoring rules will see the extension quickly.

It should go to peer review. The idea is solid enough and the practical difference is demonstrated, though the theory section will likely need more detail on the conditions for the asymptotic result.

Referee Report

2 major / 1 minor

Summary. The paper proposes a generalized ELPD framework that replaces the log score with a Bregman scoring rule (focusing on the β-divergence family) both to form a generalized posterior and to compute LOO predictive utility for model ranking. Under misspecification the procedure is claimed to asymptotically select the predictive distribution minimizing the chosen divergence to the DGP. Simulations and applications to microbial and forensic data are reported to produce different model rankings than standard ELPD due to reduced sensitivity to low-density observations.

Significance. If the asymptotic selection property can be rigorously established, the framework would supply a tunable robust alternative to ELPD-based predictive model comparison. The empirical illustrations already show that altering the scoring rule can change selected models, which is of practical interest in misspecified settings. However, the absence of any derivation, regularity conditions, or quantitative verification of the generalized posterior concentration undermines the central claim and therefore the current significance of the contribution.

major comments (2)

[Abstract] The asymptotic selection claim (“the procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence”) is stated without any derivation, reference to a theorem, or list of regularity conditions (compactness of parameter space, uniform integrability of the score, uniqueness of the minimizer, ergodicity of the data process). This is the load-bearing theoretical result; its absence prevents assessment of whether the generalized posterior and LOO utility ranking are consistent for the divergence minimizer.
[Abstract / Method description] The construction of the generalized posterior via replacement of the log score by the Bregman scoring rule is described only at a high level; no explicit form of the generalized posterior, no proof that it concentrates at the expected-score minimizer, and no discussion of how the β parameter enters the posterior are supplied. These steps are required for the subsequent LOO ranking argument.

minor comments (1)

[Abstract] The phrase “score-matched generalized ELPD framework” is introduced without a precise definition or equation linking the Bregman score to the leave-one-out utility; a short clarifying equation would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and for highlighting the need for explicit theoretical support. We agree that the current manuscript presents the asymptotic selection property and the generalized posterior construction at a high level. Below we address each major comment and commit to adding the required derivations and explicit forms in a revised version.

Referee: [Abstract] The asymptotic selection claim (“the procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence”) is stated without any derivation, reference to a theorem, or list of regularity conditions (compactness of parameter space, uniform integrability of the score, uniqueness of the minimizer, ergodicity of the data process). This is the load-bearing theoretical result; its absence prevents assessment of whether the generalized posterior and LOO utility ranking are consistent for the divergence minimizer.

Authors: We acknowledge that the abstract asserts the asymptotic selection property without a derivation or list of regularity conditions in the main text. Although the claim is a direct consequence of standard consistency results for generalized posteriors defined by proper scoring rules, we agree that a self-contained argument is required. In the revision we will add a dedicated theoretical section that derives the asymptotic selection result under explicit regularity conditions (compact parameter space, uniform integrability of the Bregman score, uniqueness of the minimizer, and ergodicity of the data-generating process). revision: yes
Referee: [Abstract / Method description] The construction of the generalized posterior via replacement of the log score by the Bregman scoring rule is described only at a high level; no explicit form of the generalized posterior, no proof that it concentrates at the expected-score minimizer, and no discussion of how the β parameter enters the posterior are supplied. These steps are required for the subsequent LOO ranking argument.

Authors: We accept the criticism that the generalized posterior is introduced only conceptually. The revised manuscript will supply the explicit functional form of the generalized posterior, prove its concentration at the minimizer of the expected Bregman score (under the regularity conditions listed in the response to the first comment), and detail how the tuning parameter β enters both the posterior and the LOO utility through the β-divergence scoring rule. revision: yes

Circularity Check

0 steps flagged

No circularity: asymptotic claim rests on external properties of proper scoring rules

The paper's central claim, that the procedure asymptotically selects the Bregman-divergence-minimizing predictive distribution under misspecification, is presented as a direct consequence of the general theory of proper scoring rules and generalized posteriors. No equation or derivation step within the abstract or described framework reduces this result to a fitted parameter, self-defined quantity, or load-bearing self-citation internal to the paper. The consistency argument is invoked from established scoring-rule properties rather than constructed tautologically inside the manuscript, so it can be checked against external results and does not depend on the paper's own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the standard mathematical fact that Bregman divergences induce proper scoring rules and on the domain assumption that a generalized posterior defined via the same score yields asymptotically consistent model ranking under misspecification.

free parameters (1)

β
Tuning parameter that controls sensitivity to low-density observations; chosen by the user.

axioms (1)

[standard math] Bregman divergences define proper scoring rules whose expected value is minimized by the true predictive distribution.
Invoked to justify both the generalized posterior and the out-of-sample utility ranking.
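The axiom can be made concrete on a small discrete example. The sketch below computes a Bregman divergence from a generic convex generator and checks two well-known special cases: negative Shannon entropy gives the Kullback-Leibler divergence, and the squared Euclidean norm gives squared Euclidean distance. The probability vectors and function names are hypothetical, chosen only for illustration.

```python
import math

def bregman(phi, grad_phi, p, q):
    """d_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for vectors p, q."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

# Generator 1: negative Shannon entropy, which yields the KL divergence
# when p and q are probability vectors.
def neg_entropy(v):
    return sum(x * math.log(x) for x in v)

def neg_entropy_grad(v):
    return [math.log(x) + 1.0 for x in v]

# Generator 2: squared Euclidean norm, which yields squared Euclidean distance.
def sq_norm(v):
    return sum(x * x for x in v)

def sq_norm_grad(v):
    return [2.0 * x for x in v]

p = [0.7, 0.2, 0.1]   # hypothetical "true" distribution
q = [0.5, 0.3, 0.2]   # hypothetical candidate distribution

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print("Bregman (neg. entropy):", bregman(neg_entropy, neg_entropy_grad, p, q), "KL:", kl)
print("Bregman (sq. norm):    ", bregman(sq_norm, sq_norm_grad, p, q))
print("Bregman at q = p:      ", bregman(neg_entropy, neg_entropy_grad, p, p))
```

Both divergences are zero when the candidate equals the true distribution and positive otherwise, which is the propriety property the ledger entry relies on.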

Reference graph

Works this paper leans on

184 extracted references · 7 canonical work pages

[1]

S.-I. Amari. -divergence is unique, belonging to both f -divergence and Bregman divergence classes . IEEE Transactions on Information Theory, 55 0 (11): 0 4925--4931, 2009

2009
[2]

M. J. Angilletta Jr. Estimating and comparing thermal performance curves. Journal of Thermal Biology, 31 0 (7): 0 541--545, 2006

2006
[3]

Banerjee, S

A. Banerjee, S. Merugu, I. S. Dhillon, J. Ghosh, and J. Lafferty. Clustering with Bregman divergences. Journal of machine learning research, 6 0 (10), 2005

2005
[4]

A. Basu, I. R. Harris, N. L. Hjort, and M. Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85 0 (3): 0 549--559, 1998

1998
[5]

J. O. Berger. Statistical decision theory and Bayesian analysis. Springer Science & Business Media, 1985

1985
[6]

J. O. Berger. An overview of robust Bayesian analysis . Test, 3 0 (1): 0 5--124, 1994

1994
[7]

R. H. Berk. Limiting behavior of posterior distributions when the model is incorrect. The Annals of Mathematical Statistics, 37 0 (1): 0 51--58, 1966

1966
[8]

J. M. Bernardo and A. F. Smith. Bayesian Theory, volume 586. Wiley Online Library, 1994

1994
[9]

Besag, J

J. Besag, J. York, and A. Molli \'e . Bayesian image restoration, with two applications in spatial statistics. Annals of the institute of statistical mathematics, 43 0 (1): 0 1--20, 1991

1991
[10]

Bayesian fractional posteriors , volume =

A. Bhattacharya, D. Pati, and Y. Yang. Bayesian fractional posteriors. The Annals of Statistics, 47 0 (1): 0 39 -- 66, 2019. doi:10.1214/18-AOS1712. URL https://doi.org/10.1214/18-AOS1712

work page doi:10.1214/18-aos1712 2019
[11]

P. G. Bissiri, C. C. Holmes, and S. G. Walker. A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78 0 (5): 0 1103--1130, 2016

2016
[12]

L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 7 0 (3): 0 200--217, 1967

1967
[13]

Bunke and X

O. Bunke and X. Milhaud. Asymptotic behavior of Bayes estimates under possibly incorrect models . The Annals of Statistics, 26 0 (2): 0 617 -- 644, 1998. doi:10.1214/aos/1028144851. URL https://doi.org/10.1214/aos/1028144851

work page doi:10.1214/aos/1028144851 1998
[14]

Carpenter, A

B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language. Journal of statistical software, 76: 0 1--32, 2017

2017
[15]

P. S. Chodrow. Equivalence of informations characterizes Bregman divergences. Entropy, 27 0 (7), 2025. ISSN 1099-4300. doi:10.3390/e27070766. URL https://www.mdpi.com/1099-4300/27/7/766

work page doi:10.3390/e27070766 2025
[16]

D. K. Dey and L. R. Birmiwal. Robust Bayesian analysis using divergence measures . Statistics & Probability Letters, 20 0 (4): 0 287--294, 1994

1994
[17]

B. A. Frigyik, S. Srivastava, and M. R. Gupta. Functional Bregman Divergence and Bayesian Estimation of Distributions . IEEE Transactions on Information Theory, 54 0 (11): 0 5130--5139, 2008. doi:10.1109/TIT.2008.929943

work page doi:10.1109/tit.2008.929943 2008
[18]

S. Geisser. The predictive sample reuse method with applications. Journal of the American statistical Association, 70 0 (350): 0 320--328, 1975

1975
[19]

A. E. Gelfand, D. K. Dey, and H. Chang. Model determination using predictive distributions with implementation via sampling based methods. In J. Bernardo, J. Berger, A. Dawid, and A. Smith, editors, Bayesian Statistics 4, pages 147--167. Oxford University Press, 1992

1992
[20]

Ghosh and A

A. Ghosh and A. Basu. Robust Bayes estimation using the density power divergence. Annals of the Institute of Statistical Mathematics, 68 0 (2): 0 413--437, 2016

2016
[21]

Girardi, L

P. Girardi, L. Greco, V. Mameli, M. Musio, W. Racugno, E. Ruli, and L. Ventura. Robust inference for non-linear regression models from the Tsallis score: application to coronavirus disease 2019 contagion in Italy . Stat, 9 0 (1): 0 e309, 2020

2019
[22]

Giummol \`e , V

F. Giummol \`e , V. Mameli, E. Ruli, and L. Ventura. Objective Bayesian inference with proper scoring rules . Test, 28 0 (3): 0 728--755, 2019

2019
[23]

Gneiting and A

T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102 0 (477): 0 359--378, 2007

2007
[24]

Goh and D

G. Goh and D. K. Dey. Bayesian model diagnostics using functional Bregman divergence . Journal of Multivariate Analysis, 124: 0 371--383, 2014

2014
[25]

Goh and D

G. Goh and D. K. Dey. Bayesian model assessment and selection using Bregman divergence . Advances in Statistics-Theory and Applications: Honoring the Contributions of Barry C. Arnold in Statistical Science, pages 295--313, 2021

2021
[26]

Gr \"u nwald

P. Gr \"u nwald. The safe Bayesian : learning the learning rate via the mixability gap. In International Conference on Algorithmic Learning Theory, pages 169--183. Springer, 2012

2012
[27]

P. D. Gr \"u nwald and A. P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory . The Annals of Statistics, 32 0 (4): 0 1367 -- 1433, 2004

2004
[28]

J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors . Statistical Science, 14 0 (4): 0 382 -- 417, 1999. doi:10.1214/ss/1009212519

work page doi:10.1214/ss/1009212519 1999
[29]

Hooker and A

G. Hooker and A. N. Vidyashankar. Bayesian model robustness via disparities. Test, 23 0 (3): 0 556--584, 2014

2014
[30]

P. J. Huber. Robust Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley, New York, 1981

1981
[31]

Jewson, J

J. Jewson, J. Q. Smith, and C. Holmes. Principles of Bayesian inference using general divergence criteria . Entropy, 20 0 (6): 0 442, 2018

2018
[32]

Jewson, J

J. Jewson, J. Q. Smith, and C. Holmes. On the Stability of General Bayesian Inference . Bayesian Analysis, pages 1 -- 31, 2024. doi:10.1214/24-BA1502. URL https://doi.org/10.1214/24-BA1502

work page doi:10.1214/24-ba1502 2024
[33]

Kaplan-Damary, M

N. Kaplan-Damary, M. Mandel, Y. Yekutieli, Y. Shor, and S. Wiesner. Location distribution of randomly acquired characteristics on a shoe sole. Journal of Forensic Sciences, 67 0 (5): 0 1801--1809, 2022

2022
[34]

Kellermann, S

V. Kellermann, S. L. Chown, M. F. Schou, I. Aitkenhead, C. Janion-Scheepers, A. Clemson, M. T. Scott, and C. M. Sgr \`o . Comparing thermal performance curves across traits: how consistent are they? Journal of Experimental Biology, 222 0 (11): 0 jeb193433, 2019

2019
[35]

Kellett, D

D. Kellett, D. Lagnado, R. Morgan, and S. Nakhaeizadeh. A Bayesian network approach to evaluating footwear evidence. Forensic Science International: Synergy, 12: 0 100673, 2026. ISSN 2589-871X. doi:https://doi.org/10.1016/j.fsisyn.2026.100673. URL https://www.sciencedirect.com/science/article/pii/S2589871X26000161

work page doi:10.1016/j.fsisyn.2026.100673 2026
[36]

Knoblauch, J

J. Knoblauch, J. E. Jewson, and T. Damoulas. Doubly robust B ayesian inference for non-stationary streaming data with -divergences. Advances in Neural Information Processing Systems, 31, 2018

2018
[37]

Knoblauch, J

J. Knoblauch, J. Jewson, and T. Damoulas. An optimization-centric view on Bayes' rule: reviewing and generalizing variational inference . Journal of Machine Learning Research, 23 0 (132): 0 1--109, 2022

2022
[38]

Kontopoulos, A

D.-G. Kontopoulos, A. Sentis, M. Daufresne, N. Glazman, A. I. Dell, and S. Pawar. No universal mathematical model for thermal performance curves across traits and taxonomic groups. Nature communications, 15 0 (1): 0 8855, 2024

2024
[39]

D. V. Lindley. The choice of variables in B ayesian analysis. Journal of the Royal Statistical Society. Series B (Methodological), 30 0 (2): 0 239--251, 1968

1968
[41]

Martin and N

R. Martin and N. Syring. Direct Gibbs posterior inference on risk minimizers: Construction, concentration, and calibration. In Handbook of Statistics, volume 47, pages 1--41. Elsevier, 2022

2022
[42]

Matsubara, J

T. Matsubara, J. Knoblauch, F.-X. Briol, and C. J. Oates. Robust generalised Bayesian inference for intractable likelihoods . Journal of the Royal Statistical Society Series B: Statistical Methodology, 84 0 (3): 0 997--1022, 2022

2022
[43]

McLatchie, E

Y. McLatchie, E. Fong, D. T. Frazier, and J. Knoblauch. Predictive performance of power posteriors. Biometrika, page asaf034, 2025 a

2025
[44]

McLatchie, S

Y. McLatchie, S. R \"o gnvaldsson, F. Weber, and A. Vehtari. Advances in projection predictive inference. Statistical Science, 40 0 (1): 0 128--147, 2025 b

2025
[45]

J. W. Miller. Asymptotic normality, concentration, and coverage of generalized posteriors. The Journal of Machine Learning Research, 22 0 (1): 0 7598--7650, 2021

2021
[46]

J. W. Miller and D. B. Dunson. Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114 0 (527): 0 1113--1125, 2019

2019
[47]

Nakagawa and S

T. Nakagawa and S. Hashimoto. Robust Bayesian inference via -divergence . Communications in Statistics-Theory and Methods, 49 0 (2): 0 343--360, 2020

2020
[48]

Pacchiardi, S

L. Pacchiardi, S. Khoo, and R. Dutta. Generalized Bayesian likelihood-free inference . Electronic Journal of Statistics, 18 0 (2): 0 3628--3686, 2024

2024
[49]

Piironen and A

J. Piironen and A. Vehtari. Comparison of Bayesian predictive methods for model selection . Statistics and Computing, 27: 0 711--735, 2017

2017
[50]

Piironen, M

J. Piironen, M. Paasiniemi, and A. Vehtari. Projective inference in high-dimensional problems: prediction and feature selection. Electronic Journal of Statistics, 14 0 (1): 0 2155 -- 2197, 2020

2020
[51]

D. A. Ratkowsky, J. Olley, and T. Ross. Unifying temperature effects on the growth rate of bacteria and the stability of globular proteins. Journal of theoretical biology, 233 0 (3): 0 351--362, 2005

2005
[52]

T. Sawa. Information criteria for discriminating among alternative regression models. Econometrica: Journal of the Econometric Society, pages 1273--1291, 1978

1978
[53]

B. J. Sinclair, K. E. Marshall, M. A. Sewell, D. L. Levesque, C. S. Willett, S. Slotsbo, Y. Dong, C. D. Harley, D. J. Marshall, B. S. Helmuth, et al. Can we predict ectotherm responses to climate change using thermal performance curves and body temperatures? Ecology letters, 19 0 (11): 0 1372--1385, 2016

2016
[54]

Sivula, M

T. Sivula, M. Magnusson, A. A. Matamoros, and A. Vehtari. Uncertainty in Bayesian leave-one-out cross-validation based model comparison . Bayesian Analysis, 1 0 (1): 0 1--31, 2025

2025
[55]

N. A. Spencer and J. S. Murray. A Bayesian hierarchical model for evaluating forensic footwear evidence. The Annals of Applied Statistics, 14 0 (3): 0 1449--1470, 2020

2020
[56]

M. Stone. Cross-validation and multinomial prediction. Biometrika, pages 509--515, 1974

1974
[57]

Sugasawa

S. Sugasawa. Robust empirical Bayes small area estimation with density power divergence. Biometrika, 107 0 (2): 0 467--480, 2020

2020
[58]

Vehtari and J

A. Vehtari and J. Ojanen. A survey of Bayesian predictive methods for model assessment, selection and comparison . Statistics Surveys, 6 0 (none): 0 142 -- 228, 2012

2012
[59]

Vehtari, A

A. Vehtari, A. Gelman, and J. Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC . Statistics and computing, 27: 0 1413--1432, 2017

2017
[60]

Vehtari, D

A. Vehtari, D. Simpson, A. Gelman, Y. Yao, and J. Gabry. Pareto smoothed importance sampling. Journal of Machine Learning Research, 25 0 (72): 0 1--58, 2024

2024
[61]

Wiesner, Y

S. Wiesner, Y. Shor, T. Tsach, N. Kaplan-Damary, and Y. Yekutieli. Dataset of digitized racs and their rarity score analysis for strengthening shoeprint evidence. Journal of forensic sciences, 65 0 (3): 0 762--774, 2020

2020
[62]

Y. Yao, A. Vehtari, D. Simpson, and A. Gelman. Using stacking to average Bayesian predictive distributions (with discussion) . Bayesian Analysis, 13 0 (3): 0 917--1003, 2018

2018
[63]

Statistics Surveys , number =

Aki Vehtari and Janne Ojanen , title =. Statistics Surveys , number =
[64]

Bayesian Analysis , volume=

Sivula, Tuomas and Magnusson, M. Bayesian Analysis , volume=. 2025 , publisher=

2025
[65]

Journal of statistical software , volume=

Stan: A probabilistic programming language , author=. Journal of statistical software , volume=
[66]

2017 , publisher=

Piironen, Juho and Vehtari, Aki , journal=. 2017 , publisher=

2017
[67]

Journal of the American statistical Association , volume=

The predictive sample reuse method with applications , author=. Journal of the American statistical Association , volume=. 1975 , publisher=

1975
[68]

Journal of the American Statistical Association , volume=

A predictive approach to model selection , author=. Journal of the American Statistical Association , volume=. 1979 , publisher=

1979
[69]

, author=

Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. , author=. Journal of machine learning research , volume=
[70]

Danyela Kellett and David Lagnado and Ruth Morgan and Sherry Nakhaeizadeh , doi =. A. Forensic Science International: Synergy , keywords =. 2026 , bdsk-url-1 =

2026
[71]

Journal of forensic sciences , volume=

Dataset of digitized RACs and their rarity score analysis for strengthening shoeprint evidence , author=. Journal of forensic sciences , volume=. 2020 , publisher=

2020
[72]

Spencer, Neil A and Murray, Jared S , journal=. A. 2020 , publisher=

2020
[73]

Journal of Forensic Sciences , volume=

Location distribution of randomly acquired characteristics on a shoe sole , author=. Journal of Forensic Sciences , volume=. 2022 , publisher=

2022
[74]

Annals of the institute of statistical mathematics , volume=

Bayesian image restoration, with two applications in spatial statistics , author=. Annals of the institute of statistical mathematics , volume=. 1991 , publisher=

1991
[75]

arXiv preprint arXiv:2602.07006 , year=

Scalable spatial point process models for forensic footwear analysis , author=. arXiv preprint arXiv:2602.07006 , year=

Pith/arXiv arXiv
[76]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Bayesian measures of model complexity and fit , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2002 , publisher=

2002
[77]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Predictive model selection , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 1995 , publisher=

1995
[78]

Biometrika , volume=

Model choice: a minimum posterior predictive loss approach , author=. Biometrika , volume=. 1998 , publisher=

1998
[79]

Journal of the American Statistical Association , volume=

Bayes factors , author=. Journal of the American Statistical Association , volume=. 1995 , publisher=

1995
[80]

Journal of the American Statistical Association , volume=

Markov chain monte carlo methods for computing Bayes factors: A comparative review , author=. Journal of the American Statistical Association , volume=. 2001 , publisher=

2001
[81]

Optimal predictive model selection , author=

Showing first 80 references.

[1] [1]

S.-I. Amari. -divergence is unique, belonging to both f -divergence and Bregman divergence classes . IEEE Transactions on Information Theory, 55 0 (11): 0 4925--4931, 2009

2009

[2] [2]

M. J. Angilletta Jr. Estimating and comparing thermal performance curves. Journal of Thermal Biology, 31 0 (7): 0 541--545, 2006

2006

[3] [3]

Banerjee, S

A. Banerjee, S. Merugu, I. S. Dhillon, J. Ghosh, and J. Lafferty. Clustering with Bregman divergences. Journal of machine learning research, 6 0 (10), 2005

2005

[4] [4]

A. Basu, I. R. Harris, N. L. Hjort, and M. Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85 0 (3): 0 549--559, 1998

1998

[5] [5]

J. O. Berger. Statistical decision theory and Bayesian analysis. Springer Science & Business Media, 1985

1985

[6] [6]

J. O. Berger. An overview of robust Bayesian analysis . Test, 3 0 (1): 0 5--124, 1994

1994

[7] R. H. Berk. Limiting behavior of posterior distributions when the model is incorrect. The Annals of Mathematical Statistics, 37(1):51--58, 1966.

[8] J. M. Bernardo and A. F. Smith. Bayesian Theory, volume 586. Wiley Online Library, 1994.

[9] J. Besag, J. York, and A. Mollié. Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(1):1--20, 1991.

[10] A. Bhattacharya, D. Pati, and Y. Yang. Bayesian fractional posteriors. The Annals of Statistics, 47(1):39--66, 2019. doi:10.1214/18-AOS1712. URL https://doi.org/10.1214/18-AOS1712

[11] P. G. Bissiri, C. C. Holmes, and S. G. Walker. A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):1103--1130, 2016.

[12] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200--217, 1967.

[13] O. Bunke and X. Milhaud. Asymptotic behavior of Bayes estimates under possibly incorrect models. The Annals of Statistics, 26(2):617--644, 1998. doi:10.1214/aos/1028144851. URL https://doi.org/10.1214/aos/1028144851

[14] B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76:1--32, 2017.

[15] P. S. Chodrow. Equivalence of informations characterizes Bregman divergences. Entropy, 27(7), 2025. ISSN 1099-4300. doi:10.3390/e27070766. URL https://www.mdpi.com/1099-4300/27/7/766

[16] D. K. Dey and L. R. Birmiwal. Robust Bayesian analysis using divergence measures. Statistics & Probability Letters, 20(4):287--294, 1994.

[17] B. A. Frigyik, S. Srivastava, and M. R. Gupta. Functional Bregman Divergence and Bayesian Estimation of Distributions. IEEE Transactions on Information Theory, 54(11):5130--5139, 2008. doi:10.1109/TIT.2008.929943.

[18] S. Geisser. The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350):320--328, 1975.

[19] A. E. Gelfand, D. K. Dey, and H. Chang. Model determination using predictive distributions with implementation via sampling based methods. In J. Bernardo, J. Berger, A. Dawid, and A. Smith, editors, Bayesian Statistics 4, pages 147--167. Oxford University Press, 1992.

[20] A. Ghosh and A. Basu. Robust Bayes estimation using the density power divergence. Annals of the Institute of Statistical Mathematics, 68(2):413--437, 2016.

[21] P. Girardi, L. Greco, V. Mameli, M. Musio, W. Racugno, E. Ruli, and L. Ventura. Robust inference for non-linear regression models from the Tsallis score: application to coronavirus disease 2019 contagion in Italy. Stat, 9(1):e309, 2020.

[22] F. Giummolè, V. Mameli, E. Ruli, and L. Ventura. Objective Bayesian inference with proper scoring rules. Test, 28(3):728--755, 2019.

[23] T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359--378, 2007.

[24] G. Goh and D. K. Dey. Bayesian model diagnostics using functional Bregman divergence. Journal of Multivariate Analysis, 124:371--383, 2014.

[25] G. Goh and D. K. Dey. Bayesian model assessment and selection using Bregman divergence. Advances in Statistics-Theory and Applications: Honoring the Contributions of Barry C. Arnold in Statistical Science, pages 295--313, 2021.

[26] P. Grünwald. The safe Bayesian: learning the learning rate via the mixability gap. In International Conference on Algorithmic Learning Theory, pages 169--183. Springer, 2012.

[27] P. D. Grünwald and A. P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. The Annals of Statistics, 32(4):1367--1433, 2004.

[28] J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Statistical Science, 14(4):382--417, 1999. doi:10.1214/ss/1009212519.

[29] G. Hooker and A. N. Vidyashankar. Bayesian model robustness via disparities. Test, 23(3):556--584, 2014.

[30] P. J. Huber. Robust Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley, New York, 1981.

[31] J. Jewson, J. Q. Smith, and C. Holmes. Principles of Bayesian inference using general divergence criteria. Entropy, 20(6):442, 2018.

[32] J. Jewson, J. Q. Smith, and C. Holmes. On the Stability of General Bayesian Inference. Bayesian Analysis, pages 1--31, 2024. doi:10.1214/24-BA1502. URL https://doi.org/10.1214/24-BA1502

[33] N. Kaplan-Damary, M. Mandel, Y. Yekutieli, Y. Shor, and S. Wiesner. Location distribution of randomly acquired characteristics on a shoe sole. Journal of Forensic Sciences, 67(5):1801--1809, 2022.

[34] V. Kellermann, S. L. Chown, M. F. Schou, I. Aitkenhead, C. Janion-Scheepers, A. Clemson, M. T. Scott, and C. M. Sgrò. Comparing thermal performance curves across traits: how consistent are they? Journal of Experimental Biology, 222(11):jeb193433, 2019.

[35] D. Kellett, D. Lagnado, R. Morgan, and S. Nakhaeizadeh. A Bayesian network approach to evaluating footwear evidence. Forensic Science International: Synergy, 12:100673, 2026. ISSN 2589-871X. doi:10.1016/j.fsisyn.2026.100673. URL https://www.sciencedirect.com/science/article/pii/S2589871X26000161

[36] J. Knoblauch, J. E. Jewson, and T. Damoulas. Doubly robust Bayesian inference for non-stationary streaming data with β-divergences. Advances in Neural Information Processing Systems, 31, 2018.

[37] J. Knoblauch, J. Jewson, and T. Damoulas. An optimization-centric view on Bayes' rule: reviewing and generalizing variational inference. Journal of Machine Learning Research, 23(132):1--109, 2022.

[38] D.-G. Kontopoulos, A. Sentis, M. Daufresne, N. Glazman, A. I. Dell, and S. Pawar. No universal mathematical model for thermal performance curves across traits and taxonomic groups. Nature Communications, 15(1):8855, 2024.

[39] D. V. Lindley. The choice of variables in Bayesian analysis. Journal of the Royal Statistical Society. Series B (Methodological), 30(2):239--251, 1968.

[40] R. Martin and N. Syring. Direct Gibbs posterior inference on risk minimizers: Construction, concentration, and calibration. In Handbook of Statistics, volume 47, pages 1--41. Elsevier, 2022.

[41] T. Matsubara, J. Knoblauch, F.-X. Briol, and C. J. Oates. Robust generalised Bayesian inference for intractable likelihoods. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3):997--1022, 2022.

[42] Y. McLatchie, E. Fong, D. T. Frazier, and J. Knoblauch. Predictive performance of power posteriors. Biometrika, page asaf034, 2025a.

[43] Y. McLatchie, S. Rögnvaldsson, F. Weber, and A. Vehtari. Advances in projection predictive inference. Statistical Science, 40(1):128--147, 2025b.

[44] J. W. Miller. Asymptotic normality, concentration, and coverage of generalized posteriors. The Journal of Machine Learning Research, 22(1):7598--7650, 2021.

[45] J. W. Miller and D. B. Dunson. Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114(527):1113--1125, 2019.

[46] T. Nakagawa and S. Hashimoto. Robust Bayesian inference via γ-divergence. Communications in Statistics-Theory and Methods, 49(2):343--360, 2020.

[47] L. Pacchiardi, S. Khoo, and R. Dutta. Generalized Bayesian likelihood-free inference. Electronic Journal of Statistics, 18(2):3628--3686, 2024.

[48] J. Piironen and A. Vehtari. Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27:711--735, 2017.

[49] J. Piironen, M. Paasiniemi, and A. Vehtari. Projective inference in high-dimensional problems: prediction and feature selection. Electronic Journal of Statistics, 14(1):2155--2197, 2020.

[50] D. A. Ratkowsky, J. Olley, and T. Ross. Unifying temperature effects on the growth rate of bacteria and the stability of globular proteins. Journal of Theoretical Biology, 233(3):351--362, 2005.

[51] T. Sawa. Information criteria for discriminating among alternative regression models. Econometrica: Journal of the Econometric Society, pages 1273--1291, 1978.

[52] B. J. Sinclair, K. E. Marshall, M. A. Sewell, D. L. Levesque, C. S. Willett, S. Slotsbo, Y. Dong, C. D. Harley, D. J. Marshall, B. S. Helmuth, et al. Can we predict ectotherm responses to climate change using thermal performance curves and body temperatures? Ecology Letters, 19(11):1372--1385, 2016.

[53] T. Sivula, M. Magnusson, A. A. Matamoros, and A. Vehtari. Uncertainty in Bayesian leave-one-out cross-validation based model comparison. Bayesian Analysis, 1(1):1--31, 2025.

[54] N. A. Spencer and J. S. Murray. A Bayesian hierarchical model for evaluating forensic footwear evidence. The Annals of Applied Statistics, 14(3):1449--1470, 2020.

[55] M. Stone. Cross-validation and multinomial prediction. Biometrika, pages 509--515, 1974.

[56] S. Sugasawa. Robust empirical Bayes small area estimation with density power divergence. Biometrika, 107(2):467--480, 2020.

[57] A. Vehtari and J. Ojanen. A survey of Bayesian predictive methods for model assessment, selection and comparison. Statistics Surveys, 6:142--228, 2012.

[58] A. Vehtari, A. Gelman, and J. Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27:1413--1432, 2017.

[59] A. Vehtari, D. Simpson, A. Gelman, Y. Yao, and J. Gabry. Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1--58, 2024.

[60] S. Wiesner, Y. Shor, T. Tsach, N. Kaplan-Damary, and Y. Yekutieli. Dataset of digitized RACs and their rarity score analysis for strengthening shoeprint evidence. Journal of Forensic Sciences, 65(3):762--774, 2020.

[61] Y. Yao, A. Vehtari, D. Simpson, and A. Gelman. Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917--1003, 2018.

[67] A predictive approach to model selection. Journal of the American Statistical Association, 1979.

[68] Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research.

[74] Scalable spatial point process models for forensic footwear analysis. arXiv preprint arXiv:2602.07006.

[75] Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2002.

[76] Predictive model selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1995.

[77] Model choice: a minimum posterior predictive loss approach. Biometrika, 1998.

[78] Bayes factors. Journal of the American Statistical Association, 1995.

[79] Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. Journal of the American Statistical Association, 2001.

[80] Optimal predictive model selection.