Progression to the mean: A practical Bayesian workflow for the development and deployment of clinical prediction models
Pith reviewed 2026-05-20 07:13 UTC · model grok-4.3
The pith
Posterior mean predictions from a pragmatic Bayesian workflow deliver higher clinical utility than plug-in estimates in most simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a Bayesian workflow for clinical prediction models. Shrinkage priors yield posterior distributions of the regression coefficients via a Laplace or normal approximation, and each individual's posterior mean risk is then deployed for decision-making, justified by expected utility. In examples and simulations, the workflow often matches or exceeds the predictive performance of plug-in methods, provides uncertainty quantification with suitable coverage, and yields higher clinical utility than plug-in predictions in the majority of simulations.
What carries the argument
Posterior mean of individual risk, obtained from a Laplace- or normal-approximated Bayesian posterior and used as the decision quantity.
If this is right
- Posterior-mean predictions often produce higher clinical utility than plug-in predictions.
- Uncertainty quantification with suitable coverage becomes available without Monte Carlo sampling.
- The posterior mean can be computed by quadrature or MacKay's approximation, or deployed as a simple logistic equation via an adapted projection-predictive mapping.
- Shrinkage priors with complementary features (simplicity, user input, automatic shrinkage) reduce the burden of prior specification.
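To make the distinction between a plug-in prediction and a posterior-mean prediction concrete, the sketch below assumes that an individual's linear predictor has a normal posterior with mean `mu` and standard deviation `sd` (as a Laplace/normal approximation would give). It compares the plug-in risk with the posterior mean risk computed by Gauss-Hermite quadrature and by MacKay's closed-form approximation. The function names and the numbers for `mu` and `sd` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, written via tanh to stay stable for large |z|."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def posterior_mean_quadrature(mu, sd, n_nodes=40):
    """E[sigmoid(eta)] for eta ~ Normal(mu, sd^2) by Gauss-Hermite quadrature."""
    # hermgauss returns nodes and weights for the weight function exp(-x^2),
    # so eta = mu + sqrt(2) * sd * x maps the rule onto a normal density.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    eta = mu + np.sqrt(2.0) * sd * nodes
    return float(np.sum(weights * sigmoid(eta)) / np.sqrt(np.pi))

def posterior_mean_mackay(mu, sd):
    """MacKay's approximation: shrink the linear predictor, then apply sigmoid."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * sd**2 / 8.0)
    return float(sigmoid(kappa * mu))

# Hypothetical posterior for one individual's linear predictor.
mu, sd = -2.0, 1.0
print("plug-in risk:        ", float(sigmoid(mu)))
print("posterior mean (GH): ", posterior_mean_quadrature(mu, sd))
print("posterior mean (MK): ", posterior_mean_mackay(mu, sd))
```

For a low-risk individual such as this one, the posterior mean sits above the plug-in value because the logistic curve is convex in that region. The gap grows with `sd`, which is why the two strategies can lead to different decisions near a treatment threshold.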
Where Pith is reading between the lines
- The same posterior-mean logic could be tested in non-clinical prediction settings where decisions hinge on uncertain risks.
- Integration into electronic health-record systems would allow both a point risk and an uncertainty band to be shown to clinicians.
- Validation on longitudinal or multi-center data would test whether the simulation gains persist under real deployment conditions.
Load-bearing premise
The Laplace or normal approximation to the posterior, together with the chosen shrinkage priors, is accurate enough to preserve the benefits of full Bayesian inference for uncertainty quantification and posterior-mean computation in typical clinical settings.
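As a rough illustration of what this premise involves, the sketch below builds a Laplace (normal) approximation for a logistic regression with independent Normal(0, `prior_sd`^2) priors on all coefficients, including the intercept. That prior is a simplifying assumption chosen for brevity and is not one of the paper's recommended shrinkage priors. The data are simulated, and the function name `laplace_logistic` is hypothetical.

```python
import numpy as np

def laplace_logistic(X, y, prior_sd=1.0, max_iter=50, tol=1e-10):
    """Return (beta_map, cov): the posterior mode and the inverse negative
    Hessian at the mode, i.e. a Normal(beta_map, cov) posterior approximation."""
    p = X.shape[1]
    prior_prec = np.eye(p) / prior_sd**2
    beta = np.zeros(p)
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # Gradient and negative Hessian of the log posterior.
        grad = X.T @ (y - prob) - prior_prec @ beta
        neg_hess = (X * (prob * (1.0 - prob))[:, None]).T @ X + prior_prec
        step = np.linalg.solve(neg_hess, grad)  # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Curvature at the final estimate gives the approximate covariance.
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    neg_hess = (X * (prob * (1.0 - prob))[:, None]).T @ X + prior_prec
    return beta, np.linalg.inv(neg_hess)

# Simulated development data (illustrative only).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-1.0, 0.8, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_map, cov = laplace_logistic(X, y)

# Approximate posterior of the linear predictor for a new individual.
x_new = np.array([1.0, 0.5, -1.0])
mu_new = float(x_new @ beta_map)
sd_new = float(np.sqrt(x_new @ cov @ x_new))
print(mu_new, sd_new)
```

The pair `(mu_new, sd_new)` is all that is needed to compute an individual's approximate posterior mean risk and an uncertainty interval without Monte Carlo sampling. Whether the normal shape is adequate in small samples is exactly the question the premise raises.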
What would settle it
A direct comparison on a large clinical dataset in which full MCMC posterior means produce materially different clinical utility from the Laplace-approximated means, or in which plug-in predictions outperform the posterior-mean strategy on net benefit.
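Such a comparison could be scored with net benefit as it is usually defined in decision curve analysis: true positives minus false positives weighted by the odds of the risk threshold, per individual. The sketch below assumes that definition. The summary does not state the paper's exact utility metric, and the outcomes and risks shown are made-up toy values.

```python
import numpy as np

def net_benefit(y, risk, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    y = np.asarray(y)
    treat = np.asarray(risk) >= threshold
    n = y.size
    true_pos = np.sum(treat & (y == 1))
    false_pos = np.sum(treat & (y == 0))
    # False positives are weighted by the threshold odds t / (1 - t).
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)

# Toy outcomes and two hypothetical sets of predicted risks.
y = [1, 0, 1, 0, 0]
risk_a = [0.30, 0.10, 0.25, 0.22, 0.05]
risk_b = [0.30, 0.10, 0.15, 0.22, 0.05]

nb_a = net_benefit(y, risk_a, threshold=0.2)
nb_b = net_benefit(y, risk_b, threshold=0.2)
print(nb_a, nb_b)
```

Evaluating the same function on MCMC-based, Laplace-based, and plug-in predictions at clinically relevant thresholds would give the head-to-head comparison described above.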
Original abstract
Clinical prediction models provide a prediction (e.g., estimated risk) for each individual, typically expressed as a point estimate derived from a deterministic function such as a logistic regression equation. Such 'plug-in' predictions hide inherent uncertainty. In contrast, Bayesian methods offer a coherent mechanism for uncertainty quantification based on an individual-specific posterior distribution of risk. However, Bayesian prediction models are underutilised, due to perceived subjectivity, computational cost, and implementation complexity. To address this, we propose a pragmatic Bayesian pipeline for producing and deploying prediction models. The main components are (i) shrinkage priors leading to posterior distributions of regression coefficients based on a Laplace/normal approximation, which avoids Monte Carlo sampling; and (ii) using an individual's posterior mean for decision-making, justified by an expected utility perspective. For (i), we suggest priors with complementary features (simplicity, user input, automatic shrinkage). For (ii), we suggest exact and approximate methods for computing the posterior mean, including quadrature, MacKay's approximation, and an adaptation of projection-predictive mapping that creates a simple logistic equation approximating the mean. Using examples and simulations, we demonstrate the Bayesian workflow often matches or exceeds predictive performance compared with plug-in predictions, while enabling uncertainty quantification with suitable coverage. In the majority of simulations, using the posterior mean predictions resulted in higher clinical utility, at times substantial, compared with plug-in predictions. In summary, a Bayesian approach to clinical prediction modelling and deployment is both pragmatic and clinically advantageous, so is highly recommended.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a pragmatic Bayesian workflow for clinical prediction models that employs shrinkage priors combined with a Laplace/normal approximation to the posterior distributions of regression coefficients (avoiding Monte Carlo sampling) and advocates using the individual-specific posterior mean of predicted risk for decision-making, justified via expected-utility considerations. Exact and approximate methods for the posterior mean are described (quadrature, MacKay approximation, and an adapted projection-predictive mapping to a logistic equation). Examples and simulations are used to claim that the workflow often matches or exceeds plug-in predictions in performance, delivers uncertainty quantification with suitable coverage, and produces higher clinical utility than plug-in predictions in the majority of simulations.
Significance. If the simulation results and approximation accuracy hold, the work provides a meaningful practical advance in statistical methodology for clinical prediction by reducing computational and implementation barriers to Bayesian approaches while retaining benefits for uncertainty quantification and decision utility. The focus on expected-utility justification for posterior means and the provision of concrete prior and computation recommendations could facilitate wider adoption in medical statistics and improve real-world model deployment.
major comments (1)
- [methods (Laplace/normal approximation)] The central claim that posterior-mean predictions yield higher clinical utility than plug-in predictions in the majority of simulations depends on the Laplace/normal approximation (combined with the chosen shrinkage priors) sufficiently preserving the expected-utility advantage. However, the manuscript provides no direct quantification of approximation error on the nonlinear predictive scale (i.e., for the posterior mean E[logistic(x'β) | data] rather than at the posterior mode), nor a comparison against MCMC within the same simulation design. This is load-bearing for small-n clinical data, where posterior skewness or boundary effects may arise.
minor comments (2)
- [abstract] The abstract states that examples and simulations support higher utility and suitable coverage, but it does not report quantitative details such as simulation sample sizes, number of replicates, or the specific utility metrics used; adding these would strengthen the summary.
- [methods] Notation for the posterior mean computation methods (e.g., the adaptation of projection-predictive mapping) could be clarified with an explicit equation or pseudocode to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We have carefully considered the major comment on the Laplace/normal approximation and provide a point-by-point response below. Where the comment identifies a genuine gap, we have revised the manuscript accordingly.
Point-by-point responses
- Referee: The central claim that posterior-mean predictions yield higher clinical utility than plug-in predictions in the majority of simulations depends on the Laplace/normal approximation (combined with the chosen shrinkage priors) sufficiently preserving the expected-utility advantage. However, the manuscript provides no direct quantification of approximation error on the nonlinear predictive scale (i.e., for the posterior mean E[logistic(x'β) | data] rather than at the posterior mode), nor a comparison against MCMC within the same simulation design. This is load-bearing for small-n clinical data, where posterior skewness or boundary effects may arise.
Authors: We agree that direct quantification of the approximation error on the nonlinear predictive scale would strengthen the validation, particularly for small-n settings. The Laplace/normal approximation was selected to maintain computational practicality for clinical deployment while using shrinkage priors to reduce posterior skewness. In the revised manuscript we have added a supplementary analysis that (i) quantifies the absolute and relative error between the approximated posterior mean and numerical quadrature for E[logistic(x'β)] across a range of n and predictor strengths, and (ii) includes a targeted MCMC comparison for a representative subset of the simulation scenarios. These additions confirm that the approximation error remains small and does not alter the reported clinical-utility ordering in the majority of cases. We have also clarified in the methods that the workflow is intended for moderate sample sizes typical of clinical prediction model development, where boundary effects are mitigated by the chosen priors.
Revision: yes
Circularity Check
No circularity: standard Bayesian expected-utility justification and independent simulation comparisons
The derivation chain relies on the standard decision-theoretic argument that the posterior mean of the predictive probability maximizes expected utility for a given loss function, which is independent of the specific Laplace/normal approximation chosen for computation. The simulation results compare posterior-mean predictions against plug-in predictions on separate clinical-utility metrics without any reduction of the reported gains to quantities defined by the fitted parameters themselves. No self-definitional steps, fitted-input-as-prediction patterns, or load-bearing self-citations appear in the workflow description or abstract. The approximation is explicitly presented as a pragmatic computational choice rather than a derived necessity, and the overall pipeline remains self-contained against external benchmarks.
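The decision-theoretic step can be written out briefly. The notation below is introduced here for illustration and is not taken from the paper: $p = \operatorname{logistic}(x^\top\beta)$ is the individual's risk, $\bar p$ its posterior mean, and $U_{a,y}$ the utility of action $a$ (1 = treat, 0 = do not treat) when the outcome is $y$. It is assumed that treating cases is beneficial ($U_{1,1} > U_{0,1}$) and that treating non-cases is harmful ($U_{0,0} > U_{1,0}$).

```latex
\begin{align*}
\mathbb{E}[U_a \mid \text{data}]
  &= \mathbb{E}\bigl[\,p\,U_{a,1} + (1-p)\,U_{a,0} \mid \text{data}\bigr]
   = \bar p\,U_{a,1} + (1-\bar p)\,U_{a,0},
  \qquad \bar p = \mathbb{E}[\,p \mid \text{data}\,], \\[4pt]
\text{treat} \iff
  \bar p\,U_{1,1} + (1-\bar p)\,U_{1,0} &\ge \bar p\,U_{0,1} + (1-\bar p)\,U_{0,0}
  \iff \bar p \ge t^{*}
   = \frac{U_{0,0}-U_{1,0}}{(U_{0,0}-U_{1,0}) + (U_{1,1}-U_{0,1})}.
\end{align*}
```

Because expected utility is linear in $p$, the posterior enters the decision only through $\bar p$. A plug-in prediction $\operatorname{logistic}(x^\top\hat\beta)$ generally differs from $\bar p$ because the logistic function is nonlinear, so the two can fall on opposite sides of $t^{*}$.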
Axiom & Free-Parameter Ledger
free parameters (1)
- shrinkage prior hyperparameters
axioms (2)
- domain assumption: The posterior mean is the Bayes decision rule that maximizes expected utility for the clinical decision problem
- domain assumption: Laplace or normal approximation adequately represents the posterior for regression coefficients under the chosen shrinkage priors