Convergence fragility in probit Bayesian kernel machine regression implemented in the bkmr R package for binary-outcome environmental mixture analyses: a simulation study

Akifumi Eguchi; Takayuki Kawashima; Tomotaka Momozaki; Tomoyuki Nakagawa

arXiv: 2607.02000 · v1 · pith:SP3XB3FD · submitted 2026-07-02 · 📊 stat.AP



Pith reviewed 2026-07-03 03:15 UTC · model grok-4.3


keywords: Bayesian kernel machine regression · probit BKMR · MCMC convergence · bkmr R package · environmental mixtures · simulation study · R-hat · effective sample size


The pith

Completion of a probit BKMR fit in bkmr does not ensure MCMC convergence of the retained draws.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether successful execution of probit Bayesian kernel machine regression in the bkmr package means that the MCMC chains have converged for binary environmental mixture data. Using bkmr's own data simulator and model fitter, with four chains whose starting values are generated from a fixed base seed, it applies standard convergence checks: rank-normalized R-hat at or below 1.01 and bulk and tail effective sample sizes of at least 400. Across 431 tasks, 430 produced fitted objects, but only 30 satisfied the combined criteria. This separation of execution success from diagnostic adequacy shows that relying on fit completion alone can leave analyses without adequate posterior information. Applied work therefore needs to report the number of chains, warmup and retained iterations, and all three diagnostics rather than assume convergence from a completed run.

Core claim

Of 431 prespecified simulation tasks using family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, M = 4 and X = 3*cos(z1) + 2*rnorm(n), 430 returned fitted objects but only 30 achieved rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS both ≥ 400. The study therefore concludes that completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws and that analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS.
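Because the primary criterion is a conjunction of three thresholds, a completed fit can fail it on any one of them, and a fit that did not complete cannot pass at all. A minimal Python sketch of that tally follows, assuming each task has already been reduced to a single R-hat, bulk-ESS and tail-ESS value (how the study aggregates across model parameters is not stated here). The `TaskDiagnostics` record, the function names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

RHAT_MAX = 1.01  # rank-normalized R-hat must be at or below this
ESS_MIN = 400    # bulk-ESS and tail-ESS must both reach this


@dataclass
class TaskDiagnostics:
    """Hypothetical per-task summary; None marks a fit that did not complete."""
    rhat: Optional[float]
    bulk_ess: Optional[float]
    tail_ess: Optional[float]


def meets_primary_criterion(d: TaskDiagnostics) -> bool:
    """True only if R-hat <= 1.01 and both ESS values are >= 400."""
    if d.rhat is None or d.bulk_ess is None or d.tail_ess is None:
        return False  # a numerical non-completion cannot pass
    return d.rhat <= RHAT_MAX and d.bulk_ess >= ESS_MIN and d.tail_ess >= ESS_MIN


def tally(tasks: Iterable[TaskDiagnostics]) -> Tuple[int, int]:
    """Return (number of completed fits, number meeting the criterion)."""
    tasks = list(tasks)
    completed = sum(d.rhat is not None for d in tasks)
    passed = sum(meets_primary_criterion(d) for d in tasks)
    return completed, passed


# Illustrative values only, not results from the study.
tasks = [
    TaskDiagnostics(1.004, 950.0, 610.0),  # passes all three checks
    TaskDiagnostics(1.004, 950.0, 310.0),  # tail-ESS too low
    TaskDiagnostics(1.080, 120.0, 90.0),   # completed but not converged
    TaskDiagnostics(None, None, None),     # numerical non-completion
]
print(tally(tasks))  # (3, 1)
```

Completion and convergence are counted separately, which mirrors the paper's 430 versus 30 split.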

What carries the argument

Four-chain MCMC simulation with bkmr::SimData() for binary data generation and bkmr::kmbayes() for probit model fitting, evaluated by rank-normalized R-hat, bulk-ESS and tail-ESS.
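To show what the R-hat check computes, here is a pure-Python sketch of rank-normalized split R-hat as defined by Vehtari et al. (2021): each chain is split in half, pooled draws are replaced by normal scores of their average ranks, the classic Gelman-Rubin ratio is computed on the result, the same is repeated on draws folded around the median, and the larger value is reported. It illustrates the published definition under these assumptions; it is not the posterior package's code, and the function names and simulated chains are hypothetical.

```python
import random
from statistics import NormalDist, mean, median, variance


def split_chains(chains):
    """Split each chain in half, dropping the middle draw if length is odd."""
    out = []
    for c in chains:
        h = len(c) // 2
        out += [c[:h], c[len(c) - h:]]
    return out


def rank_normalize(chains):
    """Normal scores of pooled average ranks; tied draws share a mean rank."""
    flat = sorted(x for c in chains for x in c)
    s = len(flat)
    first, last = {}, {}
    for pos, x in enumerate(flat, start=1):
        first.setdefault(x, pos)  # MH rejections repeat values, so ties matter
        last[x] = pos
    inv = NormalDist().inv_cdf
    score = {x: inv(((first[x] + last[x]) / 2 - 3 / 8) / (s + 1 / 4)) for x in first}
    return [[score[x] for x in c] for c in chains]


def basic_rhat(chains):
    """Classic Gelman-Rubin ratio for equal-length chains."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)
    between_over_n = variance([mean(c) for c in chains])
    return ((n - 1) / n + between_over_n / within) ** 0.5


def rank_normalized_rhat(chains):
    """Larger of the bulk and folded (tail) rank-normalized split R-hat."""
    bulk = basic_rhat(rank_normalize(split_chains(chains)))
    med = median(x for c in chains for x in c)
    folded = [[abs(x - med) for x in c] for c in chains]
    tail = basic_rhat(rank_normalize(split_chains(folded)))
    return max(bulk, tail)


rng = random.Random(1)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [list(c) for c in mixed]
stuck[0] = [x + 5 for x in stuck[0]]  # one chain sits in a different region
print(round(rank_normalized_rhat(mixed), 3))  # close to 1
print(round(rank_normalized_rhat(stuck), 3))  # far above the 1.01 threshold
```

A single chain stuck in a different region is enough to push R-hat well past 1.01 even though every chain "completed".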

If this is right

Fit completion alone is not a reliable indicator that retained draws carry adequate effective posterior information in probit BKMR.
Fixed iteration counts or default settings may leave many analyses without converged chains for binary outcomes.
Reporting only successful model fits without diagnostics risks basing environmental mixture conclusions on under-converged samples.
Users must document the number of chains, warmup and retained iterations plus the three convergence statistics for reproducible results.
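The reporting items listed above fit in a small record. A Python sketch follows in which the `ConvergenceReport` name, its fields and the sample values are hypothetical; it only shows that the six recommended items, plus the implied total of retained draws, fit on one line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceReport:
    """Hypothetical record of the items the study recommends reporting."""
    n_chains: int
    warmup_per_chain: int
    retained_per_chain: int
    rhat: float      # rank-normalized R-hat
    bulk_ess: float
    tail_ess: float

    def summary(self) -> str:
        """One-line report, including total retained draws across chains."""
        total = self.n_chains * self.retained_per_chain
        return (
            f"{self.n_chains} chains; warmup {self.warmup_per_chain}/chain; "
            f"retained {self.retained_per_chain}/chain ({total} total); "
            f"R-hat {self.rhat:.3f}; bulk-ESS {self.bulk_ess:.0f}; "
            f"tail-ESS {self.tail_ess:.0f}"
        )


report = ConvergenceReport(4, 1000, 1000, 1.004, 812.0, 455.0)  # illustrative
print(report.summary())
# 4 chains; warmup 1000/chain; retained 1000/chain (4000 total); R-hat 1.004; bulk-ESS 812; tail-ESS 455
```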

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Convergence shortfalls may be more common with binary than continuous outcomes under the same kernel machine setup.
Package maintainers could add automatic post-fit diagnostic summaries and warnings when thresholds are not met.
Re-running non-converged tasks with altered initial values or longer chains offers a direct way to test whether the fragility is fixable within the current sampler.
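The last suggestion can be framed as an escalation loop: refit with a larger iteration budget until the combined criterion passes or a cap is reached. The Python sketch below assumes a hypothetical callable `fit_and_diagnose(n_iter, seed)` that, in practice, would wrap the R workflow (kmbayes() plus posterior diagnostics) and return (R-hat, bulk-ESS, tail-ESS); `fake_fit` is a toy stand-in. Varying the seed instead of the budget would probe the initial-value side of the question.

```python
from typing import Callable, List, Tuple

Diagnostics = Tuple[float, float, float]  # (R-hat, bulk-ESS, tail-ESS)


def passes(d: Diagnostics, rhat_max: float = 1.01, ess_min: float = 400) -> bool:
    """Combined criterion: R-hat at or below rhat_max, both ESS at least ess_min."""
    rhat, bulk, tail = d
    return rhat <= rhat_max and bulk >= ess_min and tail >= ess_min


def rerun_until_converged(
    fit_and_diagnose: Callable[[int, int], Diagnostics],
    n_iter: int,
    seed: int,
    max_doublings: int = 3,
) -> List[Tuple[int, Diagnostics, bool]]:
    """Refit with doubled iterations until the criterion passes or the cap is hit.

    Returns (iterations, diagnostics, passed) for every attempt, so the
    whole escalation history can be reported, not only the last fit.
    """
    history = []
    for _ in range(max_doublings + 1):
        d = fit_and_diagnose(n_iter, seed)
        ok = passes(d)
        history.append((n_iter, d, ok))
        if ok:
            break
        n_iter *= 2
    return history


def fake_fit(n_iter: int, seed: int) -> Diagnostics:
    """Toy stand-in whose diagnostics improve as the budget grows."""
    return (1.0 + 100 / n_iter, n_iter / 10, n_iter / 20)


for n, diag, ok in rerun_until_converged(fake_fit, 1000, 20260621, max_doublings=4):
    print(n, diag, ok)
```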

Load-bearing premise

The chosen simulation parameters and four-chain setup with the given seed produce representative cases for typical binary-outcome environmental mixture analyses.
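The premise rests partly on a single initial-value base seed. The abstract says chain-specific starting values were generated reproducibly from base seed 20260621 but does not describe the derivation, so the Python sketch below uses a hypothetical scheme (one pseudo-random stream per chain, seeded with base seed plus chain index, uniform on an assumed interval) to show what reproducible yet chain-specific starts mean. Rerunning a study with several base seeds is one way to probe whether the reported counts depend on that choice.

```python
import random
from typing import List

BASE_SEED = 20260621  # base seed reported in the abstract


def chain_starting_values(base_seed: int, n_chains: int = 4, n_params: int = 3,
                          low: float = -1.0, high: float = 1.0) -> List[List[float]]:
    """Hypothetical scheme: chain k draws its start from Random(base_seed + k)."""
    starts = []
    for k in range(n_chains):
        rng = random.Random(base_seed + k)  # separate, reproducible stream per chain
        starts.append([rng.uniform(low, high) for _ in range(n_params)])
    return starts


first = chain_starting_values(BASE_SEED)
again = chain_starting_values(BASE_SEED)
print(first == again)                  # True: same base seed, same starts
print(len({tuple(s) for s in first}))  # 4: each chain starts somewhere different
```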

What would settle it

Repeating the identical simulation protocol but with substantially more iterations per chain and observing that nearly all 431 tasks then satisfy the combined R-hat and ESS criteria would falsify the reported fragility.

Original abstract

Background. Bayesian kernel machine regression (BKMR) is widely used for exposure-mixture analyses with binary outcomes through a probit extension. Because a bkmr fit can complete without providing adequate effective posterior information, simulation studies should separate execution success from MCMC convergence diagnostics.

Methods. We evaluated the public bkmr probit workflow using bkmr::SimData() for data generation, bkmr::kmbayes() for model fitting, and the posterior R package for convergence diagnostics. The balanced generator used family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, and M = 4. SimData() generated the covariate as X = 3*cos(z1) + 2*rnorm(n). Four chains were initialized with chain-specific randomized starting values generated reproducibly from the fixed initial-value base seed 20260621. These values affected only the initial state of the sampler and did not alter the BKMR model, default priors, or Metropolis-Hastings proposals.

Results. Of 431 prespecified tasks, 430 returned fitted objects and one task had a numerical non-completion. Diagnostic adequacy was limited: the rank-normalized R-hat ≤ 1.01 threshold was met in 55/431 tasks, bulk-ESS ≥ 400 in 85/431, tail-ESS ≥ 400 in 44/431, and both ESS criteria in 44/431. The primary diagnostic criterion, R-hat at or below 1.01 with both bulk-ESS and tail-ESS ≥ 400, was met in 30/431 prespecified tasks, corresponding to 30/430 completed fits.

Conclusions. Completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws. Applied analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS rather than rely on a fixed iteration count or on fit completion alone.
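The Results counts can be put on a common scale with a few lines of arithmetic. The Python sketch below uses only numbers reported in the abstract and checks two relations implied by the definitions. One inference follows directly: the both-ESS count (44) equals the tail-ESS count (44), and tasks meeting both ESS criteria are a subset of those meeting the tail criterion, so every task with tail-ESS ≥ 400 also had bulk-ESS ≥ 400; tail-ESS was the binding ESS constraint.

```python
TASKS, COMPLETED = 431, 430

# Counts reported in the abstract; the denominator is the 431 prespecified tasks.
counts = {
    "R-hat <= 1.01": 55,
    "bulk-ESS >= 400": 85,
    "tail-ESS >= 400": 44,
    "both ESS >= 400": 44,
    "primary criterion": 30,
}


def pct(k: int, n: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


for label, k in counts.items():
    print(f"{label:<18} {k:>3}/{TASKS} = {pct(k, TASKS):>4}%")
print(f"primary criterion among completed fits: {pct(30, COMPLETED)}%")

# A joint criterion cannot be met more often than any of its parts.
assert counts["both ESS >= 400"] <= min(counts["bulk-ESS >= 400"], counts["tail-ESS >= 400"])
assert counts["primary criterion"] <= min(counts["R-hat <= 1.01"], counts["both ESS >= 400"])
```

The rates come to about 12.8%, 19.7%, 10.2%, 10.2% and 7.0% of tasks, and 30/430 completed fits is also about 7.0%.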

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

bkmr probit fits often complete without meeting basic MCMC convergence criteria, with only 30 of 430 doing so in this 431-task simulation.


The main thing to know is that completing a bkmr probit fit does not mean the MCMC draws have converged. Their simulation of 431 prespecified tasks found that only 30 met the combined criterion of rank-normalized R-hat at or below 1.01 plus bulk-ESS and tail-ESS at or above 400.

The paper does a clean job of separating execution success from diagnostic adequacy. It uses the package's own SimData and kmbayes functions, applies standard posterior-package diagnostics, and reports exact counts from a reproducible setup with four chains started from a fixed seed. That gives direct, non-circular evidence for the tested scenarios.

The soft spot is the narrow simulation design. The data generator is locked to hfun=2, beta.true=0.5, a specific cosine covariate, and one seed for initial values. This shows the problem occurs under those conditions but leaves open how often it appears in other environmental-mixture settings. The central claim still holds inside the cases they ran.

This is for applied users of bkmr in environmental health who work with binary outcomes. A reader who cares about MCMC reliability or software defaults would get concrete numbers to act on. The citation pattern is straightforward and the thinking is clear.

It deserves peer review. The evidence is sharp enough on the narrow point to warrant referee time even if broader generalizability needs work.

Referee Report

0 major / 1 minor

Summary. The manuscript reports a simulation study of the probit BKMR implementation in the bkmr R package. Using bkmr::SimData() with family="binomial", hfun=2, beta.true=0.5, ind=1:2, M=4 and X=3*cos(z1)+2*rnorm(n), and bkmr::kmbayes() with four chains initialized from seed 20260621, the authors executed 431 prespecified tasks. Of the 430 completed fits, only 30 satisfied the joint convergence criteria of rank-normalized R-hat ≤ 1.01 together with bulk-ESS ≥ 400 and tail-ESS ≥ 400. The central claim is that fit completion alone does not guarantee adequate MCMC convergence of retained draws and that applied analyses should report chain count, iterations, R-hat, and both ESS diagnostics.

Significance. The result, if it holds, supplies direct, reproducible evidence that standard convergence diagnostics are frequently not met even when bkmr probit fits complete. The simulation design relies on externally generated data and independent posterior diagnostics rather than self-referential quantities, and the large number of prespecified tasks (431) with fixed initialization strengthens the observation within the tested scenarios. This supports the practical recommendation to report full MCMC diagnostics instead of relying on completion status or fixed iteration counts.

minor comments (1)

The abstract states that four chains were run but does not report the values (default or chosen) of iter, the burn-in, or any thinning; adding these numbers would let readers interpret the reported ESS thresholds directly from the methods description.
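The referee's point matters because an ESS of 400 means something different against 4,000 retained draws than against 40,000. A small Python sketch of the bookkeeping, assuming the first `burnin` iterations of each chain are discarded and every `thin`-th draw is kept afterwards; the function names are hypothetical and the iteration settings in the loop are placeholders, not the study's values.

```python
def retained_draws(iter_per_chain: int, burnin: int, thin: int = 1,
                   n_chains: int = 4) -> int:
    """Total draws kept across chains after burn-in and thinning (assumed convention)."""
    if not 0 <= burnin < iter_per_chain or thin < 1:
        raise ValueError("need 0 <= burnin < iter_per_chain and thin >= 1")
    per_chain = len(range(burnin, iter_per_chain, thin))  # kept draws per chain
    return n_chains * per_chain


def min_efficiency_to_pass(total_draws: int, ess_min: float = 400) -> float:
    """Smallest ESS-per-retained-draw ratio that can still reach ess_min."""
    return ess_min / total_draws


# Placeholder settings, not the study's.
for it, burn, thin in [(2000, 1000, 1), (10000, 5000, 5), (20000, 10000, 1)]:
    total = retained_draws(it, burn, thin)
    print(f"iter={it} burnin={burn} thin={thin}: {total} draws, "
          f"ESS/draw needed >= {min_efficiency_to_pass(total):.3f}")
```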

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including the recognition of its reproducible design and practical implications for reporting MCMC diagnostics in probit BKMR analyses. The recommendation to accept is appreciated.

Circularity Check

0 steps flagged

No significant circularity


The manuscript reports an empirical simulation study that generates data via bkmr::SimData, fits models via bkmr::kmbayes, and tallies the fraction of completed fits that satisfy external, pre-specified MCMC diagnostics (rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS ≥ 400). These counts are direct observations from independently seeded runs; they are not obtained by fitting any parameter to the target convergence metric, by redefining the metric in terms of itself, or by invoking a self-citation chain whose validity depends on the present results. The central claim therefore rests on external software behavior and standard diagnostic thresholds rather than on any internal reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The findings depend on the bkmr package's implementation and the simulation parameters chosen to generate test data; these are standard tools and settings rather than new postulates.

axioms (1)

domain assumption: The rank-normalized R-hat and effective sample size thresholds are valid indicators of MCMC convergence for this model.
Applied in defining the primary diagnostic criterion in the results.
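The axiom concerns what the ESS thresholds measure. Below is a pure-Python sketch of multi-chain ESS in the style described by Vehtari et al. (2021) and the Stan Reference Manual: autocorrelations are pooled across split chains and truncated with Geyer's initial positive and monotone sequences; bulk-ESS applies this to rank-normalized draws, and tail-ESS takes the smaller ESS of the 5% and 95% quantile indicators. It uses O(n²) autocovariances instead of an FFT and is not the posterior package's code; the function names and the AR(1) test chains are hypothetical.

```python
import random
from math import log10
from statistics import NormalDist, mean, quantiles, variance


def split_chains(chains):
    """Split each chain in half, dropping the middle draw if length is odd."""
    out = []
    for c in chains:
        h = len(c) // 2
        out += [c[:h], c[len(c) - h:]]
    return out


def rank_normalize(chains):
    """Normal scores of pooled average ranks; tied draws share a mean rank."""
    flat = sorted(x for c in chains for x in c)
    s = len(flat)
    first, last = {}, {}
    for pos, x in enumerate(flat, start=1):
        first.setdefault(x, pos)
        last[x] = pos
    inv = NormalDist().inv_cdf
    score = {x: inv(((first[x] + last[x]) / 2 - 3 / 8) / (s + 1 / 4)) for x in first}
    return [[score[x] for x in c] for c in chains]


def autocovariance(c):
    """Biased (divide-by-n) autocovariance of one chain at every lag."""
    n, m = len(c), mean(c)
    d = [x - m for x in c]
    return [sum(d[i] * d[i + t] for i in range(n - t)) / n for t in range(n)]


def ess(chains):
    """Multi-chain ESS with Geyer's initial positive and monotone sequences."""
    m, n = len(chains), len(chains[0])
    acov = [autocovariance(c) for c in chains]
    mean_var = mean(a[0] for a in acov) * n / (n - 1)
    var_plus = mean_var * (n - 1) / n
    if m > 1:
        var_plus += variance([mean(c) for c in chains])

    def rho(t):  # combined-chain autocorrelation at lag t
        return 1 - (mean_var - mean(a[t] for a in acov)) / var_plus

    rho_hat = [0.0] * n
    rho_even, rho_odd = 1.0, rho(1)
    rho_hat[0], rho_hat[1] = rho_even, rho_odd
    t = 0
    while t < n - 5 and rho_even + rho_odd > 0:  # initial positive sequence
        t += 2
        rho_even, rho_odd = rho(t), rho(t + 1)
        if rho_even + rho_odd >= 0:
            rho_hat[t], rho_hat[t + 1] = rho_even, rho_odd
    max_t = t
    if rho_even > 0:
        rho_hat[max_t] = rho_even
    t = 0
    while t <= max_t - 4:  # initial monotone sequence
        t += 2
        if rho_hat[t] + rho_hat[t + 1] > rho_hat[t - 2] + rho_hat[t - 1]:
            rho_hat[t] = rho_hat[t + 1] = (rho_hat[t - 2] + rho_hat[t - 1]) / 2
    total = m * n
    tau = max(-1 + 2 * sum(rho_hat[:max_t]) + rho_hat[max_t], 1 / log10(total))
    return total / tau


def bulk_ess(chains):
    """ESS of rank-normalized split chains."""
    return ess(rank_normalize(split_chains(chains)))


def tail_ess(chains):
    """Smaller ESS of the 5% and 95% quantile indicators on split chains."""
    flat = [x for c in chains for x in c]
    cuts = quantiles(flat, n=20, method="inclusive")  # R type-7 style quantiles
    results = []
    for q in (cuts[0], cuts[-1]):
        ind = [[1.0 if x <= q else 0.0 for x in c] for c in chains]
        results.append(ess(split_chains(ind)))
    return min(results)


def ar1(n, phi, rng):
    """Toy AR(1) chain; larger phi means stickier draws."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out


rng = random.Random(7)
iid = [ar1(400, 0.0, rng) for _ in range(4)]
sticky = [ar1(400, 0.95, rng) for _ in range(4)]
print(round(bulk_ess(iid)), round(tail_ess(iid)))        # near the 1600 draws
print(round(bulk_ess(sticky)), round(tail_ess(sticky)))  # far fewer
```

With the same 1,600 draws, the sticky chains fall well short of 400 on both ESS measures, which is the kind of gap between draw count and effective information that the 400 threshold is meant to catch.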



Reference graph

Works this paper leans on

13 extracted references · 2 canonical work pages

[1]

Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures

Bobb JF, Valeri L, Claus Henn B, et al. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics. 2015;16(3):493-508

2015
[2]

Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression

Bobb JF, Claus Henn B, Valeri L, Coull BA. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health. 2018;17(1):67

2018
[3]

An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length

Gibson EA, Nunez Y, Abuawad A, et al. An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length. Environ Health. 2019;18(1):76

2019
[4]

Bayesian analysis of binary and polychotomous response data

Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc. 1993;88(422):669-679

1993
[5]

Inference from iterative simulation using multiple sequences

Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457-472

1992
[6]

Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC (with discussion)

Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner PC. Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC (with discussion). Bayesian Anal. 2021;16(2). doi:10.1214/20-ba1221

2021
[7]

Stan Reference Manual

Stan Development Team. Stan Reference Manual. Version 2.39. 2026. Accessed June 21, 2026. https://mc-stan.org/docs/reference-manual/

2026
[8]

Example using the bkmr R package for probit regression with simulated data

Bobb JF. Example using the bkmr R package for probit regression with simulated data. 2018. Accessed June 9, 2026. https://jenfb.github.io/bkmr/ProbitEx.html

2018
[9]

The traceplot thickens: Developing all-purpose convergence diagnostics for any Markov Chain Monte Carlo algorithm

Duttweiler L, Klus J, Coull BA, Geller RJ, Henn BC, Thurston SW. The traceplot thickens: Developing all-purpose convergence diagnostics for any Markov Chain Monte Carlo algorithm. arXiv [stat.CO]. Published online August 27, 2024. doi:10.48550/arXiv.2408.15392

2024
[10]

bkmr: Bayesian kernel machine regression

Bobb JF. bkmr: Bayesian kernel machine regression. R package version 0.2.2.9000 (development version, commit 45413e338a316362d629f53bd2a917c4bf485c1e). 2024. Accessed June 9, 2026. https://github.com/jenfb/bkmr

2024
[11]

The sensitivity of Bayesian kernel machine regression (BKMR) to data distribution: a comprehensive simulation analysis

Tanvir Hasan K, Odom G, Bursac Z, Ibrahimou B. The sensitivity of Bayesian kernel machine regression (BKMR) to data distribution: a comprehensive simulation analysis. J Stat Comput Simul. 2026;96(7):1752-1771

2026
[12]

Convergence diagnostics for Markov chain Monte Carlo

Roy V. Convergence diagnostics for Markov chain Monte Carlo. Annu Rev Stat Appl. 2020;7(1):387-412

2020
[13]

posterior: Tools for Working with Posterior Distributions

Bürkner PC, Gabry J, Kay M, Vehtari A. posterior: Tools for Working with Posterior Distributions. R package version 1.7.0. 2026. Accessed June 19, 2026. https://cran.r-project.org/web/packages/posterior/index.html

2026
