
arxiv: 2607.02000 · v1 · pith:SP3XB3FDnew · submitted 2026-07-02 · 📊 stat.AP

Convergence fragility in probit Bayesian kernel machine regression implemented in the bkmr R package for binary-outcome environmental mixture analyses: a simulation study

Pith reviewed 2026-07-03 03:15 UTC · model grok-4.3

classification 📊 stat.AP
keywords: Bayesian kernel machine regression · probit BKMR · MCMC convergence · bkmr R package · environmental mixtures · simulation study · R-hat · effective sample size

The pith

Completion of a probit BKMR fit in bkmr does not ensure MCMC convergence of the retained draws.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether successful execution of probit Bayesian kernel machine regression in the bkmr package implies converged MCMC chains for binary-outcome environmental mixture data. Using bkmr's data simulator and fitter, with four chains whose starting values are drawn reproducibly from a fixed base seed, it applies standard convergence checks: rank-normalized R-hat at or below 1.01 and effective sample sizes of at least 400 for both bulk and tail. Of 431 tasks, 430 produced fitted objects but only 30 satisfied the combined criteria. This gap between execution success and diagnostic adequacy shows that relying on fit completion alone can leave analyses without adequate posterior information. Applied work therefore needs to report the number of chains, warmup and retained iterations, and all three diagnostics rather than assume convergence from a completed run.

Core claim

Of 431 prespecified simulation tasks using family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, M = 4 and X = 3*cos(z1) + 2*rnorm(n), 430 returned fitted objects but only 30 achieved rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS both ≥ 400. The study therefore concludes that completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws and that analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS.
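The combined criterion is a simple conjunction, and a minimal sketch can make the difference between "completed" and "adequate" concrete. The sketch below assumes each task has already been reduced to one R-hat, one bulk-ESS, and one tail-ESS value (for example the worst case over monitored quantities, which the summary above does not specify). The `TaskDiagnostics` class, the function name, and the numeric values are hypothetical illustrations, not results or code from the paper.

```python
from dataclasses import dataclass
from typing import Optional

RHAT_MAX = 1.01   # rank-normalized R-hat threshold quoted in the text
ESS_MIN = 400.0   # bulk-ESS and tail-ESS threshold quoted in the text


@dataclass
class TaskDiagnostics:
    """Per-task summary (hypothetical container, not part of bkmr)."""
    task_id: int
    completed: bool
    rhat: Optional[float] = None
    ess_bulk: Optional[float] = None
    ess_tail: Optional[float] = None


def meets_primary_criterion(d: TaskDiagnostics) -> bool:
    """True only if the fit completed and all three thresholds hold."""
    if not d.completed or None in (d.rhat, d.ess_bulk, d.ess_tail):
        return False
    return (d.rhat <= RHAT_MAX
            and d.ess_bulk >= ESS_MIN
            and d.ess_tail >= ESS_MIN)


# Made-up values for illustration only; they are not results from the paper.
tasks = [
    TaskDiagnostics(1, True, rhat=1.004, ess_bulk=950.0, ess_tail=610.0),
    TaskDiagnostics(2, True, rhat=1.090, ess_bulk=120.0, ess_tail=75.0),
    TaskDiagnostics(3, True, rhat=1.008, ess_bulk=520.0, ess_tail=310.0),
    TaskDiagnostics(4, False),  # numerical non-completion
]

n_completed = sum(t.completed for t in tasks)
n_adequate = sum(meets_primary_criterion(t) for t in tasks)
print(f"completed: {n_completed}/{len(tasks)}, adequate: {n_adequate}/{len(tasks)}")
```

Counting completion and adequacy separately, over the same denominator, mirrors how the paper reports 430/431 completed fits alongside 30/431 adequate ones.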

What carries the argument

Four-chain MCMC simulation with bkmr::SimData() for binary data generation and bkmr::kmbayes() for probit model fitting, evaluated by rank-normalized R-hat, bulk-ESS and tail-ESS.

If this is right

  • Fit completion alone is not a reliable indicator that retained draws carry adequate effective posterior information in probit BKMR.
  • Fixed iteration counts or default settings may leave many analyses without converged chains for binary outcomes.
  • Reporting only successful model fits without diagnostics risks basing environmental mixture conclusions on under-converged samples.
  • Users must document the number of chains, warmup and retained iterations plus the three convergence statistics for reproducible results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Convergence shortfalls may be more common with binary than continuous outcomes under the same kernel machine setup.
  • Package maintainers could add automatic post-fit diagnostic summaries and warnings when thresholds are not met.
  • Re-running non-converged tasks with altered initial values or longer chains offers a direct way to test whether the fragility is fixable within the current sampler.

Load-bearing premise

The chosen simulation parameters and four-chain setup with the given seed produce representative cases for typical binary-outcome environmental mixture analyses.

What would settle it

Repeating the identical simulation protocol with substantially more iterations per chain would settle whether the fragility is a matter of run length. If nearly all 431 tasks then satisfied the combined R-hat and ESS criteria, the shortfall would be attributable to the iteration budget rather than to the sampler itself.

Original abstract

Background. Bayesian kernel machine regression (BKMR) is widely used for exposure-mixture analyses with binary outcomes through a probit extension. Because a bkmr fit can complete without providing adequate effective posterior information, simulation studies should separate execution success from MCMC convergence diagnostics.

Methods. We evaluated the public bkmr probit workflow using bkmr::SimData() for data generation, bkmr::kmbayes() for model fitting, and posterior for convergence diagnostics. The balanced generator used family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, and M = 4. SimData() generated the covariate as X = 3*cos(z1) + 2*rnorm(n). Four chains were initialized with chain-specific randomized starting values generated reproducibly from the fixed initial-value base seed 20260621. These values affected only the initial state of the sampler and did not alter the BKMR model, default priors, or Metropolis-Hastings proposals.

Results. Of 431 prespecified tasks, 430 returned fitted objects and one task had a numerical non-completion. Diagnostic adequacy was limited: rank-normalized R-hat <= 1.01 threshold was achieved in 55/431 tasks, bulk-ESS >= 400 in 85/431, tail-ESS >= 400 in 44/431, and both ESS criteria in 44/431. The primary diagnostic criterion, R-hat at or below the 1.01 threshold with both bulk-ESS and tail-ESS >= 400, was met in 30/431 prespecified tasks, corresponding to 30/430 completed fits.

Conclusions. Completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws. Applied analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS rather than rely on a fixed iteration count or on fit completion alone.
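The counts in the Results can be turned into rates and checked for internal consistency with a few lines of arithmetic. The sketch below uses only the counts quoted in the abstract; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# Counts quoted in the abstract: (tasks meeting the criterion, denominator).
counts = {
    "fit completed": (430, 431),
    "R-hat <= 1.01": (55, 431),
    "bulk-ESS >= 400": (85, 431),
    "tail-ESS >= 400": (44, 431),
    "both ESS >= 400": (44, 431),
    "primary criterion (all tasks)": (30, 431),
    "primary criterion (completed fits)": (30, 430),
}

rates = {label: 100.0 * k / n for label, (k, n) in counts.items()}
for label, pct in rates.items():
    k, n = counts[label]
    print(f"{label:36s} {k:3d}/{n} = {pct:5.1f}%")

# Consistency checks implied by set inclusion:
# a joint criterion can never be met more often than any of its parts.
both_ess = counts["both ESS >= 400"][0]
primary = counts["primary criterion (all tasks)"][0]
consistent = (
    both_ess <= min(counts["bulk-ESS >= 400"][0], counts["tail-ESS >= 400"][0])
    and primary <= min(counts["R-hat <= 1.01"][0], both_ess)
)
print("counts are mutually consistent:", consistent)
```

Because the "both ESS" count equals the tail-ESS count, every task that met the tail-ESS threshold also met the bulk-ESS threshold, which makes tail-ESS the binding ESS criterion in these results.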

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript reports a simulation study of the probit BKMR implementation in the bkmr R package. Using bkmr::SimData() with family="binomial", hfun=2, beta.true=0.5, ind=1:2, M=4 and X=3*cos(z1)+2*rnorm(n), and bkmr::kmbayes() with four chains initialized from seed 20260621, the authors executed 431 prespecified tasks. Of the 430 completed fits, only 30 satisfied the joint convergence criteria of rank-normalized R-hat ≤ 1.01 together with bulk-ESS ≥ 400 and tail-ESS ≥ 400. The central claim is that fit completion alone does not guarantee adequate MCMC convergence of retained draws and that applied analyses should report chain count, iterations, R-hat, and both ESS diagnostics.

Significance. The result, if it holds, supplies direct, reproducible evidence that standard convergence diagnostics are frequently not met even when bkmr probit fits complete. The simulation design relies on externally generated data and independent posterior diagnostics rather than self-referential quantities, and the large number of prespecified tasks (431) with fixed initialization strengthens the observation within the tested scenarios. This supports the practical recommendation to report full MCMC diagnostics instead of relying on completion status or fixed iteration counts.

minor comments (1)
  1. The abstract states that four chains were run but does not report the default or chosen values of iter, burnin, or thin; adding these numbers would allow readers to interpret the reported ESS thresholds directly from the methods description.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including the recognition of its reproducible design and practical implications for reporting MCMC diagnostics in probit BKMR analyses. The recommendation to accept is appreciated.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript reports an empirical simulation study that generates data via bkmr::SimData, fits models via bkmr::kmbayes, and tallies the fraction of completed fits that satisfy external, pre-specified MCMC diagnostics (rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS ≥ 400). These counts are direct observations from independently seeded runs; they are not obtained by fitting any parameter to the target convergence metric, by redefining the metric in terms of itself, or by invoking a self-citation chain whose validity depends on the present results. The central claim therefore rests on external software behavior and standard diagnostic thresholds rather than on any internal reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The findings depend on the bkmr package's implementation and the simulation parameters chosen to generate test data; these are standard tools and settings rather than new postulates.

axioms (1)
  • domain assumption: The rank-normalized R-hat and effective sample size thresholds are valid indicators of MCMC convergence for this model.
    Applied in defining the primary diagnostic criterion in the results.

