pith. machine review for the scientific record.

arXiv: 2604.20743 · v1 · submitted 2026-04-22 · 📊 stat.ME · stat.CO

Recognition: unknown

ProfileGLMM: a R Package Extending Bayesian Profile Regression using Generalised Linear Mixed Models


Pith reviewed 2026-05-09 23:28 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords ProfileGLMM · Bayesian profile regression · generalized linear mixed models · R package · clustering · hierarchical data · longitudinal data · random effects

The pith

ProfileGLMM extends Bayesian profile regression to hierarchical data by using GLMMs as the outcome model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ProfileGLMM, an R package that integrates generalized linear mixed models (GLMMs) into Bayesian profile regression. The approach simultaneously clusters observations on a set of interdependent covariates and models the outcome, with random effects added to handle hierarchical and longitudinal structures. It also permits direct examination of interactions between the resulting latent clusters and other observed covariates. The extension broadens applicability to the complex data common in epidemiology, the social sciences, and clinical research, supporting continuous or binary outcomes and both categorical and continuous clustering covariates. The package relies on an efficient Rcpp implementation and requires few mandatory parameters.

Core claim

Integrating GLMMs as the outcome model allows Bayesian profile regression to analyze hierarchical and longitudinal data through random effects and to study interactions between latent clusters and observable covariates, while still clustering observations on the basis of a specified set of interdependent covariates.

What carries the argument

The GLMM outcome model in which derived cluster memberships are included as explanatory variables together with random effects for hierarchical structure.
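
The structure described above can be written out as a rough model sketch. The notation below is an assumption made for illustration, not the paper's own: $z_i$ is the latent cluster of observation $i$, $x^{c}_i$ the clustering covariates, $x_i$ and $\tilde{x}_i$ the fixed-effect and cluster-interaction covariates, $w_i$ the random-effect design vector, $g(i)$ the grouping unit of observation $i$, and $h$ the link function. The Dirichlet-process-style prior on the cluster weights is suggested by the Figure 7 caption (trace of the DP concentration parameter); everything else is a generic profile-regression-with-GLMM layout.

```latex
\begin{align}
  \pi &\sim \mathrm{GEM}(\alpha), &
  z_i \mid \pi &\sim \mathrm{Categorical}(\pi), \\
  x^{c}_i \mid z_i = k &\sim f(\,\cdot \mid \theta_k), &
  u_{g} &\sim \mathcal{N}(0, \Sigma_u), \\
  h\!\left(\mathbb{E}\!\left[y_i \mid z_i = k,\, u_{g(i)}\right]\right)
    &= x_i^{\top}\beta + \tilde{x}_i^{\top}\gamma_k + w_i^{\top}u_{g(i)}.
\end{align}
```

In this sketch the cluster enters the outcome model only through the cluster-specific coefficients $\gamma_k$, which is one way cluster-covariate interactions could be expressed; the random effects $u_{g(i)}$ carry the hierarchical or longitudinal dependence.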

Load-bearing premise

That replacing the original outcome model with a GLMM preserves the statistical validity and clustering properties of Bayesian profile regression without creating new biases or identifiability problems.

What would settle it

Simulate hierarchical data with known true clusters, fit both the original profile regression and ProfileGLMM, and check whether cluster recovery accuracy declines under the GLMM extension.
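
A minimal sketch of the scoring half of such a check, assuming a toy data-generating process and using the adjusted Rand index to compare estimated and true partitions. All names, cluster centers, and effect sizes here are hypothetical, and `threshold_clusterer` is only a stand-in for a fitted model: it is neither ProfileGLMM nor the original profile regression, whose cluster assignments would be substituted in practice.

```python
import random
from collections import Counter
from math import comb


def simulate_hierarchical(n_groups=30, n_per_group=5, seed=0):
    """Toy hierarchical data: each group has one latent cluster, one
    exposure value, a random intercept, and repeated outcomes."""
    rng = random.Random(seed)
    centers = [-3.0, 0.0, 3.0]   # hypothetical exposure mean per cluster
    effects = [-1.0, 0.0, 2.0]   # hypothetical cluster effect on the outcome
    groups = []
    for _ in range(n_groups):
        z = rng.randrange(3)                  # true latent cluster
        exposure = rng.gauss(centers[z], 0.5)
        u = rng.gauss(0.0, 1.0)               # group-level random intercept
        outcomes = [effects[z] + u + rng.gauss(0.0, 0.5)
                    for _ in range(n_per_group)]
        groups.append({"cluster": z, "exposure": exposure, "outcomes": outcomes})
    return groups


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    total = comb(len(labels_a), 2)
    sum_ab = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:     # both partitions trivial: treat as agreement
        return 1.0
    return (sum_ab - expected) / (max_index - expected)


def threshold_clusterer(groups):
    """Stand-in for a fitted model: assigns clusters by cutting the exposure."""
    return [0 if g["exposure"] < -1.5 else 1 if g["exposure"] < 1.5 else 2
            for g in groups]


groups = simulate_hierarchical()
truth = [g["cluster"] for g in groups]
estimate = threshold_clusterer(groups)
print(round(adjusted_rand_index(truth, estimate), 3))
```

Running the same scoring on the partitions returned by each method, over many simulated data sets, would show whether recovery degrades when random effects are added to the outcome model.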

Figures

Figures reproduced from arXiv: 2604.20743 by Mark A. van de Wiel, Matteo Amestoy, Wessel N. van Wieringen.

Figure 1: Simulated exposure distribution. Color representing the latent cluster.
Figure 2: Decomposition of the outcome based on the true parameters. Left panel isolates the
Figure 3: Graphical representation illustrating variable dependencies within the ProfileGLMM
Figure 4: Profile LMM representative clustering centroids and variance estimates. Points represent
Figure 5: Left panel isolates the fixed effect contribution with blue being the true contribution red
Figure 6: Blue is the underlying true signal, red is the ProfileGLMM fit
Figure 7: Trace of the DP concentration parameter after a burn-in of 200 iterations.

Original abstract

ProfileGLMM is an R package integrating Generalised Linear Mixed Models (GLMMs) as the outcome model for Bayesian profile regression. This statistical framework simultaneously i) explains the variation in the outcome and ii) clusters the observations based on a specified set of interdependent clustering covariates. The derived cluster memberships are then incorporated, alongside others, as explanatory variables in the regression to model the outcome. This framework efficiently handles complex, highly correlated covariate structures whose direct inclusion in a standard regression model would be statistically sub-optimal. ProfileGLMM significantly extends Bayesian profile regression's scope by resolving two key constraints of previous implementations: 1) it allows the analysis of hierarchical and longitudinal data structures through the inclusion of random effects, and 2) it enables the study of interactions between latent clusters and other observable covariates. ProfileGLMM accommodates various data types, supporting both continuous or binary outcomes and both categorical and continuous clustering covariates. Built on fast Rcpp code with minimal mandatory parameters, ProfileGLMM offers a flexible analytical tool. It significantly enhances the utility of profile regression for researchers in fields such as epidemiology, social sciences, and clinical studies dealing with complex data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper describes the ProfileGLMM R package, which extends Bayesian profile regression by using GLMMs as the outcome model. The package simultaneously clusters observations on interdependent covariates and models the outcome, incorporating cluster memberships, random effects for hierarchical or longitudinal structures, and interactions between clusters and covariates. It handles continuous or binary outcomes and categorical or continuous clustering covariates, using efficient Rcpp code with few mandatory parameters.

Significance. If the extension is implemented such that the clustering step remains valid and separate despite the added random effects and interactions, the package would meaningfully broaden the use of profile regression in applied fields dealing with complex data. The fast implementation and flexibility are positive features for practical use.

major comments (2)
  1. [Abstract] The claim that ProfileGLMM resolves two key constraints of previous implementations lacks any equations or model details on how random effects and cluster-covariate interactions are incorporated into the GLMM while preserving the clustering properties, which is load-bearing for the central claim.
  2. No results from simulations, real data applications, or comparisons to existing methods are provided to substantiate the assertions about handling various data types and avoiding statistical sub-optimality.
minor comments (1)
  1. [Title] Grammatical error: 'a R Package' should be 'an R Package'.

Circularity Check

0 steps flagged

No significant circularity: software package description without derivations or predictions

full rationale

The document is a description of an R package (ProfileGLMM) that integrates GLMMs into Bayesian profile regression for handling hierarchical data and cluster-covariate interactions. It presents no equations, likelihoods, derivations, or statistical predictions. Claims concern software functionality, data type support, and scope extensions relative to prior profile regression implementations. No fitted parameters are renamed as predictions, no self-definitional steps exist, and no load-bearing self-citations or uniqueness theorems are invoked in the provided text. The central assertions are descriptive rather than derived, rendering the paper self-contained against external benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced because the document is a package announcement rather than a theoretical derivation; the work relies on standard statistical assumptions from the GLMM and profile regression literature.

pith-pipeline@v0.9.0 · 5511 in / 1117 out tokens · 46288 ms · 2026-05-09T23:28:05.467544+00:00 · methodology

