ProfileGLMM: a R Package Extending Bayesian Profile Regression using Generalised Linear Mixed Models
Pith reviewed 2026-05-09 23:28 UTC · model grok-4.3
The pith
ProfileGLMM extends Bayesian profile regression to hierarchical data by using GLMMs as the outcome model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Integrating GLMMs as the outcome model allows Bayesian profile regression to analyze hierarchical and longitudinal data through random effects and to study interactions between latent clusters and observable covariates, while still clustering observations on the basis of a specified set of interdependent covariates.
What carries the argument
The GLMM outcome model in which derived cluster memberships are included as explanatory variables together with random effects for hierarchical structure.
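A model of this shape could be written as follows. This is a sketch under stated assumptions, not the paper's own notation, since the text reviewed here gives no equations. All symbols are introduced for illustration: observation j in group i has latent cluster label z_ij, clustering covariates u_ij, fixed-effect covariates x_ij, covariates w_ij that interact with the cluster, and random-effect design vector t_ij.

```latex
\begin{aligned}
g\bigl(\mathbb{E}[y_{ij} \mid b_i, z_{ij}]\bigr)
  &= x_{ij}^\top \beta + w_{ij}^\top \gamma_{z_{ij}} + t_{ij}^\top b_i, \\
b_i &\sim \mathcal{N}(0, \Sigma), \\
z_{ij} \mid \pi &\sim \mathrm{Categorical}(\pi), \\
u_{ij} \mid z_{ij} = c &\sim f(\,\cdot \mid \theta_c).
\end{aligned}
```

Here g is a link function (for example the identity for a continuous outcome, or a logit or probit link for a binary one), b_i is the random effect of group i, and the cluster-specific coefficient gamma_c carries the cluster main effect when w_ij contains a constant term and the cluster-covariate interactions otherwise. The last line is the profile part: the clustering covariates follow a cluster-specific distribution f with parameters theta_c.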
Load-bearing premise
That replacing the original outcome model with a GLMM preserves the statistical validity and clustering properties of Bayesian profile regression without creating new biases or identifiability problems.
What would settle it
Simulate hierarchical data with known true clusters, fit both the original profile regression and ProfileGLMM, and check whether cluster recovery accuracy declines under the GLMM extension.
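The check described above can be sketched in a few lines. This is a minimal illustration, not the package's API: the data-generating values, the function names, and the threshold rule standing in for a fitted model are all assumptions. Cluster recovery is scored with the adjusted Rand index.

```python
import random
from collections import Counter
from math import comb


def simulate_hierarchical(n_groups=20, n_per_group=10, n_clusters=3, seed=0):
    """Simulate grouped data with known latent clusters (illustrative values only)."""
    rng = random.Random(seed)
    cluster_means = [-3.0, 0.0, 3.0][:n_clusters]    # means of the clustering covariate
    cluster_effects = [-1.0, 0.0, 1.0][:n_clusters]  # cluster effect on the outcome
    rows = []
    for g in range(n_groups):
        b_g = rng.gauss(0.0, 1.0)  # random intercept shared by group g
        for _ in range(n_per_group):
            c = rng.randrange(n_clusters)         # true latent cluster
            u = rng.gauss(cluster_means[c], 1.0)  # clustering covariate
            x = rng.gauss(0.0, 1.0)               # fixed-effect covariate
            y = 0.5 * x + cluster_effects[c] + b_g + rng.gauss(0.0, 1.0)
            rows.append({"group": g, "cluster": c, "u": u, "x": x, "y": y})
    return rows


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same observations."""
    n = len(labels_a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


rows = simulate_hierarchical()
truth = [r["cluster"] for r in rows]
# Placeholder for a fitted model: a naive threshold rule on the clustering covariate.
naive = [0 if r["u"] < -1.5 else 1 if r["u"] < 1.5 else 2 for r in rows]
ari = adjusted_rand_index(truth, naive)
print(f"ARI of the placeholder partition: {ari:.3f}")
```

In an actual check, `naive` would be replaced by the partition returned by each fitted model (the original profile regression and ProfileGLMM), and the two ARI values would be compared over many simulated datasets.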
Original abstract
ProfileGLMM is an R package integrating Generalised Linear Mixed Models (GLMMs) as the outcome model for Bayesian profile regression. This statistical framework simultaneously i) explains the variation in the outcome and ii) clusters the observations based on a specified set of interdependent clustering covariates. The derived cluster memberships are then incorporated, alongside other covariates, as explanatory variables in the regression to model the outcome. This framework efficiently handles complex, highly correlated covariate structures whose direct inclusion in a standard regression model would be statistically sub-optimal. ProfileGLMM significantly extends Bayesian profile regression's scope by resolving two key constraints of previous implementations: 1) it allows the analysis of hierarchical and longitudinal data structures through the inclusion of random effects, and 2) it enables the study of interactions between latent clusters and other observable covariates. ProfileGLMM accommodates various data types, supporting both continuous and binary outcomes and both categorical and continuous clustering covariates. Built on fast Rcpp code with minimal mandatory parameters, ProfileGLMM offers a flexible analytical tool. It significantly enhances the utility of profile regression for researchers in fields such as epidemiology, social sciences, and clinical studies dealing with complex data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the ProfileGLMM R package for extending Bayesian profile regression with GLMMs as the outcome model. It simultaneously clusters observations using interdependent covariates and models the outcome incorporating cluster memberships, random effects for hierarchical/longitudinal structures, and interactions between clusters and covariates. The package handles continuous/binary outcomes and categorical/continuous clustering covariates using efficient Rcpp code with minimal parameters.
Significance. If the extension is implemented such that the clustering step remains valid and separate despite the added random effects and interactions, the package would meaningfully broaden the use of profile regression in applied fields dealing with complex data. The fast implementation and flexibility are positive features for practical use.
major comments (2)
- [Abstract] The claim that ProfileGLMM resolves two key constraints of previous implementations lacks any equations or model details on how random effects and cluster-covariate interactions are incorporated into the GLMM while preserving the clustering properties, which is load-bearing for the central claim.
- No results from simulations, real data applications, or comparisons to existing methods are provided to substantiate the assertions about handling various data types and avoiding statistical sub-optimality.
minor comments (1)
- [Title] Grammatical error: 'a R Package' should be 'an R Package'.
Circularity Check
No significant circularity: software package description without derivations or predictions
full rationale
The document is a description of an R package (ProfileGLMM) that integrates GLMMs into Bayesian profile regression for handling hierarchical data and cluster-covariate interactions. It presents no equations, likelihoods, derivations, or statistical predictions. Claims concern software functionality, data type support, and scope extensions relative to prior profile regression implementations. No fitted parameters are renamed as predictions, no self-definitional steps exist, and no load-bearing self-citations or uniqueness theorems are invoked in the provided text. The central assertions are descriptive rather than derived, rendering the paper self-contained against external benchmarks with no reduction of outputs to inputs by construction.