Fitting Large Nonlinear Mixed Effects Models Using Variational Expectation Maximization

Mohamed Tarek, Pedro Afonso


Pith reviewed 2026-05-07 12:20 UTC · model grok-4.3

classification 📊 stat.ME cs.CE cs.LG cs.MS stat.CO

keywords models, nlme, effects, model, variational, fitting, parameters, deepnlme


The pith

Variational Expectation Maximization scales NLME model fitting to over 15,000 population parameters using flexible variational families and reverse-mode automatic differentiation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Nonlinear mixed effects models are used to study data in which each person or subject has slightly different parameters drawn from a larger population, such as how individuals respond differently to the same drug. Fitting these models requires integrating over all possible individual variations (the random effects), which becomes very slow as models grow to thousands of parameters. The paper tests whether variational expectation maximization, an approximation technique that replaces the hard integral with an optimization over a simpler family of distributions, can make this feasible. The authors show that it recovers known results on a standard warfarin model and measure its time per iteration on a much larger DeepNLME Friberg model with 15,410 population parameters and 16 random effects. The method uses reverse-mode automatic differentiation to compute the gradients for its updates without hand-derived derivatives.
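In generic notation (assumed here, not taken from the paper: y_i is the data for subject i, η_i its random effects, θ the population parameters, and q_i a distribution over η_i), the hard integral and its variational replacement can be written as:

$$
\ell(\theta) = \sum_{i=1}^{N} \log \int p(y_i \mid \eta_i, \theta)\, p(\eta_i \mid \theta)\, d\eta_i
\;\ge\; \sum_{i=1}^{N} \mathbb{E}_{q_i}\!\left[\log p(y_i \mid \eta_i, \theta) + \log p(\eta_i \mid \theta) - \log q_i(\eta_i)\right]
$$

The inequality follows from Jensen's inequality. VEM maximizes the right-hand side, the evidence lower bound (ELBO), jointly over θ and the per-subject distributions q_i, so no integral has to be evaluated exactly.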

Core claim

VEM can efficiently maximize the marginal likelihood, scaling to NLME models with over 15,000 population parameters.

Load-bearing premise

The chosen variational family provides a sufficiently accurate approximation to the true posterior for the large-scale NLME models tested, and the Pumas implementation correctly recovers known results on the warfarin model.

Original abstract

Nonlinear Mixed Effects (NLME) models are widely used in pharmacometrics and related fields to analyze hierarchical and longitudinal data. However, as the number of parameters and random effects increases, traditional methods for maximizing the marginal likelihood become computationally expensive. This paper explores the Variational Expectation Maximization (VEM) algorithm, a scalable alternative for fitting NLME models. Originally introduced in the context of probabilistic graphical models and later popularized through variational autoencoders, VEM has not been extensively applied to NLME modeling. By leveraging flexible variational families and reverse-mode automatic differentiation, VEM can efficiently maximize the marginal likelihood, scaling to NLME models with over 15,000 population parameters. This work provides a detailed description of VEM, compares it to other NLME fitting algorithms, and highlights its scalability through computational experiments. Using the Pumas statistical software, we fit two test models: 1) a standard warfarin model, and 2) a DeepNLME Friberg model with 15,410 population parameters and 16 random effects. The warfarin model was fitted to completion to demonstrate the correctness of VEM, while the DeepNLME Friberg model was fitted for a limited number of iterations to measure the time per iteration and demonstrate VEM's scalability.
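The abstract describes VEM as maximizing the marginal likelihood through a variational lower bound (the ELBO). A minimal sketch of the alternating scheme follows. It is not the Pumas implementation: it assumes a hypothetical linear random-intercept model, y_ij = mu + eta_i + eps_ij with eta_i ~ N(0, omega2) and eps_ij ~ N(0, sigma2), and a Gaussian variational family q_i = N(m_i, s2_i). Because this toy model is conjugate, the optimal q_i and the population updates have closed forms, so no automatic differentiation is needed. In a genuinely nonlinear NLME model neither step has a closed form, and the ELBO would instead be maximized numerically, which is where reverse-mode AD supplies the gradients. All function names are illustrative.

```python
import math

def log_marginal(y, mu, sigma2, omega2):
    """Exact log p(y_i | theta) for one subject of the toy model y_ij = mu + eta_i + eps_ij."""
    n = len(y)
    r = [v - mu for v in y]
    s1, s2 = sum(r), sum(v * v for v in r)
    denom = sigma2 + n * omega2
    quad = (s2 - omega2 * s1 * s1 / denom) / sigma2            # Sherman-Morrison
    logdet = (n - 1) * math.log(sigma2) + math.log(denom)      # matrix determinant lemma
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)

def elbo(y, mu, sigma2, omega2, m, s2):
    """ELBO for one subject with Gaussian q(eta_i) = N(m, s2), computed in closed form."""
    exp_loglik = sum(-0.5 * math.log(2 * math.pi * sigma2)
                     - ((v - mu - m) ** 2 + s2) / (2 * sigma2) for v in y)
    exp_logprior = -0.5 * math.log(2 * math.pi * omega2) - (m * m + s2) / (2 * omega2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return exp_loglik + exp_logprior + entropy

def e_step(y, mu, sigma2, omega2):
    """Optimal Gaussian q for this conjugate toy: the exact posterior of eta_i (zero ELBO gap)."""
    prec = len(y) / sigma2 + 1 / omega2
    mean = sum(v - mu for v in y) / sigma2 / prec
    return mean, 1 / prec

def m_step(data, qs):
    """Closed-form maximizers of the summed ELBO in (mu, sigma2, omega2) with q held fixed."""
    n_obs = sum(len(y) for y in data)
    mu = sum(v - m for y, (m, _) in zip(data, qs) for v in y) / n_obs
    sigma2 = sum((v - mu - m) ** 2 + s2 for y, (m, s2) in zip(data, qs) for v in y) / n_obs
    omega2 = sum(m * m + s2 for m, s2 in qs) / len(qs)
    return mu, sigma2, omega2

def fit_vem(data, mu=0.0, sigma2=1.0, omega2=1.0, iters=2000):
    """Alternate E- and M-steps; record the summed ELBO after each M-step."""
    trace = []
    for _ in range(iters):
        qs = [e_step(y, mu, sigma2, omega2) for y in data]
        mu, sigma2, omega2 = m_step(data, qs)
        trace.append(sum(elbo(y, mu, sigma2, omega2, m, s2)
                         for y, (m, s2) in zip(data, qs)))
    return (mu, sigma2, omega2), trace
```

With hypothetical data such as `data = [[1.2, 0.8, 1.1], [2.9, 3.3, 3.0], [0.1, -0.2, 0.3], [1.9, 2.2, 2.0]]`, `fit_vem(data)` returns the fitted `(mu, sigma2, omega2)` and an ELBO trace that never decreases, since each E-step and M-step can only raise the bound. When omega2 is large relative to sigma2, the mu update contracts toward its fixed point only slowly, which is one reason per-iteration cost alone does not settle how long a full fit takes.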

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

VEM in Pumas runs on a 15k-parameter NLME model, but for that case the paper reports only per-iteration timing from a limited number of iterations, with no convergence or accuracy checks.


The paper takes variational EM, already used in VAEs and graphical models, and applies it to nonlinear mixed effects models for pharmacometrics. The authors implement it in Pumas, validate correctness on the warfarin dataset by running to completion and matching published results, and then time a few iterations on the much larger DeepNLME Friberg model with 15,410 population parameters and 16 random effects. That timing result is the concrete new piece: it shows the method can make progress on models that standard approaches struggle with computationally. The description of the algorithm, the comparison to FOCE and SAEM, and the use of reverse-mode AD are all straightforward and useful for readers who want to understand how the pieces fit together. The warfarin check gives some reassurance that the implementation recovers known answers on a standard problem.

The soft spot is exactly where the stress-test note points: the headline scalability claim rests on the 15k-parameter run, yet that run stops after a small number of iterations solely to record wall-clock time. There are no ELBO traces, no stability checks on the parameters, and no comparison to a reference solution at that scale. Without those, it is hard to know whether the chosen variational family stays accurate or whether the procedure actually reaches a useful maximum. A minor issue is that even the small model lacks quantitative error metrics beyond agreement with known results.

This work is aimed at pharmacometricians and statisticians who routinely fit hierarchical longitudinal models and need faster options when parameter counts grow. A reader looking for practical timing data and a working implementation will find value, while someone needing strong guarantees on approximation quality at scale will want more diagnostics.
It deserves peer review because the small-model validation is in place and the timing numbers are worth seeing in full, but referees should ask for convergence evidence on the large example before the central claim can be taken as demonstrated.
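The letter asks for ELBO traces and convergence evidence on the large model. One simple form such a check could take, assuming the ELBO is recorded once per iteration and may be a noisy Monte Carlo estimate, is a windowed plateau test that compares the average of the most recent values with the average of the window before it. The function name, window size and tolerance below are hypothetical choices, not taken from the paper or from Pumas.

```python
import math
from statistics import fmean

def elbo_plateaued(trace, window=20, rel_tol=1e-4):
    """Hypothetical plateau test for a (possibly noisy) per-iteration ELBO trace.

    Compares the mean of the last `window` values with the mean of the
    `window` values before them. Returns False if the trace is too short
    or contains a non-finite value (a sign of numerical instability).
    """
    if len(trace) < 2 * window:
        return False
    recent = trace[-window:]
    previous = trace[-2 * window:-window]
    if not all(math.isfinite(v) for v in recent + previous):
        return False
    prev_mean, recent_mean = fmean(previous), fmean(recent)
    # Relative change, with a floor of 1.0 on the scale to avoid dividing by ~0.
    return abs(recent_mean - prev_mean) / max(abs(prev_mean), 1.0) < rel_tol
```

Averaging over windows keeps Monte Carlo noise from triggering or blocking the test on a single iteration. A plateau is necessary but not sufficient: it shows the optimizer has stopped improving the bound, not that the variational family approximates the posterior well.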

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are detailed in the provided text.

axioms (1)

domain assumption: A flexible variational family can approximate the intractable posterior in NLME models sufficiently well for optimization purposes
Implicit in the use of VEM for marginal likelihood maximization
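The assumption can be stated precisely with the standard decomposition of each subject's log marginal likelihood (notation assumed, not taken from the paper: y_i is the data for subject i, η_i its random effects, θ the population parameters, q_i the variational distribution):

$$
\log p(y_i \mid \theta)
= \underbrace{\mathbb{E}_{q_i}\!\left[\log p(y_i, \eta_i \mid \theta) - \log q_i(\eta_i)\right]}_{\mathrm{ELBO}_i(q_i,\, \theta)}
+ \mathrm{KL}\!\left(q_i(\eta_i) \,\middle\|\, p(\eta_i \mid y_i, \theta)\right)
$$

Because the KL term is non-negative, the ELBO is a lower bound. If the family cannot represent the posterior well, the gap can vary with θ, and maximizing the ELBO can then move the population estimates away from the maximum-likelihood solution; this is the risk the axiom sets aside.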

pith-pipeline@v0.9.0 · 5535 in / 1176 out tokens · 59866 ms · 2026-05-07T12:20:26.567783+00:00 · methodology
