Assessment of Simulation-based Inference Methods for Stochastic Compartmental Models in Epidemiological Research

Jan Hasenauer; Lorenzo Contento; Martin Kühn; Nils Wassmuth; Vincent Wieland

arxiv: 2512.02528 · v4 · pith:Q75UUA2Knew · submitted 2025-12-02 · 🧬 q-bio.QM

Vincent Wieland, Nils Wassmuth, Lorenzo Contento, Martin Kühn, Jan Hasenauer

Pith reviewed 2026-05-17 02:52 UTC · model grok-4.3

classification 🧬 q-bio.QM

keywords simulation-based inference · stochastic compartmental models · epidemiological modeling · particle Markov chain Monte Carlo · conditional normalizing flows · parameter estimation · likelihood-free inference · public health forecasting

The pith

Likelihood-free Bayesian methods accurately estimate parameters in stochastic SIS, SIR and SEIR epidemic models from noisy data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares pseudo-marginal particle Markov chain Monte Carlo, which relies on a particle filter for unbiased likelihood estimates, against conditional normalizing flows for inferring parameters in stochastic compartmental models of disease spread. It evaluates both approaches on SIS, SIR, and two-variant SEIR models together with observation models that translate latent trajectories into observable data. Simulation studies show that the methods recover parameters while capturing stochastic epidemic dynamics, and application to an Ethiopian cohort dataset confirms that they remain effective under real-world noise and irregular sampling. These capabilities support the generation of nowcasts and short-term forecasts that can guide public health responses. The authors release code and synthetic datasets to enable reuse in building decision-support pipelines.
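To make the model class concrete, here is a minimal sketch (not the authors' code) of a stochastic SIR model paired with an observation model. It simulates the continuous-time model with Gillespie's direct method, assuming frequency-dependent transmission, and maps the latent trajectory to daily reported cases by binomial thinning with a hypothetical reporting probability `rho`. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate_sir(beta, gamma, s0, i0, r0, t_max, rng):
    """Gillespie (direct-method) simulation of a stochastic SIR model.

    Transitions (assumed): S -> I at rate beta * S * I / N, and I -> R at rate gamma * I.
    Returns the cumulative number of new infections at the end of days 1..t_max.
    """
    s, i, r = s0, i0, r0
    n = s0 + i0 + r0
    t, cum_inf, next_day = 0.0, 0, 1
    daily_cum = []
    while next_day <= t_max:
        rate_inf = beta * s * i / n
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        # Waiting time to the next event; infinite once no event can occur (I = 0).
        t_next = t + rng.expovariate(total) if total > 0 else math.inf
        # Record the state at every day boundary crossed before the next event.
        while next_day <= t_max and next_day <= t_next:
            daily_cum.append(cum_inf)
            next_day += 1
        if math.isinf(t_next):
            break
        t = t_next
        # Choose which transition fires, with probability proportional to its rate.
        if rng.random() * total < rate_inf:
            s, i, cum_inf = s - 1, i + 1, cum_inf + 1
        else:
            i, r = i - 1, r + 1
    return daily_cum


def observe_daily_cases(daily_cum, rho, rng):
    """Hypothetical observation model: each new infection is reported with probability rho."""
    reported, prev = [], 0
    for c in daily_cum:
        new_cases, prev = c - prev, c
        reported.append(sum(1 for _ in range(new_cases) if rng.random() < rho))
    return reported


rng = random.Random(1)
cum = simulate_sir(beta=0.5, gamma=0.2, s0=990, i0=10, r0=0, t_max=60, rng=rng)
cases = observe_daily_cases(cum, rho=0.6, rng=rng)
```

Repeated runs with the same parameters give different epidemic curves, and some die out early. Because the probability of the observed counts requires summing over all latent paths that could have produced them, the likelihood is intractable, which is the situation both compared methods are designed for.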

Core claim

Likelihood-free inference via particle-filter pseudo-marginal MCMC and conditional normalizing flows yields accurate and robust parameter estimates for stochastic compartmental models even when likelihoods are intractable, as shown by successful recovery of parameters in simulated SIS, SIR, and SEIR trajectories and by maintained performance on real epidemiological observations from an Ethiopian cohort subject to noise and irregular sampling.

What carries the argument

Pseudo-marginal particle Markov chain Monte Carlo using particle filters for unbiased likelihood estimates, together with conditional normalizing flows, applied to stochastic compartmental models equipped with observation models.
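The pseudo-marginal machinery can be sketched compactly. This sketch makes several substitutions that are assumptions, not the paper's setup: a discrete-time chain-binomial SIR model with a daily step, a bootstrap particle filter with multinomial resampling at every observation, a binomial reporting model, hypothetical flat priors, and a Gaussian random-walk proposal. All function names are hypothetical.

```python
import math
import random


def binomial(n, p, rng):
    """Binomial(n, p) draw by CDF inversion (adequate for the small n used here)."""
    if n <= 0 or p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    if p > 0.5:
        return n - binomial(n, 1.0 - p, rng)
    u, q = rng.random(), p / (1.0 - p)
    k, pmf = 0, (1.0 - p) ** n
    cdf = pmf
    while u > cdf and k < n:
        pmf *= q * (n - k) / (k + 1)  # pmf(k + 1) from pmf(k)
        k += 1
        cdf += pmf
    return k


def log_binom_pmf(k, n, p):
    """log P(Y = k) for Y ~ Binomial(n, p) with 0 < p < 1; -inf if k is impossible."""
    if k < 0 or k > n:
        return -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def step(state, beta, gamma, n, rng):
    """One day of a chain-binomial SIR model; returns (new state, new infections)."""
    s, i = state
    new_inf = binomial(s, 1.0 - math.exp(-beta * i / n), rng)
    new_rec = binomial(i, 1.0 - math.exp(-gamma), rng)
    return (s - new_inf, i + new_inf - new_rec), new_inf


def simulate(theta, s0, i0, n_days, rng):
    beta, gamma, rho = theta
    state, y = (s0, i0), []
    for _ in range(n_days):
        state, new_inf = step(state, beta, gamma, s0 + i0, rng)
        y.append(binomial(new_inf, rho, rng))  # reported cases
    return y


def pf_loglik(theta, y, s0, i0, n_particles, rng):
    """Bootstrap particle filter; exp(return value) is an unbiased likelihood estimate."""
    beta, gamma, rho = theta
    particles, loglik = [(s0, i0)] * n_particles, 0.0
    for y_t in y:
        moved = [step(p, beta, gamma, s0 + i0, rng) for p in particles]
        logw = [log_binom_pmf(y_t, new_inf, rho) for _, new_inf in moved]
        m = max(logw)
        if m == -math.inf:
            return -math.inf  # no particle can explain y_t
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)  # log of the mean weight
        particles = rng.choices([st for st, _ in moved], weights=w, k=n_particles)
    return loglik


def pmmh(y, s0, i0, theta0, n_iter, step_size, n_particles, seed=0):
    """Pseudo-marginal Metropolis-Hastings with a Gaussian random-walk proposal."""
    rng = random.Random(seed)

    def log_prior(th):  # hypothetical flat priors on a bounded box
        b, g, r = th
        return 0.0 if 0 < b < 2 and 0 < g < 1 and 0 < r < 1 else -math.inf

    theta = theta0
    ll = pf_loglik(theta, y, s0, i0, n_particles, rng)
    chain = []
    for _ in range(n_iter):
        prop = tuple(x + rng.gauss(0.0, step_size) for x in theta)
        if log_prior(prop) > -math.inf:
            ll_prop = pf_loglik(prop, y, s0, i0, n_particles, rng)
            # Flat prior + symmetric proposal: the ratio reduces to the likelihood estimates.
            # The stored estimate ll is reused, never recomputed; this keeps the chain exact.
            if math.log(1.0 - rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta)
    return chain


rng = random.Random(42)
true_theta = (0.6, 0.25, 0.5)  # (beta, gamma, rho), hypothetical values
y = simulate(true_theta, s0=195, i0=5, n_days=30, rng=rng)
chain = pmmh(y, 195, 5, theta0=(0.5, 0.3, 0.6), n_iter=50, step_size=0.05, n_particles=50)
```

The chain length and particle count in the usage lines only keep the example fast and are far too small for real inference. In practice the number of particles is tuned so that the variance of the log-likelihood estimate stays moderate, because a noisy estimate makes the chain stick at values where the likelihood was overestimated.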

If this is right

The methods support fast nowcasts and short-term forecasts that can inform control of epidemic outbreaks.
They capture stochastic dynamics across classical SIS, SIR, and multi-variant SEIR models.
Performance remains stable under real-world noise and irregular data sampling as seen in the Ethiopian cohort.
Public release of code and synthetic datasets enables construction of reusable inference pipelines for public health applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same workflow could be tested on models that add spatial structure or explicit intervention effects without changing the core inference machinery.
Hybrid use of MCMC for calibration and normalizing flows for rapid sampling might further reduce computation time for ongoing surveillance.
Results point toward replacing simplified deterministic models with stochastic ones in operational forecasting systems when data irregularities are present.
Wider deployment could allow parameter tracking from limited or delayed reports during future outbreaks.

Load-bearing premise

The observation models and noise structures chosen for the simulation study adequately represent the irregularities and biases found in real epidemiological surveillance data.

What would settle it

If applying the same methods to additional real outbreak datasets with independently known transmission parameters produced systematic bias or high uncertainty in the recovered values, that would falsify the claim of operational robustness.

Figures

Figures reproduced from arXiv: 2512.02528 by Jan Hasenauer, Lorenzo Contento, Martin Kühn, Nils Wassmuth, Vincent Wieland.

**Figure 1.** Schematic representations of the models and methods. (a) Graph of compartments and possible transitions with corresponding rate parameters for the SIR model. (b) Graph of compartments and possible transitions with corresponding rate parameters for the two-variant SEIR model. (c) Workflow for the assessment of the Bayesian inference.

**Figure 2.** Evaluation of the posterior approximations for the SIR model. (A) Posterior approximations based on 10,000 samples; true parameters are indicated by bold black lines, and joint MAP estimates appear in a darker shade. (B) Model fit with percentiles (50%, 90%, 95%) based on simulating the 10,000 samples.

**Figure 3.** Marginal posterior approximations and prior distribution. Histograms of posterior approximations and prior distribution for the two-variant SEIR model using a dense dataset.

**Figure 4.** Evaluation of the posterior approximations for the two-variant SEIR model with a dense dataset. (A) Posterior approximations based on 10,000 samples; true parameters are indicated by bold black lines, and joint MAP estimates appear in a darker shade. (B) Model fit with percentiles (50%, 90%, 95%) based on simulating the 10,000 samples.

**Figure 5.** Evaluation of the posterior approximations for the reparametrized two-variant SEIR model with dense data. (A) Posterior approximations based on 10,000 samples; true parameters are indicated by bold black lines, and joint MAP estimates appear in a darker shade. (B) Model fit with percentiles (50%, 90%, 95%) based on simulating the 10,000 samples.

**Figure 6.** Evaluation of the posterior approximations for the two-variant SEIR model with real data. (A) Posterior approximations based on 10,000 samples; true parameters are indicated by bold black lines, and joint MAP estimates appear in a darker shade. (B) Model fit with percentiles (50%, 90%, 95%) based on simulating the 10,000 samples.
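The model-fit panels summarise posterior predictive simulations as percentile bands. Reading the 50%, 90% and 95% levels as central pointwise bands (an assumption about the figures), they can be computed from one simulated trajectory per posterior sample as in this sketch. `predictive_bands` is a hypothetical helper; its quantile rule is linear interpolation between order statistics, the same convention as NumPy's default.

```python
import math


def quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def predictive_bands(trajectories, levels=(0.50, 0.90, 0.95)):
    """Pointwise median and central bands from simulated trajectories.

    trajectories: one simulated time series per posterior sample (equal lengths).
    Returns (median, bands) with bands[level][t] = (lower, upper).
    """
    median = []
    bands = {level: [] for level in levels}
    for t in range(len(trajectories[0])):
        vals = sorted(traj[t] for traj in trajectories)
        median.append(quantile(vals, 0.5))
        for level in levels:
            tail = (1.0 - level) / 2.0  # probability mass left out on each side
            bands[level].append((quantile(vals, tail), quantile(vals, 1.0 - tail)))
    return median, bands
```

Because the bands are pointwise, a 95% band does not mean that 95% of whole simulated trajectories stay inside it.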

Original abstract

Global pandemics, such as the recent COVID-19 crisis, highlight the need for stochastic epidemic models that can capture the randomness inherent in the spread of disease. Such models must be accompanied by methods for estimating parameters in order to generate fast nowcasts and short-term forecasts that can inform public health decisions. This paper presents a comparison of two advanced Bayesian inference methods: 1) pseudo-marginal particle Markov chain Monte Carlo, using an unbiased likelihood estimate obtained by a Particle Filter (PF), and 2) Conditional Normalizing Flows (CNF). We investigate their performance on three commonly used compartmental models: a classical Susceptible-Infected-Susceptible (SIS) model, a Susceptible-Infected-Recovered (SIR) model, and a two-variant Susceptible-Exposed-Infected-Recovered (SEIR) model, complemented by an observation model that maps latent trajectories to empirical data. Addressing the challenges of intractable likelihoods for parameter inference in stochastic settings, our analysis highlights how these likelihood-free methods provide accurate and robust inference capabilities. The results of our simulation study further underscore the effectiveness of these approaches in capturing the stochastic dynamics of epidemics, providing prediction capabilities for the control of epidemic outbreaks. Results on an Ethiopian cohort study demonstrate operational robustness under real-world noise and irregular data sampling. To facilitate reuse and to enable building pipelines that ultimately contribute to better-informed decision making in public health, we make code and synthetic datasets publicly available.
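On the CNF side, density evaluation and sampling rest on the change-of-variables formula. The sketch below illustrates that mechanism for two parameters and a single scalar data summary `x`, using two conditional affine coupling layers (RealNVP-style) with fixed, hypothetical conditioner weights. It is not the paper's architecture. Training is omitted: it would fit the conditioner networks by maximising `cond_log_density` over simulated (theta, x) pairs with automatic differentiation.

```python
import math
import random


class AffineCoupling:
    """Conditional affine coupling layer on a 2-D parameter vector (theta1, theta2).

    Forward map to latent space: z1 = theta1, z2 = theta2 * exp(s) + t,
    where s and t depend only on theta1 and the data summary x.
    The conditioner is a fixed linear map with hypothetical weights; in a trained
    CNF it would be a neural network.
    """

    def __init__(self, w_s, w_t):
        self.w_s = w_s  # weights of the log-scale s over the features (theta1, x, 1)
        self.w_t = w_t  # weights of the shift t over the same features

    def _conditioner(self, theta1, x):
        feats = (theta1, x, 1.0)
        s = math.tanh(sum(w * f for w, f in zip(self.w_s, feats)))  # bounded for stability
        t = sum(w * f for w, f in zip(self.w_t, feats))
        return s, t

    def forward(self, theta, x):
        """Return z and log|det dz/dtheta|, which equals s because the Jacobian is triangular."""
        theta1, theta2 = theta
        s, t = self._conditioner(theta1, x)
        return (theta1, theta2 * math.exp(s) + t), s

    def inverse(self, z, x):
        """Map a latent point back to parameter space."""
        z1, z2 = z
        s, t = self._conditioner(z1, x)
        return (z1, (z2 - t) * math.exp(-s))


def log_std_normal(z):
    return sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in z)


def cond_log_density(layers, theta, x):
    """log q(theta | x) by the change-of-variables formula."""
    z, log_det = theta, 0.0
    for layer in layers:
        z, ld = layer.forward(z, x)
        log_det += ld
        z = (z[1], z[0])  # swap coordinates so the next layer transforms the other one
    return log_std_normal(z) + log_det


def cond_sample(layers, x, rng):
    """Draw theta ~ q(theta | x) by pushing a standard normal draw through the inverse flow."""
    z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    for layer in reversed(layers):
        z = (z[1], z[0])  # undo the swap applied after this layer
        z = layer.inverse(z, x)
    return z


flow = [AffineCoupling(w_s=(0.3, -0.2, 0.1), w_t=(0.5, 0.4, -0.1)),
        AffineCoupling(w_s=(-0.1, 0.2, 0.05), w_t=(0.2, -0.3, 0.3))]
x_obs = 0.5  # hypothetical scalar summary of the observed data
rng = random.Random(0)
draws = [cond_sample(flow, x_obs, rng) for _ in range(1000)]
```

Swapping coordinates between layers lets both parameters be transformed while each Jacobian stays triangular. Once such a flow is trained, drawing posterior samples for a new dataset requires no further simulations, which is what makes amortised inference with CNFs fast.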

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper compares PF-PMCMC and CNF on standard stochastic compartmental models with an Ethiopian data example and releases code, but the real-data robustness claim rests on an untested match between the assumed observation noise and actual surveillance irregularities.

The one or two things you should know: the paper benchmarks particle-filter-based pseudo-marginal MCMC against conditional normalizing flows on stochastic SIS, SIR, and SEIR models, with an additional test on Ethiopian cohort data, and the authors make the code and synthetic data available. They do a good job of running controlled simulations in which recovery of known parameters can be checked, and then showing that the methods can be applied to real surveillance data with irregular sampling. The release of code and datasets is genuinely useful for building on this work in public health contexts.

The softer area is the interpretation of the Ethiopian results as proof of operational robustness under real-world noise. The argument assumes that the observation model and its noise adequately represent the irregularities in actual data collection. Without perturbing those assumptions or adding diagnostics beyond posterior predictive checks, it is hard to rule out that the performance is tied to the specific setup rather than reflecting general robustness. The simulation study stands on firmer ground.

This paper is for researchers who fit stochastic epidemic models to incomplete or noisy data and want to see how these two inference techniques compare in practice. Readers focused on short-term epidemic forecasting or on parameter estimation for decision support would find the head-to-head results and the open resources valuable.

It deserves a serious referee. The work is relevant, the methods are appropriate for the problem, and the reproducibility elements make it worth the time even if additional sensitivity checks are requested in revision. I recommend sending it out for peer review.

Referee Report

1 major / 1 minor

Summary. The manuscript compares two likelihood-free Bayesian inference methods—pseudo-marginal particle Markov chain Monte Carlo (PF-PMCMC) using a particle filter and Conditional Normalizing Flows (CNF)—for parameter estimation in stochastic SIS, SIR, and two-variant SEIR compartmental models. An observation model maps latent states to data. Performance is assessed via simulation studies with known ground-truth parameters and via application to an Ethiopian cohort study; the authors conclude that both methods deliver accurate, robust inference and operational robustness under real-world noise and irregular sampling. Code and synthetic datasets are released publicly.

Significance. If the central claims hold, the work supplies a practical benchmark of two modern likelihood-free methods for stochastic epidemic models, directly supporting nowcasting and short-term forecasting in public-health settings. The public release of code and synthetic data is a clear strength that enables reuse and pipeline building.

major comments (1)

[Abstract and Ethiopian-cohort results section] The claim that the cohort results 'demonstrate operational robustness under real-world noise and irregular data sampling' is load-bearing for the paper's central robustness conclusion. No sensitivity analysis is reported that perturbs the observation-model noise structure (e.g., time-varying underreporting or clustered missingness) and re-runs inference. Because the real data lack ground truth, any mismatch between the assumed noise and actual surveillance irregularities directly undermines the robustness statement unless such a check is performed.
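One way to carry out the requested check is to generate observation datasets from the same latent incidence under perturbed observation models, then fit each with the unchanged baseline observation model and compare the posteriors. The sketch below only builds such datasets. The three scenarios (constant reporting, linearly drifting underreporting, and blocks of consecutive missing days) and all function names are hypothetical choices, and the inference step itself is left out.

```python
import random


def thin(counts, rhos, rng):
    """Report each of counts[t] cases independently with probability rhos[t]."""
    return [sum(1 for _ in range(c) if rng.random() < r) for c, r in zip(counts, rhos)]


def drifting_rho(n_days, rho_start, rho_end):
    """Time-varying underreporting: the reporting probability changes linearly over time."""
    if n_days == 1:
        return [rho_start]
    return [rho_start + (rho_end - rho_start) * t / (n_days - 1) for t in range(n_days)]


def clustered_missing(obs, n_gaps, gap_len, rng):
    """Blank out n_gaps blocks of gap_len consecutive days (None marks a missing value)."""
    out = list(obs)
    for _ in range(n_gaps):
        start = rng.randrange(max(1, len(out) - gap_len + 1))
        for t in range(start, min(start + gap_len, len(out))):
            out[t] = None
    return out


def sensitivity_datasets(incidence, rho, rng):
    """Observed series under the assumed model and two hypothetical perturbations of it."""
    n = len(incidence)
    return {
        "baseline": thin(incidence, [rho] * n, rng),
        "drifting_rho": thin(incidence, drifting_rho(n, rho, 0.5 * rho), rng),
        "clustered_missing": clustered_missing(thin(incidence, [rho] * n, rng), 2, 5, rng),
    }
```

Missing days are marked with `None`. A particle filter can handle them by skipping the weight update at those times, whereas a CNF would need a missingness-aware data summary or encoding.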

minor comments (1)

Add quantitative diagnostics (e.g., posterior predictive p-values or discrepancy measures) comparing simulated versus observed data features in addition to the visual checks already presented.
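A posterior predictive p-value for a chosen discrepancy statistic is simple to compute once replicated datasets have been simulated at posterior draws. The sketch uses data-only statistics (peak height, peak timing, lag-1 autocorrelation); these particular statistics and function names are illustrative suggestions, not ones the paper reports.

```python
def ppp_value(observed, replicates, statistic):
    """Monte Carlo posterior predictive p-value: share of replicates with T(y_rep) >= T(y_obs).

    `replicates` are datasets simulated from the model at posterior draws.
    """
    t_obs = statistic(observed)
    exceed = sum(1 for rep in replicates if statistic(rep) >= t_obs)
    return exceed / len(replicates)


def peak_height(y):
    return max(y)


def peak_time(y):
    return y.index(max(y))  # first time point at which the maximum is reached


def lag1_autocorr(y):
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    if var == 0:
        return 0.0
    return sum((y[t] - mean) * (y[t + 1] - mean) for t in range(n - 1)) / var
```

Values close to 0 or 1 flag a feature of the observed series that the fitted model rarely reproduces.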

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and outline the revisions we will make.

Referee: [Abstract and Ethiopian-cohort results section] The claim that the cohort results 'demonstrate operational robustness under real-world noise and irregular data sampling' is load-bearing for the paper's central robustness conclusion. No sensitivity analysis is reported that perturbs the observation-model noise structure (e.g., time-varying underreporting or clustered missingness) and re-runs inference; without such a check, any mismatch between the assumed noise and actual surveillance irregularities directly undermines the robustness statement for real data that lack ground truth.

Authors: We agree that the robustness claim for the Ethiopian cohort application is central and would be strengthened by explicit sensitivity checks on the observation model. Our simulation studies already vary observation noise levels and sampling irregularity to evaluate performance under controlled mismatches, but we did not re-run the real-data inference under perturbed noise structures such as time-varying underreporting or clustered missingness. We will add this sensitivity analysis to the revised manuscript, including results under alternative noise assumptions, to provide stronger support for the operational robustness statement. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity in simulation-based assessment of inference methods

The paper performs an empirical comparison of PF-PMCMC and CNF on SIS/SIR/SEIR models via simulation studies with known ground-truth parameters and applies the methods to an Ethiopian cohort dataset. Performance is evaluated using standard metrics such as parameter recovery accuracy and posterior predictive checks against external benchmarks (simulated trajectories and observed data features). These results are generated from independent simulation runs and a real-data application; they do not reduce to quantities defined by fitted parameters or to self-referential definitions within the paper. The observation model is introduced as a separate component mapping latent states to data, with no evidence that predictions or robustness claims are equivalent to inputs by construction. The assessment is anchored to external benchmarks, and the released code and synthetic data allow independent verification.
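Parameter recovery in a simulation study with known ground truth is often summarised by the error of a posterior point estimate and by the empirical coverage of credible intervals across repeated datasets. The helper below is a hypothetical sketch of both for a single parameter; using the posterior median and the empirical posterior CDF at the true value are our choices, not necessarily the paper's metrics.

```python
import statistics


def recovery_summary(results, level=0.9):
    """Summarise recovery of one parameter over repeated simulated datasets.

    results: list of (true_value, posterior_samples) pairs, one per dataset.
    Returns (mean absolute relative error of the posterior median,
             empirical coverage of central `level` credible intervals).
    Assumes non-zero true values.
    """
    tail = (1.0 - level) / 2.0
    rel_errors, covered = [], 0
    for true_value, samples in results:
        med = statistics.median(samples)
        rel_errors.append(abs(med - true_value) / abs(true_value))
        # Empirical posterior CDF at the true value; inside the central interval
        # if it lies between the lower and upper tail probabilities.
        frac_below = sum(1 for s in samples if s < true_value) / len(samples)
        if tail <= frac_below <= 1.0 - tail:
            covered += 1
    return statistics.mean(rel_errors), covered / len(results)
```

For well-calibrated posterior approximations the coverage should be close to the nominal `level`; markedly lower coverage signals overconfident posteriors.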

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard assumptions of stochastic compartmental models and Bayesian inference; no new entities are postulated. The model parameters are the targets of inference, not free parameters tuned or invented by the authors.

axioms (2)

domain assumption Stochastic compartmental models generate trajectories whose likelihood is intractable under partial observations.
Stated in the abstract as the core challenge addressed by likelihood-free methods.
standard math Particle filters produce unbiased estimates of the likelihood for use in pseudo-marginal MCMC.
Standard property invoked when describing the first method.
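For reference, the bootstrap particle filter estimator behind this property, and the pseudo-marginal acceptance probability it feeds, can be written as follows. The notation is ours, not the paper's: $x_t^{(i)}$ is the latent state of particle $i$ at observation time $t$, propagated through the stochastic model from the particles resampled at the previous time; $\pi$ is the prior and $q$ the proposal.

```latex
\hat{p}_N(y_{1:T} \mid \theta) = \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} w_t^{(i)},
\qquad w_t^{(i)} = p\bigl(y_t \mid x_t^{(i)}, \theta\bigr),
\qquad \mathbb{E}\bigl[\hat{p}_N(y_{1:T} \mid \theta)\bigr] = p(y_{1:T} \mid \theta)

\alpha(\theta' \mid \theta) = \min\left\{ 1,\;
  \frac{\hat{p}_N(y_{1:T} \mid \theta')\, \pi(\theta')\, q(\theta \mid \theta')}
       {\hat{p}_N(y_{1:T} \mid \theta)\, \pi(\theta)\, q(\theta' \mid \theta)} \right\}
```

Unbiasedness holds for every $N \ge 1$ with multinomial resampling, although the logarithm of the estimate is biased downward. The estimate in the denominator is the one stored when $\theta$ was accepted and is not recomputed; with that rule the chain has the exact posterior as its stationary distribution.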

pith-pipeline@v0.9.0 · 5571 in / 1339 out tokens · 37690 ms · 2026-05-17T02:52:57.448172+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

Paper passage: "We investigate their performance on three commonly used compartmental models: A classical Susceptible-Infected-Susceptible (SIS), a Susceptible-Infected-Recovered (SIR) model and a two-variant Susceptible-Exposed-Infected-Recovered (SEIR) model, complemented by an observation model that maps latent trajectories to empirical data."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear

Paper passage: "Results on an Ethiopian cohort study demonstrate operational robustness under real-world noise and irregular data sampling."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.