pith. machine review for the scientific record.

arxiv: 2605.07300 · v1 · submitted 2026-05-08 · 📊 stat.ME · stat.AP

Recognition: 2 theorem links

A Beta-GAM Hidden Markov Model for Proportion Time Series

Andrea Nigri, Han Lin Shang, Marco Bonetti

Pith reviewed 2026-05-11 00:51 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords hidden Markov model · Beta distribution · generalized additive model · proportion time series · regime switching · penalized EM · mortality ratios

The pith

A hidden Markov model with Beta emissions and GAM means captures latent regime shifts in proportion time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hidden Markov model for time series consisting of proportions between zero and one. Within each unobserved regime the proportions follow a Beta distribution whose mean is linked to covariates through a generalized additive model with splines while the precision remains fixed for that regime. This structure lets the model detect abrupt changes in the generating process and still fit smooth nonlinear covariate effects together with regime-specific variability. Estimation proceeds via a penalized expectation-maximization algorithm, the number of regimes is chosen by information criteria with a degeneracy filter, and uncertainty is obtained by parametric bootstrap. Simulations recover the true transitions, precisions, and state sequence accurately, and the model applied to Russian age-specific mortality ratios identifies two persistent regimes that align with documented historical shocks.

Core claim

The authors establish that a hidden Markov model with Beta emissions, state-specific GAM spline means, and state-specific precisions can be estimated by penalized EM, and that standard information criteria combined with a degeneracy check select a suitable number of states. The resulting model recovers transition dynamics and decodes latent states both in simulated data and in Russian female-to-total mortality proportions from 1960 to 2014, where the two recovered regimes admit a demographic interpretation in terms of known mortality shocks.

What carries the argument

The Beta-GAM hidden Markov model, in which a latent Markov chain switches between regimes, each emitting Beta-distributed proportions whose mean follows a state-specific GAM spline while precision is held constant within the regime.
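As a rough illustration of the emission component, the sketch below evaluates a Beta log-density written in terms of a mean and a precision. It assumes the common mean-precision form with shape parameters mu*phi and (1-mu)*phi; the summary above does not spell out the paper's exact parameterization, and the function names are illustrative only.

```python
import math

def beta_shapes(mu, phi):
    """Map a mean mu in (0, 1) and a precision phi > 0 to Beta shape parameters.

    Assumed parameterization: a = mu * phi, b = (1 - mu) * phi.
    """
    if not (0.0 < mu < 1.0) or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi

def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu*phi, (1-mu)*phi) at a proportion y in (0, 1)."""
    if not (0.0 < y < 1.0):
        raise ValueError("y must lie strictly inside (0, 1)")
    a, b = beta_shapes(mu, phi)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y)

def beta_variance(mu, phi):
    """Variance under this form: mu * (1 - mu) / (1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)
```

Under this form a larger precision means a smaller variance around the same mean, which is how a regime-specific precision can encode regime-specific variability.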

If this is right

  • Transition probabilities and state-dependent parameters can be recovered accurately from proportion series.
  • Smooth nonlinear covariate effects are accommodated without assuming linearity.
  • Regime-specific variability is captured by allowing different precisions across latent states.
  • Degenerate solutions are avoided by combining information criteria with a precision-based filter.
  • Uncertainty in transitions and parameters is quantified by parametric bootstrap.
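The selection step with a degeneracy filter can be sketched roughly as follows. This is a minimal sketch under assumptions: the threshold `max_precision`, the dictionary layout of a fitted candidate, and the use of BIC alone are all hypothetical, and the paper's actual filter rule and criterion are not given in this summary. For penalized smooths, `n_params` would typically be an effective degrees-of-freedom count.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_model(fits, n_obs, max_precision=1e4):
    """Pick the lowest-BIC candidate after dropping degenerate fits.

    A fit counts as degenerate here when any state precision exceeds
    max_precision (a hypothetical cut-off, not the paper's rule).
    """
    admissible = [f for f in fits if max(f["precisions"]) <= max_precision]
    if not admissible:
        raise ValueError("every candidate was flagged as degenerate")
    return min(admissible, key=lambda f: bic(f["loglik"], f["n_params"], n_obs))

# Hypothetical grid-search results (made-up numbers, for illustration only).
FITS = [
    {"K": 1, "loglik": 100.0, "n_params": 6, "precisions": [20.0]},
    {"K": 2, "loglik": 140.0, "n_params": 14, "precisions": [10.0, 45.0]},
    {"K": 3, "loglik": 200.0, "n_params": 24, "precisions": [10.0, 45.0, 3.0e6]},
]
best = select_model(FITS, n_obs=500)
```

In this toy grid the three-state fit has the best raw BIC, but one of its precisions has exploded, so the filter removes it and the two-state fit is chosen.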

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structure could be applied to other bounded series that exhibit both gradual covariate effects and abrupt regime changes, such as economic shares or ecological coverage proportions.
  • Forecasting future proportions would follow naturally by propagating the estimated transition matrix and sampling from the state-specific GAM-Beta distributions.
  • Adding external historical indicators as covariates could test whether the recovered regimes remain stable or whether they collapse into one when more context is supplied.
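The forecasting extension mentioned in the list above could look roughly like this: push the current state distribution through the transition matrix and mix the state-specific means. The matrix, the starting distribution, and the state means are made-up values, and in the full model each state mean would be evaluated at the future covariate value.

```python
def step(dist, gamma):
    """One-step update of a state distribution (row vector times matrix)."""
    k = len(dist)
    return [sum(dist[i] * gamma[i][j] for i in range(k)) for j in range(k)]

def forecast_state_dist(dist, gamma, horizon):
    """State distribution `horizon` steps ahead under a homogeneous chain."""
    for _ in range(horizon):
        dist = step(dist, gamma)
    return dist

def forecast_mean(dist, state_means):
    """Mean of the mixture forecast: probability-weighted state means."""
    return sum(p * m for p, m in zip(dist, state_means))

# Hypothetical two-state example.
GAMMA = [[0.9, 0.1], [0.2, 0.8]]
two_ahead = forecast_state_dist([1.0, 0.0], GAMMA, horizon=2)
point_forecast = forecast_mean(two_ahead, state_means=[0.4, 0.6])
```

A full predictive distribution would additionally sample from the state-specific Beta distributions rather than only averaging their means.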

Load-bearing premise

The observed proportions are generated by a small number of hidden regimes that follow a Markov process, each with constant precision and a mean that is adequately described by a smooth GAM spline of the covariates.
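To make the premise concrete, the sketch below simulates from a toy version of this generative process. Everything specific is an assumption for illustration: the logit link, the two hand-written smooth functions standing in for fitted GAM splines, the uniform initial state, and all numeric values.

```python
import math
import random

def logistic(eta):
    """Inverse logit, mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_beta_hmm(n, gamma, mean_fns, precisions, covariate, seed=0):
    """Draw a latent Markov path and Beta-distributed proportions.

    mean_fns[k] maps a covariate value to the state-k mean in (0, 1);
    precisions[k] is the constant precision of state k.
    """
    rng = random.Random(seed)
    k = len(gamma)
    state = rng.randrange(k)  # uniform initial state (an assumption)
    states, ys = [], []
    for t in range(n):
        if t > 0:
            # Move according to the row of the current state.
            state = rng.choices(range(k), weights=gamma[state])[0]
        mu = mean_fns[state](covariate[t])
        phi = precisions[state]
        ys.append(rng.betavariate(mu * phi, (1.0 - mu) * phi))
        states.append(state)
    return states, ys

# Hypothetical two-regime setup.
GAMMA = [[0.95, 0.05], [0.10, 0.90]]
MEAN_FNS = [lambda x: logistic(-0.5 + math.sin(x)),
            lambda x: logistic(0.8 - 0.5 * x)]
PRECISIONS = [10.0, 50.0]
xs = [t / 50.0 for t in range(200)]
states, ys = simulate_beta_hmm(200, GAMMA, MEAN_FNS, PRECISIONS, xs, seed=1)
```

Data that violate any of these ingredients, for example precision that drifts within a regime, would fall outside what the model is designed to recover.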

What would settle it

If the regimes decoded from the Russian mortality series do not align with the documented shocks of the late twentieth century, or if new simulations with known parameters produce systematically inaccurate state recovery, the central modeling assumptions would be challenged.

Figures

Figures reproduced from arXiv: 2605.07300 by Andrea Nigri, Han Lin Shang, Marco Bonetti.

Figure 1
Figure 1 displays the estimated state-specific mean curves from all 100 replications, overlaid on the true functions (dashed). The variability across replications is moderate and centered on the truth, confirming the consistency of the estimator. As expected, the dispersion is largest for State 1, which combines the lowest precision (ϕ1 = 10) with correspondingly noisier emissions, and increases at the boundaries o…
Figure 2
Figure 2: Estimated state-specific age profiles of the female-to-total mortality ratio with 95% pointwise bootstrap confidence bands (Russia, 1960–2014, ages 0–40, B = 200). The Viterbi-decoded state sequence should be interpreted as a model-based summary of the latent regime structure, rather than as evidence of precisely dated structural breaks. The decoded path suggests a broad transition from the more dispersed …
Figure 3
Figure 3: Estimated state-specific mean curves from 100 Monte Carlo replications (thin grey lines), with the true functions shown as dashed black lines (K = 4, T = 1,500, δ = 0.85).

Original abstract

We propose a hidden Markov model for univariate proportion time series taking values in (0,1), where regime switching captures latent structural changes and the emission distribution belongs to the Beta family. In each latent state, the Beta mean is linked to covariates through a generalized additive model (GAM) with spline-based smooth functions, while the Beta precision is state-specific, enabling flexible modeling of both nonlinear covariate effects and regime-dependent variability. Estimation is carried out via a penalized expectation–maximization algorithm, combining smoothing with numerical maximization of the penalized emission likelihood. To select the number of latent states and the smoothing penalty, we implement a grid search guided by standard information criteria (Akaike Information Criterion/Bayesian Information Criterion/Integrated Completed Likelihood) with a diagnostic filter that removes degenerate solutions characterized by explosive precision estimates. Uncertainty is quantified through a parametric bootstrap procedure for transition probabilities and state-dependent parameters. Simulation results demonstrate accurate recovery of transition dynamics, state precisions, and latent-state decoding. A motivating application to Russian age-specific mortality data (1960–2014, ages 0–40) illustrates how the proposed model summarizes smooth age patterns in female-to-total mortality ratios while identifying two persistent latent regimes that admit a substantive demographic interpretation in light of the country's well-documented mortality shocks that occurred over the second half of the twentieth century.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a hidden Markov model for univariate proportion time series in (0,1) with Beta emissions. In each latent state the Beta mean is modeled via a state-specific generalized additive model (GAM) using spline smooths on covariates, while the precision parameter is constant within states. Estimation proceeds via a penalized EM algorithm that incorporates smoothing penalties, with the number of states and penalty parameters selected by a grid search over AIC/BIC/ICL after discarding degenerate solutions with explosive precisions. Uncertainty for transitions and state parameters is obtained via parametric bootstrap. Simulations are used to demonstrate recovery of transition probabilities, state precisions, and latent-state sequences. The model is applied to Russian female-to-total mortality ratios (ages 0–40, 1960–2014) to recover two persistent latent regimes that are linked post hoc to documented mortality shocks.
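For orientation, the likelihood that an EM algorithm for an HMM works with is usually evaluated by the forward recursion. The sketch below is a generic log-space version, not the authors' implementation; it omits the smoothing penalty and the M-step, and all names are illustrative.

```python
import math

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_loglik(log_init, log_gamma, log_emis):
    """Log-likelihood of an HMM by the forward recursion.

    log_emis[t][k] is the log emission density of observation t in state k.
    """
    k = len(log_init)
    alpha = [log_init[j] + log_emis[0][j] for j in range(k)]
    for row in log_emis[1:]:
        alpha = [
            logsumexp([alpha[i] + log_gamma[i][j] for i in range(k)]) + row[j]
            for j in range(k)
        ]
    return logsumexp(alpha)
```

A useful sanity check is that when every state has the same emission density, the result reduces to the plain sum of log densities, whatever the transition matrix is.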

Significance. If the generative assumptions are appropriate, the Beta-GAM HMM supplies a practical tool for regime-switching proportion data that simultaneously accommodates nonlinear covariate effects and state-dependent dispersion. The simulation evidence supports reliable recovery of the key quantities, and the mortality application shows how the framework can produce interpretable summaries of age patterns. The reliance on standard information criteria and bootstrap inference is a methodological strength that facilitates use by practitioners.

major comments (1)
  1. [Application to Russian mortality data] Mortality application: the claim that the decoded regimes possess substantive demographic meaning rests on post-hoc alignment with known shocks, yet the manuscript reports neither residual diagnostics, posterior predictive checks, out-of-sample log-likelihood comparisons, nor formal contrasts against a single-state Beta-GAM or non-Markov alternatives. Without these checks it is impossible to determine whether the two-regime structure is recovered from the data or imposed by the model’s flexibility.
minor comments (2)
  1. [Abstract] The abstract states that information criteria are used but does not indicate which criterion ultimately selected the two-state solution in the mortality example; this detail should be added for reproducibility.
  2. [Estimation procedure] Notation for the state-specific precision parameters and the smoothing penalty terms could be introduced earlier and used consistently throughout the estimation section.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We agree that the mortality application would be strengthened by additional validation to support the interpretation of the latent regimes. We address this below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Application to Russian mortality data] Mortality application: the claim that the decoded regimes possess substantive demographic meaning rests on post-hoc alignment with known shocks, yet the manuscript reports neither residual diagnostics, posterior predictive checks, out-of-sample log-likelihood comparisons, nor formal contrasts against a single-state Beta-GAM or non-Markov alternatives. Without these checks it is impossible to determine whether the two-regime structure is recovered from the data or imposed by the model’s flexibility.

    Authors: The referee correctly notes that the application section currently lacks these validation steps and relies on post-hoc alignment with historical events. We will revise the manuscript to include: (i) residual diagnostics using randomized quantile residuals for the Beta emissions to check for systematic misfit; (ii) posterior predictive checks comparing replicated time series features (e.g., regime persistence and age-pattern smoothness) to the observed data; (iii) out-of-sample log-likelihood evaluation by holding out the final 10 years and refitting on the earlier period; and (iv) formal model comparisons via AIC, BIC, and ICL against a single-state Beta-GAM and a non-Markov alternative (independent Beta-GAM fits per time point). These additions will demonstrate that the two-regime structure is supported by the data. The methodological core and simulation results are unaffected.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper introduces a novel Beta-GAM HMM for proportion time series, with estimation performed via a standard penalized EM algorithm and uncertainty quantified by parametric bootstrap. Simulations demonstrate parameter recovery under the stated generative model, and the Russian mortality application is presented as illustrative with post-hoc demographic interpretation of decoded regimes. No load-bearing step reduces by the paper's own equations to a fitted quantity defined in terms of itself, nor does any central claim rest on a self-citation chain or imported uniqueness theorem. The derivation chain remains independent of its outputs.

Axiom & Free-Parameter Ledger

3 free parameters · 3 axioms · 0 invented entities

The model rests on standard statistical assumptions for HMMs and GAMs plus the choice of Beta family and spline basis; no new entities are postulated. Free parameters include the number of states, smoothing penalties, and state-specific precisions, all selected or fitted from data.

free parameters (3)
  • number of latent states
    Selected via grid search over AIC/BIC/ICL after discarding degenerate solutions; directly affects regime interpretation.
  • smoothing penalty parameters
    Tuned jointly with state selection; control the flexibility of the GAM splines within each state.
  • state-specific Beta precisions
    Estimated per state; allow regime-dependent variability but are fitted quantities.
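To show how a smoothing penalty parameter controls flexibility, the sketch below computes a difference penalty on spline coefficients. It assumes a P-spline-style second-order difference penalty, which this summary does not confirm as the paper's choice; the function name and arguments are hypothetical.

```python
def difference_penalty(coefs, lam, order=2):
    """Roughness penalty: lam times the sum of squared order-th differences.

    Assumed P-spline-style penalty; the paper's actual penalty may differ.
    """
    diffs = list(coefs)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return lam * sum(d * d for d in diffs)
```

With a second-order penalty, coefficients that lie on a straight line incur no penalty, so increasing lam pulls the fitted smooth toward a linear effect on the link scale.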
axioms (3)
  • domain assumption: The latent regime sequence driving the observed series is a finite-state homogeneous Markov chain.
    Invoked in the model definition and transition probability estimation.
  • domain assumption: Within each state the proportion follows a Beta distribution whose mean is a smooth function of covariates via GAM splines.
    Core emission assumption stated in the abstract.
  • standard math: Standard regularity conditions hold, so that the penalized EM algorithm and the bootstrap are consistent.
    Implicit in the estimation and uncertainty quantification sections.

pith-pipeline@v0.9.0 · 5534 in / 1684 out tokens · 42108 ms · 2026-05-11T00:51:06.670171+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    We propose a hidden Markov model for univariate proportion time series taking values in (0,1), where regime switching captures latent structural changes and the emission distribution belongs to the Beta family. In each latent state, the Beta mean is linked to covariates through a generalized additive model (GAM) with spline-based smooth functions, while the Beta precision is state-specific

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

    Estimation is carried out via a penalized expectation–maximization algorithm... Simulation results demonstrate accurate recovery of transition dynamics, state precisions, and latent-state decoding. A motivating application to Russian age-specific mortality data
