Doubly Outlier-Robust Online Infinite Hidden Markov Model

Álvaro Cartea; Gerardo Duran-Martin; Horace Yiu; Leandro Sánchez-Betancourt

arXiv: 2604.14322 · v2 · submitted 2026-04-15 · stat.ML · cs.LG

Pith reviewed 2026-05-13 07:58 UTC · model grok-4.3

keywords: robust Bayesian inference · infinite hidden Markov model · online learning · outlier robustness · posterior influence function · streaming data forecasting

The pith

BR-iHMM bounds outlier influence in online infinite hidden Markov models and reduces forecasting error by up to 67%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a robust update rule for the online infinite hidden Markov model to handle outliers in streaming data and model misspecification. Using generalised Bayesian inference, it ensures the posterior influence function remains bounded, providing theoretical robustness guarantees. The Batched Robust iHMM incorporates two tunable parameters to manage the trade-off between robustness and adaptation to regime switches. Experiments on limit order book data, electricity demand, and synthetic systems demonstrate up to 67% reduction in one-step-ahead forecasting error compared to standard online Bayesian methods. This establishes a practical approach for robust online learning and forecasting.

Core claim

A batched robust update rule for the online iHMM bounds the posterior influence function under outliers and misspecification, implemented in the BR-iHMM with two parameters that balance adaptivity and robustness, leading to improved forecasting performance.

What carries the argument

The batched robust update rule that enforces a bounded posterior influence function (PIF) while allowing controlled adaptation lag.
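
For orientation, the two ingredients named here can be written in a generic form. This is a sketch under assumptions: the symbols $\ell$ (loss), $\omega$ (learning rate), $D$ (a divergence) and $y_c$ (a contaminating observation) follow common usage in the generalised Bayes literature and are not taken from the paper, whose exact definitions may differ.

```latex
% Generalised Bayes update: the negative log-likelihood is replaced by a loss \ell
% (standard Bayes is recovered with \ell = -\log p(y_t \mid \theta) and \omega = 1)
q_t(\theta) \;\propto\; q_{t-1}(\theta)\,\exp\{-\omega\,\ell(\theta; y_t)\}, \qquad \omega > 0

% Posterior influence function: divergence between the posterior built from the
% observed data and the posterior obtained when y_t is replaced by y_c
\mathrm{PIF}(y_c) \;=\; D\big(\, q_t(\theta \mid y_{1:t-1}, y_t) \,\big\|\, q_t(\theta \mid y_{1:t-1}, y_c) \big)

% Outlier robustness in this sense: the influence stays finite however extreme y_c is
\sup_{y_c} \mathrm{PIF}(y_c) \;<\; \infty
```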

If this is right

Provides conditions for bounded PIF in online iHMM.
Reduces one-step-ahead forecasting error by up to 67% on tested datasets.
Uses two tunable parameters to balance robustness and adaptation speed.
Supports interpretable online learning in streaming environments with outliers.
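
The trade-off between robustness and adaptation speed can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes two known Gaussian regimes (means 0 and 5, unit variance) and a hypothetical helper `assign_states` that picks one state per batch by summing log-likelihoods. It only shows why a larger batch can absorb an isolated outlier at the cost of reacting later to a genuine switch.

```python
import numpy as np

def gaussian_loglik(y, mean, std=1.0):
    """Log-density of N(mean, std^2) evaluated at y (elementwise)."""
    return -0.5 * ((y - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def assign_states(y, means, batch_size):
    """Hypothetical helper: choose the most likely state for each batch."""
    y = np.asarray(y, dtype=float)
    states = np.empty(len(y), dtype=int)
    for start in range(0, len(y), batch_size):
        batch = y[start:start + batch_size]
        # Sum the log-likelihood over the whole batch for each candidate state.
        scores = [gaussian_loglik(batch, m).sum() for m in means]
        states[start:start + batch_size] = int(np.argmax(scores))
    return states

means = [0.0, 5.0]  # assumed regime means, for illustration only
# Regime 0 with one outlier at index 2, then a genuine switch to regime 1 at index 7.
y = [0.1, -0.2, 5.3, 0.0, 0.2, -0.1, 0.1, 5.1, 4.9, 5.2, 5.0, 4.8]

per_obs = assign_states(y, means, batch_size=1)  # reacts to the outlier
batched = assign_states(y, means, batch_size=4)  # ignores it, switches one step late
print(per_obs.tolist())
print(batched.tolist())
```

With `batch_size=1` the outlier at index 2 is assigned to the wrong regime; with `batch_size=4` it is absorbed, but the switch at index 7 is only registered from index 8 onward.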

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may extend to other online Bayesian nonparametric models for similar robustness benefits.
Developing an automatic procedure for selecting the tunable parameters could enhance applicability without manual tuning.
Bounded influence suggests greater stability for applications in finance and energy forecasting under noisy conditions.

Load-bearing premise

The two additional tunable parameters can be chosen so that the adaptation lag remains acceptable for the target application.

What would settle it

Observing unbounded posterior influence or no reduction in forecasting error when outliers are present would contradict the central claim.
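
A check of this kind can be sketched numerically on a much simpler model. The code below is an illustration under assumptions, not the paper's experiment: it uses a conjugate Gaussian-mean model with known variance, measures influence with the KL divergence between the posterior from a clean observation and the posterior from a contaminated one, and uses an inverse-multiquadric-style weight with a hypothetical threshold `c` as the robust update.

```python
import numpy as np

def weighted_update(m0, v0, y, obs_var=1.0, c=None):
    """One Gaussian-mean update; c=None is the standard Bayes update."""
    # Squared weight in (0, 1]; it shrinks towards 0 as y moves away from the prior mean.
    w2 = 1.0 if c is None else 1.0 / (1.0 + (y - m0) ** 2 / c ** 2)
    prec = 1.0 / v0 + w2 / obs_var
    mean = (m0 / v0 + w2 * y / obs_var) / prec
    return mean, 1.0 / prec

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def pif(y_clean, y_c, c=None, m0=0.0, v0=1.0):
    """Divergence between the clean and the contaminated posterior."""
    clean = weighted_update(m0, v0, y_clean, c=c)
    contaminated = weighted_update(m0, v0, y_c, c=c)
    return kl_gauss(*clean, *contaminated)

outliers = np.array([10.0, 100.0, 1000.0])
standard = np.array([pif(0.5, yc) for yc in outliers])
robust = np.array([pif(0.5, yc, c=2.0) for yc in outliers])
print("standard:", standard)  # keeps growing with the outlier magnitude
print("robust:  ", robust)    # levels off
```

In this toy setting the standard update's influence grows quadratically in the outlier magnitude, while the weighted update's influence plateaus because an extreme observation is effectively ignored and the posterior falls back to the prior.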

Figures

Figures reproduced from arXiv: 2604.14322 by Álvaro Cartea, Gerardo Duran-Martin, Horace Yiu, Leandro Sánchez-Betancourt.

**Figure 1.** Data (blue dotted line) generated according to (139) with Student-t observation noise. The outliers are marked with x. The solid red line shows our BR-iHMM, and the solid orange line shows the standard online iHMM. Regions around the posterior predictive mean cover ±2 standard deviations as error bounds.

**Figure 2.** Bayesian architecture of the BR-iHMM. Variables in a circle are updated during inference. Red variables are fixed hyperparameters. For instance, the priors of γ and α are both uninformed Γ(1, 1) for all our experiments. Table C.4 in Appendix C.4 summarises the hyperparameters, which we optimise via bayesian-optimization on a training partition (Nogueira, 2014; Garrido-Merchán & Hernández-Lobato, 2020)…

**Figure 3.** Synthetic linear data. Outliers are marked by red vertical dashes. Yellow and blue regions correspond to regimes 2 and 3. First subplot tracks rolling RMSE of mean predictions across the 100 runs. Bottom plot shows average MAP state. First 900 predictions are plotted. The full Figure E.11 is in the appendix.

**Figure 4.** Cumulative RMSE for one-step-ahead predictions. In this setting, hyperparameter optimisation selects a batch size of B = 1 for BR-iHMM, effectively recovering the WoLF-iHMM variant. This indicates that larger batch sizes degrade performance by forcing consecutive minutes into the same latent state, while regime changes in (141) can occur rapidly and are not primarily driven by outliers.
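
The error metrics mentioned in the captions can be computed as in the following sketch. It assumes that "cumulative RMSE" means the RMSE over all predictions up to each time step and that a percentage reduction is measured relative to a baseline RMSE; the function names are hypothetical and the numbers are made up for illustration, not taken from the paper.

```python
import numpy as np

def cumulative_rmse(y_true, y_pred):
    """RMSE of one-step-ahead predictions over steps 1..t, for every t."""
    err2 = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return np.sqrt(np.cumsum(err2) / np.arange(1, len(err2) + 1))

def relative_reduction(rmse_baseline, rmse_method):
    """Fractional reduction in RMSE relative to the baseline (0.5 means 50%)."""
    return 1.0 - rmse_method / rmse_baseline

# Illustrative data only: the baseline is off by 2 at every step, the method by 1.
y_true = np.zeros(4)
baseline = cumulative_rmse(y_true, np.full(4, 2.0))
method = cumulative_rmse(y_true, np.full(4, 1.0))
reduction = relative_reduction(baseline[-1], method[-1])
print(baseline, method, reduction)
```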

Original abstract

We derive a robust update rule for the online infinite hidden Markov model (iHMM) for when the streaming data contains outliers and the model is misspecified. Leveraging recent advances in generalised Bayesian inference, we define robustness via the posterior influence function (PIF), and provide conditions under which the online iHMM has bounded PIF. Imposing robustness inevitably induces an adaptation lag for regime switching. Our method, which is called Batched Robust iHMM (BR-iHMM), balances adaptivity and robustness with two additional tunable parameters. Across limit order book data, hourly electricity demand, and a synthetic high-dimensional linear system, BR-iHMM reduces one-step-ahead forecasting error by up to 67% relative to competing online Bayesian methods. Together with theoretical guarantees of bounded PIF, our results highlight the practicality of our approach for both forecasting and interpretable online learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes the Batched Robust infinite Hidden Markov Model (BR-iHMM) for online inference on streaming data subject to outliers and model misspecification. It uses generalised Bayesian inference to derive an update rule with bounded posterior influence function (PIF) under stated theoretical conditions, introduces two tunable parameters to trade off robustness against adaptation lag during regime switches, and reports up to 67% reduction in one-step-ahead forecasting error relative to standard online Bayesian methods on limit-order-book, electricity-demand, and synthetic data.

Significance. If the bounded-PIF conditions are non-vacuous and the reported error reductions remain stable under transparent parameter selection that keeps lag within typical inter-regime intervals, the work would supply a practically useful combination of theoretical robustness guarantees and empirical forecasting gains for online learning under contamination.

major comments (2)

[Abstract] The headline claim of up to 67% forecasting-error reduction is obtained only after choosing the two additional tunable parameters that control the robustness–lag trade-off; no automatic selection procedure, cross-validation scheme, or worst-case analytic bound on the induced adaptation lag is supplied, so it is impossible to verify whether the reported gains are attainable without lag that exceeds the typical spacing of regime switches on the target streams.
[Theoretical development, conditions for bounded PIF] The manuscript asserts that the online iHMM possesses bounded PIF under the proposed robust update, yet the derivation is not shown to be independent of the two tunable parameters; if the boundedness result holds only for parameter values that produce unacceptable lag, the central theoretical guarantee does not support the practical claim.

minor comments (1)

[Abstract] The title uses 'Doubly Outlier-Robust' but the abstract does not explicitly identify the two distinct robustness mechanisms; a short clarifying sentence would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to incorporate clarifications and additional discussion where appropriate.

Point-by-point responses

Referee: The headline claim of up to 67% forecasting-error reduction is obtained only after choosing the two additional tunable parameters that control the robustness–lag trade-off; no automatic selection procedure, cross-validation scheme, or worst-case analytic bound on the induced adaptation lag is supplied, so it is impossible to verify whether the reported gains are attainable without lag that exceeds the typical spacing of regime switches on the target streams.

Authors: We agree that explicit guidance on parameter selection is needed to make the empirical claims verifiable. In the revised manuscript we have added a dedicated subsection on practical parameter tuning, including domain-informed heuristics based on expected regime-switch frequency, a sensitivity analysis across the three datasets showing that the reported error reductions remain above 50% for lag values well within typical inter-regime spacing, and a simple worst-case bound on adaptation lag derived from the batch size and robustness parameter. While we do not introduce an automatic cross-validation procedure (which would require additional computational overhead not central to the contribution), the added material allows readers to reproduce and assess the gains under realistic lag constraints. revision: yes

Referee: The manuscript asserts that the online iHMM possesses bounded PIF under the proposed robust update, yet the derivation is not shown to be independent of the two tunable parameters; if the boundedness result holds only for parameter values that produce unacceptable lag, the central theoretical guarantee does not support the practical claim.

Authors: We thank the referee for highlighting this point. The bounded-PIF result is in fact independent of the specific lag-inducing value of the second tuning parameter: the proof relies only on the first robustness parameter being strictly positive and finite, which is a condition that can be satisfied while keeping the adaptation lag within any prescribed bound. We have revised the theoretical section to state this independence explicitly, to include the precise parameter restrictions under which bounded PIF holds, and to note that these restrictions are compatible with lag values that do not exceed typical regime-switch intervals on the target streams. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives a robust update rule for the online iHMM via generalised Bayesian inference, establishes conditions for bounded PIF, and introduces two explicit tunable parameters to trade off robustness against adaptation lag. Empirical forecasting gains are reported on held-out streams after parameter selection; no equation reduces a claimed prediction to a fitted input by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work. The derivation remains self-contained against external benchmarks and the tunable parameters are openly acknowledged rather than hidden inside the result.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of conditions that bound the PIF for the chosen generalized-Bayesian update and on the two extra parameters being sufficient to control the robustness-adaptivity trade-off. No new entities are postulated.

free parameters (2)

robustness tuning parameter
One of the two additional parameters that controls the strength of the robust update.
adaptation-lag tuning parameter
Second parameter that trades off how quickly the model reacts to genuine regime changes.

axioms (1)

Domain assumption: generalized Bayesian inference yields a well-defined posterior influence function for the iHMM update.
Invoked to define robustness; standard in the generalized-Bayes literature but not re-derived here.
