Generalized Stochastic Approximation of the Log-Likelihood Ratio for Robust Sequential Change-Point Detection

Serhii Zabolotnii

arXiv: 2605.23419 · v2 · pith:2R33HZVT · submitted 2026-05-22 · stat.ME · eess.SP · math.ST · stat.TH

Pith reviewed 2026-05-25 04:06 UTC · model grok-4.3

keywords: change-point detection · log-likelihood ratio approximation · stochastic basis · heavy-tailed distributions · CUSUM · Kunchenko bound · non-Gaussian processes · moment methods

The pith

Moment-based approximation of the log-likelihood ratio enables robust change-point detection in heavy-tailed non-Gaussian data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a unified framework that approximates the log-likelihood ratio on a generalized stochastic basis using only moments up to order 3s, without requiring the analytic form of the distribution. This approximation adapts classical procedures such as CUSUM, GRSh, and SRP to non-Gaussian settings, and its convergence functional is interpreted as the projection of the Kullback-Leibler divergence onto the basis span. The target regime is small relative change-points, which alter the shape of the distribution (tail structure and modality) while changing the signal energy little. A threshold derived from Kunchenko's probability-error bound controls the false-alarm rate without tuning, and the approach is reported to remain operative on extremely heavy-tailed data, where classical methods fail completely with 100 percent false alarms.
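A minimal sketch of what "adapting CUSUM to a moment-based LLR" could look like for a single observation, in Python with NumPy. It assumes (our assumption, not the paper's definition) that the approximated LLR increment is linear in the polynomial basis functions, z(x) = K^T (φ(x) − c), where φ_i(x) = x^i, Y = E1[φ] − E0[φ], J = K^T Y, and c is the midpoint of the H0 and H1 basis means, so that z drifts at −J/2 before the change and +J/2 after it. The name `gsa_increment` is hypothetical.

```python
import numpy as np


def gsa_increment(x, K, mean0, mean1):
    """Hypothetical approximate LLR increment on the polynomial basis phi_i(x) = x**i.

    K:     coefficient vector of length s (assumed already computed)
    mean0: E0[x**i] for i = 1..s, basis-function means before the change
    mean1: E1[x**i] for i = 1..s, basis-function means after the change
    """
    K = np.asarray(K, dtype=float)
    s = K.size
    phi = np.array([x ** i for i in range(1, s + 1)], dtype=float)
    # Center at the midpoint of the two mean vectors: with Y = mean1 - mean0
    # and J = K^T Y, the expected increment is -J/2 under H0 and +J/2 under H1.
    center = 0.5 * (np.asarray(mean0, dtype=float) + np.asarray(mean1, dtype=float))
    return float(K @ (phi - center))


# Gaussian mean shift N(0, 1) -> N(0.5, 1), s = 1, K = delta / sigma**2 = 0.5
z = gsa_increment(1.0, K=[0.5], mean0=[0.0], mean1=[0.5])
exact = 0.5 * 1.0 - 0.5 ** 2 / 2  # exact Gaussian LLR: delta*x - delta**2/2
print(z, exact)  # prints 0.375 0.375
```

For s = 1 and a Gaussian mean shift the sketch reproduces the exact Gaussian LLR δx − δ²/2, which is the kind of Gaussian-limit consistency one would expect from any such approximation.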

Core claim

The log-likelihood ratio is approximated on a generalized stochastic basis (polynomial, logarithmic, or fractional-power) using moments up to order 3s, and the associated convergence functional J(s) = K^T Y is the projection of the Kullback-Leibler divergence onto the basis span. This functional supplies a formal order-selection criterion, and the approximated statistic, combined with a robust threshold from Kunchenko's bound, adapts the CUSUM, GRSh, and SRP procedures to the small relative change-point regime.
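The functional J(s) = K^T Y can be sketched for the polynomial basis under one explicit assumption: K solves F K = Y, where F is the H0 covariance matrix of the basis functions φ_i(x) = x^i and Y is the shift of their means from H0 to H1. This is a Kunchenko-style reading chosen for illustration, not a transcription of the paper; it needs H0 moments up to order 2s and H1 moments up to order s, so the paper's 3s requirement is not reproduced. `gsa_poly_functional` is a hypothetical name.

```python
import numpy as np


def gsa_poly_functional(m0, m1, s):
    """Hypothetical Kunchenko-style coefficients K and functional J(s) = K^T Y.

    m0: raw moments under H0, m0[k] = E0[x**k] for k = 0..2s (m0[0] = 1)
    m1: raw moments under H1, m1[k] = E1[x**k] for k = 0..s
    Assumes F K = Y, where F is the H0 covariance of phi_i(x) = x**i and
    Y is the shift of the basis-function means from H0 to H1.
    """
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    idx = np.arange(1, s + 1)
    # F[i, j] = E0[x**(i+j)] - E0[x**i] * E0[x**j]
    F = m0[idx[:, None] + idx[None, :]] - np.outer(m0[idx], m0[idx])
    Y = m1[idx] - m0[idx]
    K = np.linalg.solve(F, Y)
    return K, float(K @ Y)


# H0: N(0, 1) with raw moments 1, 0, 1, 0, 3; H1: N(0.5, 1) with raw moments 1, 0.5, 1.25
m0 = [1.0, 0.0, 1.0, 0.0, 3.0]
m1 = [1.0, 0.5, 1.25]
for s in (1, 2):
    K, J = gsa_poly_functional(m0, m1, s)
    print(s, K, J)
```

With H0 = N(0, 1) and H1 = N(0.5, 1) this gives J(1) = 0.25 = δ²/σ², which is twice the Kullback-Leibler divergence of the mean shift, and J(2) = 0.28125. In this assumed form J(s) = Y^T F⁻¹ Y cannot decrease as s grows, since a larger basis spans more functions, so watching where J(s) levels off is one plausible way to choose s; whether this matches the paper's order-selection criterion is not established here.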

What carries the argument

Generalized stochastic approximation of the log-likelihood ratio on polynomial, logarithmic, or fractional-power bases using moments up to order 3s, interpreted as the projection of the Kullback-Leibler divergence onto the basis span.
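The three basis families differ mainly in how fast they grow with |x|, which is what makes the choice matter on heavy-tailed data. The formulas below are illustrative assumptions, since the paper's exact definitions are not reproduced on this page: φ_i(x) = x^i (poly), sign(x)·|x|^(i/s) (frac), and sign(x)·log(1 + |x|)^i (log). `basis_matrix` is a hypothetical helper.

```python
import numpy as np


def basis_matrix(x, s, kind="poly"):
    """Hypothetical basis families (illustrative, not the paper's definitions).

    poly: phi_i(x) = x**i
    frac: phi_i(x) = sign(x) * |x|**(i/s), exponents in (0, 1]
    log:  phi_i(x) = sign(x) * log(1 + |x|)**i
    Returns an array of shape (len(x), s).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    i = np.arange(1, s + 1).reshape(1, -1)
    if kind == "poly":
        return x ** i
    if kind == "frac":
        return np.sign(x) * np.abs(x) ** (i / s)
    if kind == "log":
        return np.sign(x) * np.log1p(np.abs(x)) ** i
    raise ValueError(f"unknown basis kind: {kind}")


x = np.array([0.5, 3.0, 100.0])  # one large value, as in a heavy-tailed sample
for kind in ("poly", "frac", "log"):
    print(kind, np.abs(basis_matrix(x, 2, kind)).max())
```

For this sample the largest basis value is 10000 for poly, 100 for frac, and about 21.3 for log, which illustrates the kind of dynamic-range compression the Figure 5 caption refers to.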

If this is right

Classical CUSUM, GRSh, and SRP procedures adapt to non-Gaussian data without knowing the full density.
The false-alarm rate is controlled by Kunchenko's probability-error bound without empirical tuning.
The method remains operative on data with excess kurtosis greater than 20 where classical methods produce 100 percent false alarms.
Detection delay is reduced while the guaranteed false-alarm level is maintained.
Core theorems receive formal verification in Lean 4.
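Once a per-sample approximated LLR increment z_n is available, the decision rules themselves are standard recursions. A minimal sketch, assuming GRSh refers to the Girshick-Rubin-Shiryaev (Shiryaev-Roberts) statistic and SRP to Pollak's randomized-start variant of it; the thresholds h and a are plain inputs here, whereas the paper derives its threshold from the KU-PE bound, which this sketch does not reproduce. Function names are hypothetical.

```python
import math


def cusum_alarm(increments, h):
    """CUSUM: W_n = max(0, W_{n-1} + z_n); return the first n with W_n >= h, else None."""
    w = 0.0
    for n, z in enumerate(increments, start=1):
        w = max(0.0, w + z)
        if w >= h:
            return n
    return None


def shiryaev_roberts_alarm(increments, a, r0=0.0):
    """Shiryaev-Roberts: R_n = (1 + R_{n-1}) * exp(z_n); return the first n with R_n >= a.

    r0 = 0 gives the classical statistic; drawing r0 from a quasi-stationary
    distribution (Pollak's modification) would give the SRP-style variant,
    which is not implemented here.
    """
    r = r0
    for n, z in enumerate(increments, start=1):
        r = (1.0 + r) * math.exp(z)
        if r >= a:
            return n
    return None


z = [-1.0, -1.0, 2.0, 2.0]  # toy LLR increments: the drift turns positive at n = 3
print(cusum_alarm(z, h=3.0), shiryaev_roberts_alarm(z, a=10.0))  # prints 4 3
```

Swapping the decision rule while keeping the same increments mirrors the modular comparison of CUSUM, GRSh, and SRP described in the Figure 4 caption.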

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The moment-based projection approach could extend to other sequential decision tasks that rely on unknown distributions.
The order-selection criterion J(s) might be applied to approximate other information measures beyond the log-likelihood ratio.
The framework could support formal verification of additional statistical procedures in theorem provers.

Load-bearing premise

Moments up to order 3s on the chosen generalized basis provide a sufficient approximation to the true log-likelihood ratio for small relative change-points, and Kunchenko's bound directly controls the false-alarm rate without further assumptions.

What would settle it

Run the method and the classical procedures on the nine public benchmarks, focusing on the series with excess kurtosis greater than 20, and check whether the proposed method keeps a low false-alarm rate and a shorter detection delay while the classical methods produce 100 percent false alarms.
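Checking this claim means scoring alarm times against a known change point. A minimal sketch of the three metrics that appear in the figures (FAR, DetRate, ADD), assuming per-trial definitions: a false alarm is any alarm before the change point, a detection is an alarm at or after it, and ADD averages the delay over detected trials only. The paper's exact definitions may differ (for example, FAR could be normalized per unit time), and `detection_metrics` is a hypothetical name.

```python
def detection_metrics(alarm_times, change_point):
    """Per-trial FAR, detection rate and ADD under the assumed definitions.

    alarm_times: one alarm index per Monte Carlo trial, or None if no alarm.
    """
    n = len(alarm_times)
    false_alarms = sum(1 for t in alarm_times if t is not None and t < change_point)
    delays = [t - change_point for t in alarm_times if t is not None and t >= change_point]
    far = false_alarms / n
    det_rate = len(delays) / n
    add = sum(delays) / len(delays) if delays else float("nan")
    return far, det_rate, add


# Four toy trials with the change at position 100
print(detection_metrics([None, 50, 120, 110], change_point=100))  # prints (0.25, 0.5, 15.0)
```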

Figures

Figures reproduced from arXiv: 2605.23419 by Serhii Zabolotnii.

**Figure 1.** ADD as a function of the target FAR ε for GSA-poly S=2 on the normal distribution (n_trials = 100). The empirical values are consistent with the theoretical asymptote ∝ 1/√ε (Theorem 6). Scenario C (ρ < 1) is the most challenging and most practically relevant: the change in signal energy is small, yet the shape of the distribution changes. This is precisely the regime in which the GSA detector with s ≥ 2 …

**Figure 2.** GSA robustness to contamination (H0 contaminated with Gaussian outliers). Left axis: ADD; right axis: FAR. The PE criterion keeps FAR low even at 10% contamination. ρ < 1, a significant fraction of the Kullback–Leibler divergence resides in moments of order ≥ 3. Confirmation of the Gaussian limit. For γ3 = 0 and s = 1, the GSA detector with the polynomial basis yields an ADD practically identical to that…

**Figure 3.** Gain from higher approximation order: ADD as a function of ε for s = 1 vs. s = 2 (normal distribution, n_trials = 100). At fixed FAR, the higher order yields a shorter detection delay. 4.4 Robustness Analysis for Weak Signals. Of particular interest is the ability of the method to detect small parameter changes ("incipient faults") in the most challenging scenario C (γ3 = 10). Table columns: δ | ADD (s=1) | ADD (s=3) | Improveme…

**Figure 4.** Comparison of GSA-CUSUM with baselines on the Laplace distribution (bar chart of ADD; n_trials as in the source experiment in reports/charts/). 4.7 Comparison of Decision Rules: CUSUM, GRSh, SRP. To confirm the architectural modularity of the approach (§3.7), we compared three decision rules under a fixed GSA-LLR (Φ_poly, s = 3, γ3 = 10, δ = 0.3). Table columns: Decision rule | ADD | FAR | DetRate | Characteristic; first row: CUSUM | 52.9 ± 2.2 …

**Figure 5.** Comparison of the poly / frac / log bases on the Pearson III distribution with skewness γ3 = 10. The fractional-power basis compresses the dynamic range, enabling lower ADD at the same FAR. 4.8 Summary of Monte Carlo Results. The results of Sections 4.1–4.7, obtained on synthetic data with controlled parameters, support the following conclusions: 1. Benefit of higher approximation orders: Increasing s from …

**Figure 6.** NASA IMS 2nd_test, feature = vibration kurtosis. The true CP is at position 100 (black vertical line). Six GSA variants (green) trigger after the CP (true detections); the classical Sign-CUSUM, MAD-CUSUM, and EWMA (red) trigger before the CP (false alarms). γ4 = 70.16 for the full series and γ4 = 6.5 for the calibration subsample. Source data: paper/shared/results_manifest.json → tier1_datasets.nasa_im…

**Figure 7.** NSL-KDD (15 traffic windows after log(1 + x)). FAR-DetRate space; the ideal corner is (0, 1). Four GSA variants (S=1 poly/frac/log and S=2 log) cluster at the ideal corner; Sign-CUSUM is nearly there (FAR = 6.7%, DetRate = 86.7%); MAD-CUSUM misses 73% of attacks; EWMA detects none. generate frequent false alarms. GSA with the PE threshold analytically accounts for the shape of the distribution, which allows it…

**Figure 8.** Ablation study of three detector parameters (exp8): winsorization, basis function clipping, and threshold scaling. Base configuration: Pearson III γ3 = 10, δ = 0.3, GSA-poly S=2, n_trials = 50. Labels show FAR / DR; green indicates FAR ≤ 5%, red indicates violation of the PE guarantee. This is the most influential parameter. sh = 1.5 yields 30% faster detection than the default sh = 2.0, at the same FAR = 0% and D…

Original abstract

Sequential change-point detection in non-Gaussian stochastic processes is challenging because the underlying densities are rarely known in real time. Classical parametric procedures such as CUSUM lose optimality under distributional mismatch, whereas nonparametric alternatives often react slowly. We develop a unified framework that approximates the log-likelihood ratio (LLR) on a generalized stochastic basis (polynomial, logarithmic, or fractional-power) using only moments up to order 3s, with no analytic form of the distribution, and thereby adapts the classical CUSUM, GRSh, and SRP procedures to non-Gaussian data. The convergence functional J(s) = K^T Y is interpreted as the projection of the Kullback-Leibler divergence onto the basis span, yielding a formal criterion for selecting the approximation order. We target the regime of small relative change-points, where the signal energy changes little but the shape of the distribution (tail structure and modality) does. A robust threshold follows from Kunchenko's probability-error bound (KU-PE), which controls the false-alarm rate without empirical tuning. On nine public benchmarks across four domains, the method is, to our knowledge, the only one operative on extremely heavy-tailed data (excess kurtosis gamma_4 > 20), where classical methods produce 100% false alarms, while reducing the detection delay at a guaranteed false-alarm level. The core theorems are formally verified in Lean 4.
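The regime gamma_4 > 20 is stated in terms of excess kurtosis. A minimal sketch of the standard biased moment estimator g2 = m4 / m2² − 3 (the same quantity `scipy.stats.kurtosis` returns with its defaults `fisher=True, bias=True`); the paper may use a different estimator, and `excess_kurtosis` is a hypothetical name. The Student t sample with 3 degrees of freedom is an assumed stand-in for heavy-tailed data.

```python
import numpy as np


def excess_kurtosis(x):
    """Biased moment estimator of excess kurtosis, g2 = m4 / m2**2 - 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment
    m4 = np.mean(d ** 4)  # fourth central moment
    return float(m4 / m2 ** 2 - 3.0)


rng = np.random.default_rng(0)
# Student t with df = 3: the population kurtosis is infinite for df <= 4
sample = rng.standard_t(df=3, size=10_000)
print(excess_kurtosis(sample[:200]), excess_kurtosis(sample))
```

Because the population kurtosis of this t distribution is infinite, estimates on a short prefix and on the full series can differ widely, which is worth keeping in mind when a calibration subsample is used (Figure 6 reports 6.5 for the calibration subsample versus 70.16 for the full series).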

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

The moment-based LLR approximation lets the author adapt CUSUM-style detectors to heavy-tailed data, with the core theorems verified in Lean, but the projection error likely leaves the false-alarm bound uncontrolled in the small-change regime.

The paper's main move is to approximate the log-likelihood ratio by projecting it onto a generalized stochastic basis (polynomial, log, or fractional-power) using moments up to order 3s, and then to plug the result into adapted CUSUM, GRSh, and SRP procedures, with the threshold taken from Kunchenko's bound. It targets non-Gaussian processes, especially the small relative change regime, where shape and tails shift more than energy does. The Lean 4 verification of the core theorems is a clear positive for rigor, and the benchmark results are presented as showing that it is the only method that avoids 100% false alarms on data with excess kurtosis above 20. That combination of formal verification and practical reach on hard cases is what stands out.

The soft spot is the approximation error. The stress-test concern holds: even when the first 3s moments exist, a finite basis can miss tail discrepancies in heavy-tailed small-change settings, so the additive error term may push the crossing probability outside Kunchenko's bound. The Lean results appear to cover the exact LLR case rather than the projected statistic, which leaves that gap open.

This is for applied statisticians and signal-processing researchers who need robust detectors without full distributional knowledge. It has enough formal grounding and empirical claims to deserve a serious referee, though the error control will need direct attention. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript develops a unified framework for sequential change-point detection in non-Gaussian processes by approximating the log-likelihood ratio via a generalized stochastic basis (polynomial, logarithmic, or fractional-power) using only moments up to order 3s. The approximation is formalized as the projection J(s) = K^T Y of the Kullback-Leibler divergence, which is then plugged into adapted CUSUM, GRSh, and SRP procedures; a threshold is obtained from Kunchenko's KU-PE probability-error bound to guarantee the false-alarm rate without tuning. The target regime is small relative change-points that alter distribution shape or tails rather than energy. Performance is reported on nine public benchmarks, with the claim that the method is the only one that remains operative for data with excess kurtosis gamma_4 > 20 (where classical methods yield 100% false alarms) while reducing detection delay; core theorems are formally verified in Lean 4.

Significance. If the projection error can be shown to be controlled so that the KU-PE bound remains valid, the work would constitute a meaningful advance in robust nonparametric sequential detection by supplying a moment-based procedure that retains formal false-alarm guarantees on heavy-tailed data where parametric and many nonparametric alternatives break down. The Lean 4 verification of the core theorems is a clear strength that supports reproducibility and rigor.

major comments (3)

[Threshold derivation using KU-PE] The direct invocation of Kunchenko's KU-PE bound on the approximated statistic J(s) (described after the definition of the projection) supplies no explicit additive error term that accounts for the discrepancy between the finite-basis projection and the true LLR when the underlying distribution has gamma_4 > 20; because the bound is asserted to deliver a guaranteed false-alarm rate, this omission is load-bearing for the central robustness claim.
[Moment approximation up to 3s] In the regime of small relative change-points (shape/tail shifts with little energy change), the manuscript states that moments up to order 3s suffice for the generalized basis approximation, yet provides no quantitative uniform bound on the resulting projection error that is independent of tail heaviness; without such control the crossing probabilities of the adapted CUSUM/GRSh/SRP procedures may exceed the KU-PE guarantee.
[Lean verification of core theorems] The Lean-verified theorems address the exact (non-approximated) LLR case; the manuscript must clarify whether and how the finite-basis projection step is incorporated into the verified statements or bounded so that the practical procedure inherits the same guarantees.

minor comments (2)

The nine benchmarks are referenced only by domain; adding a table that lists the specific datasets, their sample sizes, and measured gamma_4 values would improve reproducibility.
Notation for the basis functions and the vector K should be introduced with an explicit definition before the first use of J(s) = K^T Y.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable feedback on our manuscript. The comments have helped us identify areas where the presentation and theoretical guarantees can be strengthened. We provide point-by-point responses to the major comments below, indicating the revisions we plan to make.

Referee: [Threshold derivation using KU-PE] The direct invocation of Kunchenko's KU-PE bound on the approximated statistic J(s) (described after the definition of the projection) supplies no explicit additive error term that accounts for the discrepancy between the finite-basis projection and the true LLR when the underlying distribution has gamma_4 > 20; because the bound is asserted to deliver a guaranteed false-alarm rate, this omission is load-bearing for the central robustness claim.

Authors: We thank the referee for highlighting this important aspect. The manuscript applies the KU-PE bound directly to J(s) as it is the observable statistic. To address the concern, we will revise the manuscript to include an analysis of the approximation error between J(s) and the true LLR, deriving an explicit bound in terms of the basis truncation and moment estimates. This will show that the error is bounded by a term that vanishes as s increases, allowing the false-alarm guarantee to hold with a modified threshold that accounts for the additive error.
revision: yes
Referee: [Moment approximation up to 3s] In the regime of small relative change-points (shape/tail shifts with little energy change), the manuscript states that moments up to order 3s suffice for the generalized basis approximation, yet provides no quantitative uniform bound on the resulting projection error that is independent of tail heaviness; without such control the crossing probabilities of the adapted CUSUM/GRSh/SRP procedures may exceed the KU-PE guarantee.

Authors: The referee correctly notes the absence of a quantitative uniform bound independent of tail heaviness. Our framework is designed for small relative change-points where moments up to 3s capture the shape and tail changes. In the revision, we will add a quantitative bound on the projection error that depends on higher moments and demonstrate how increasing s controls the error to maintain the KU-PE guarantee.
revision: yes
Referee: [Lean verification of core theorems] The Lean-verified theorems address the exact (non-approximated) LLR case; the manuscript must clarify whether and how the finite-basis projection step is incorporated into the verified statements or bounded so that the practical procedure inherits the same guarantees.

Authors: We will revise the manuscript to clarify that the Lean verification is for the exact LLR. We will incorporate a new result bounding the projection error and showing that the guarantees extend to the approximated statistic with high probability when s is chosen based on the data moments. This will be presented as an additional theorem.
revision: yes

Circularity Check

0 steps flagged

No circularity: moment-based LLR projection and KU-PE bound are independent of target result

full rationale

The derivation approximates the LLR via finite moments on a chosen basis and interprets J(s)=K^T Y as its KL projection, then feeds the statistic into standard CUSUM/GRSh/SRP procedures whose false-alarm control is taken from the external Kunchenko KU-PE bound. The core theorems are stated to be Lean-verified. No equation reduces a prediction to a fitted parameter by construction, no self-citation chain bears the central claim, and the approximation order selection is presented as an explicit functional rather than a tautology. The supplied abstract and context contain no load-bearing step that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework relies on standard statistical assumptions about moments and the applicability of the cited bound; no new entities are postulated.

axioms (2)

Domain assumption: moments up to order 3s are available and sufficient for the approximation.
The method relies on using only these moments without the analytic form of the distribution.
Domain assumption: Kunchenko's probability-error bound (KU-PE) provides a robust threshold controlling the false-alarm rate.
Used for the threshold without empirical tuning.

pith-pipeline@v0.9.0 · 5786 in / 1397 out tokens · 29194 ms · 2026-05-25T04:06:43.772103+00:00 · methodology
