Stability of the Shannon--McMillan--Breiman Theorem under Sublinear Parsings
Pith reviewed 2026-05-10 13:02 UTC · model grok-4.3
The pith
Sublinear data-dependent parsings preserve almost-sure convergence of the normalized sums of block negative log-likelihoods to the entropy rate for any invariant measure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, cylinder probabilities admit an approximate factorization under arbitrary sublinear parsings. The result persists under subextensive perturbations of the blocks, while sublinearity of the block count is the sharp threshold, as shown by an explicit counterexample for linear block counts.
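In symbols (notation mine, inferred from the abstract): writing the parsing of x_1^N as blocks B_1, ..., B_{c(N)} with c(N) = o(N) almost surely, the claim reads

```latex
% Hedged restatement; B_1,...,B_{c(N)} and c(N) are my notation, not the paper's.
\[
  \frac{1}{N} \sum_{k=1}^{c(N)} \bigl( -\log P([B_k]) \bigr)
  \;\xrightarrow[N \to \infty]{}\; h_P
  \qquad \text{$P$-a.s.\ and in } L^1(P),
\]
% equivalently, an approximate factorization of cylinder probabilities:
\[
  P\bigl([x_1^N]\bigr) \;=\; e^{o(N)} \prod_{k=1}^{c(N)} P\bigl([B_k]\bigr)
  \qquad \text{$P$-a.s.}
\]
```

The second display follows from the first together with the classical Shannon--McMillan--Breiman theorem applied to the full cylinder [x_1^N], since the two normalized log-probabilities then differ by o(1) per symbol.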
What carries the argument
A data-dependent parsing whose number of blocks is o(N) almost surely, applied to a shift-invariant measure P on the one-sided finite shift space, with the normalized sum of block negative log-likelihoods.
If this is right
- The convergence holds both almost surely and in L^1(P) under the stated conditions.
- The result remains valid when the parsing blocks undergo subextensive perturbations.
- Cylinder probabilities factorize approximately under any such sublinear parsing.
- Sublinearity is a sharp threshold: linear block counts admit counterexamples where the convergence fails.
Where Pith is reading between the lines
- Flexible adaptive compression schemes could employ data-driven parsings with o(N) blocks while retaining asymptotic entropy-rate performance.
- Analogous stability statements may hold for other ergodic theorems when the underlying division of the sequence is sublinear.
- Numerical checks on Markov sources with controlled block-count growth rates could map the precise transition between convergent and non-convergent regimes.
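A numerical check of the last kind is easy to sketch. The following is my illustration, not the paper's: a two-state Markov source with made-up transition probabilities, parsed by the data-independent sublinear parsing with block lengths 1, 2, 3, ..., so the block count is c(N) ~ sqrt(2N) = o(N). The normalized sum of block negative log-probabilities should land near the entropy rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov source (parameters are mine, not the paper's).
Q = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationary distribution pi solves pi Q = pi (left Perron eigenvector).
evals, evecs = np.linalg.eig(Q.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Entropy rate h_P = -sum_{i,j} pi_i Q_ij log Q_ij, in nats.
h = -np.sum(pi[:, None] * Q * np.log(Q))

# Sample a stationary trajectory of length N.
N = 100_000
x = np.empty(N, dtype=int)
x[0] = rng.choice(2, p=pi)
for t in range(1, N):
    x[t] = rng.choice(2, p=Q[x[t - 1]])

def neg_log_prob(block):
    # -log P of the cylinder [block] under the stationary Markov measure.
    s = -np.log(pi[block[0]])
    for a, b in zip(block[:-1], block[1:]):
        s -= np.log(Q[a, b])
    return s

# Sublinear parsing with block lengths 1, 2, 3, ...: c(N) ~ sqrt(2N) = o(N).
total, start, length = 0.0, 0, 1
while start < N:
    total += neg_log_prob(x[start:start + length])
    start += length
    length += 1

print(total / N, h)  # the normalized sum should be close to h_P
```

Here the boundary error is transparent: each of the ~sqrt(2N) block starts replaces one transition probability by a stationary probability, contributing O(sqrt(N)/N) = o(1) to the normalized sum. Mapping the transition to linear block counts would require letting the number of blocks grow proportionally to N instead.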
Load-bearing premise
The parsing must produce a number of blocks that is sublinear in the sequence length N almost surely.
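A concrete instance of this premise (my illustration, not from the paper): the LZ78 incremental parsing is data-dependent, and on any sequence over a finite alphabet its block count grows like O(N / log N), hence is o(N). A minimal sketch:

```python
import random

def lz78_block_count(seq):
    # Ziv--Lempel incremental parsing: repeatedly emit the shortest
    # prefix of the remaining sequence that has not been seen before.
    seen, count, start = set(), 0, 0
    for i in range(len(seq)):
        phrase = tuple(seq[start:i + 1])
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            start = i + 1
    if start < len(seq):  # trailing block that was already seen
        count += 1
    return count

random.seed(1)
ratios = {}
for N in (10_000, 100_000):
    s = random.choices([0, 1], k=N)
    ratios[N] = lz78_block_count(s) / N

print(ratios)  # c(N)/N shrinks as N grows, consistent with c(N) = o(N)
```

On uniform binary data the count is close to N / log2 N, so the ratio c(N)/N visibly decays as N increases, which is exactly the sublinearity the theorem requires.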
What would settle it
A concrete shift-invariant measure P together with a parsing that uses o(N) blocks almost surely, yet for which the normalized sum of negative log block probabilities fails to converge to h_P.
Original abstract
We establish a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, we show that the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, we obtain an approximate factorization of cylinder probabilities under arbitrary sublinear parsings. We further show that the stability result persists under subextensive perturbations of the parsing blocks, and that sublinearity of the block count is the sharp threshold for validity at this level of generality, via a direct counterexample.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing with sublinear block count in N almost surely, the normalized sum of negative log-likelihoods of the blocks converges almost surely and in L^1 to the entropy rate h_P. It also provides an approximate factorization of cylinder probabilities, shows the result holds under subextensive perturbations of the blocks, and demonstrates that sublinearity is the sharp threshold by constructing a counterexample for linear block counts.
Significance. The result is significant because it generalizes the classical SMB theorem to a wide class of adaptive parsings that arise in practical information-theoretic settings such as compression and sequence modeling. The identification of sublinearity as the sharp condition, supported by an explicit counterexample, clarifies the limits of the stability and strengthens the theoretical contribution. The use of ergodic theory to handle the data-dependent nature without additional assumptions is a positive aspect, and the L^1 convergence provides a stronger mode of convergence than a.s. alone.
minor comments (2)
- [Abstract] The term 'subextensive perturbations' is introduced without a precise definition or reference; a short explanation in the abstract or introduction would enhance accessibility.
- [Counterexample section] The construction of the counterexample for linear block counts is key to the sharpness claim; ensuring it is fully detailed and self-contained would strengthen the paper.
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation of minor revision. The assessment correctly captures the main contributions, including the almost-sure and L1 convergence under sublinear parsings, the approximate factorization, the persistence under subextensive perturbations, and the sharpness result via counterexample. No major comments requiring point-by-point rebuttal were listed in the report.
Circularity Check
No significant circularity
full rationale
The derivation applies the ergodic theorem directly to the normalized sum of negative log-likelihoods of parsing blocks, with the sublinear block-count condition (o(N) a.s.) ensuring boundary effects vanish in the limit. This is a standard extension of the Shannon-McMillan-Breiman theorem on shift-invariant measures; the proof does not reduce any claimed convergence to a fitted parameter, a self-referential definition, or a load-bearing self-citation. The sharpness counterexample for linear block counts is constructed explicitly and independently. No enumerated circularity pattern appears in the argument chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: P is a shift-invariant probability measure on the one-sided finite shift space
- domain assumption: the parsing is data-dependent with block count sublinear in N almost surely
Reference graph
Works this paper leans on
- [1] Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity I. Canadian Journal of Mathematics, 77(3):891--915, 2025. doi: 10.4153/S0008414X24000178
- [2] Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity II: Leveraging the Thermodynamic Formalism. To appear in Annals of Applied Probability, 2026.
- [3] Andrew R. Barron. The Strong Ergodic Theorem for Densities: Generalized Shannon--McMillan--Breiman Theorem. Annals of Probability, 13(4):1292--1303, 1985.
- [4] Leo Breiman. The individual ergodic theorem of information theory. Annals of Mathematical Statistics, 28(3):809--811, 1957. doi: 10.1214/aoms/1177706678
- [5] Leo Breiman. A correction to "The individual ergodic theorem of information theory". Annals of Mathematical Statistics, 31(3):809--810, 1960. doi: 10.1214/aoms/1177705812
- [6] Alexandra Ionescu Tulcea. Contributions to information theory for abstract alphabets. Arkiv för Matematik, 4:235--247, 1961. doi: 10.1007/BF02592011
- [7] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2nd edition. John Wiley & Sons, 2006. doi: 10.1002/047174882X
- [8] Jacob Ziv and Abraham Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3):337--343, 1977. doi: 10.1109/TIT.1977.1055714
- [9] Jacob Ziv and Abraham Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, 24(5):530--536, 1978. doi: 10.1109/TIT.1978.1055934
- [10] Brockway McMillan. The basic theorems of information theory. Annals of Mathematical Statistics, 24(2):196--219, 1953. doi: 10.1214/aoms/1177729028
- [11] Jacob Ziv and Neri Merhav. A measure of relative entropy between individual sequences with application to universal classification. IEEE Transactions on Information Theory, 39(4):1270--1279, 1993. doi: 10.1109/18.243437
- [12] Emmanuel Rio. Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques & Applications, Vol. 31. Springer, 2000.
- [13] Walter Rudin. Real and Complex Analysis. 3rd edition. McGraw--Hill, New York, 1987.
- [14] Paul C. Shields. The Ergodic Theory of Discrete Sample Paths. Graduate Studies in Mathematics, Vol. 13. American Mathematical Society, 1996.