pith. machine review for the scientific record.

arxiv: 2604.13467 · v1 · submitted 2026-04-15 · 💻 cs.IT · math.IT


Stability of the Shannon--McMillan--Breiman Theorem under Sublinear Parsings

Raphael Grondin


Pith reviewed 2026-05-10 13:02 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords Shannon-McMillan-Breiman theorem · entropy rate · sublinear parsing · shift-invariant measure · stability · cylinder probability · information theory · ergodic theory

The pith

Sublinear data-dependent parsings preserve almost-sure convergence of normalized block log-likelihood sums to the entropy rate for any shift-invariant measure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes stability of the Shannon-McMillan-Breiman theorem for sequences on the one-sided shift space. Given any shift-invariant probability measure P, if a parsing divides the length-N string into a number of blocks that remains sublinear in N almost surely, then the sum of the negative log-likelihoods of those blocks, divided by N, converges almost surely and in L^1(P) to the entropy rate h_P. This statement is equivalent to an approximate factorization property for the cylinder probabilities under the same parsings. The stability extends to subextensive changes in the chosen blocks, and the sublinearity condition cannot be relaxed: a direct counterexample shows failure when the block count grows linearly.

Core claim

For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, cylinder probabilities admit an approximate factorization under arbitrary sublinear parsings. The result persists under subextensive perturbations of the blocks, while sublinearity of the block count is the sharp threshold, as shown by an explicit counterexample for linear counts.
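A minimal numerical illustration (editorial, not from the paper): for an i.i.d. Bernoulli source the cylinder probabilities factorize exactly, so the parsed sum telescopes and the law of large numbers gives convergence to the entropy. The parameters p and N below are arbitrary choices for the sketch; the parsing uses ~√N fixed blocks, so the block count is o(N).

```python
import math
import random

def bernoulli_entropy(p):
    """Shannon entropy (nats) of a Bernoulli(p) symbol."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def parsed_neg_log_likelihood(x, boundaries, p):
    """Sum of -log P(block) over the parsing given by boundary indices.
    For an i.i.d. source, P(block) is a product of symbol probabilities."""
    total = 0.0
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        total -= sum(math.log(p if s == 1 else 1 - p) for s in x[a:b])
    return total

rng = random.Random(0)
p, N = 0.3, 200_000
x = [1 if rng.random() < p else 0 for _ in range(N)]

# ~sqrt(N) blocks of length ~sqrt(N): the block count is o(N).
L = math.isqrt(N)
boundaries = list(range(0, N, L)) + [N]
estimate = parsed_neg_log_likelihood(x, boundaries, p) / N
print(round(estimate, 4), "vs entropy", round(bernoulli_entropy(p), 4))
```

For i.i.d. sources the statement is trivial (every parsing gives the same sum); the paper's content lies in the general shift-invariant case, where block boundaries do matter.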

What carries the argument

A data-dependent parsing whose number of blocks is o(N) almost surely, applied to a shift-invariant measure P on the one-sided finite shift space, with the normalized sum of block negative log-likelihoods.
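One concrete instance of such a parsing (an editorial illustration, not the paper's construction) is the LZ78 incremental parsing of Ziv and Lempel, cited in the reference graph below: it is data-dependent, and its phrase count grows like O(N / log N), hence o(N). A minimal sketch:

```python
def lz78_phrase_count(x):
    """Count phrases in the LZ78 incremental parsing of the sequence x.

    Each new phrase extends a previously seen phrase by one symbol, so
    the phrase count is O(N / log N): sublinear in N.
    """
    seen = {()}          # phrases seen so far, as tuples
    phrase = ()
    count = 0
    for symbol in x:
        phrase = phrase + (symbol,)
        if phrase not in seen:   # new phrase: record it and start over
            seen.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)  # possible incomplete final phrase

# Blocks per symbol shrink as N grows (sublinearity in action):
for N in (1_000, 10_000, 100_000):
    x = [i % 2 for i in range(N)]  # deterministic 0,1,0,1,... sequence
    print(N, lz78_phrase_count(x) / N)
```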

If this is right

  • The convergence holds both almost surely and in L^1(P) under the stated conditions.
  • The result remains valid when the parsing blocks undergo subextensive perturbations.
  • Cylinder probabilities factorize approximately under any such sublinear parsing.
  • Sublinearity is a sharp threshold: linear block counts admit counterexamples where the convergence fails.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Flexible adaptive compression schemes could employ data-driven parsings with o(N) blocks while retaining asymptotic entropy-rate performance.
  • Analogous stability statements may hold for other ergodic theorems when the underlying division of the sequence is sublinear.
  • Numerical checks on Markov sources with controlled block-count growth rates could map the precise transition between convergent and non-convergent regimes.
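The last suggestion can be sketched directly. The following is an editorial toy under simplifying assumptions: a two-state Markov chain with a known transition matrix (chosen arbitrarily here), and fixed-length blocks rather than the paper's data-dependent parsings. With ~√N blocks the normalized sum tracks the entropy rate h_P; with N length-one blocks it converges to the stationary entropy H(π) > h_P instead, a toy version of failure at linear block counts (not the paper's counterexample).

```python
import math
import random

# Two-state Markov chain: transition matrix P and its stationary law pi.
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]  # solves pi P = pi
# Entropy rate h_P (nats) of the stationary chain.
h = -sum(pi[a] * P[a][b] * math.log(P[a][b])
         for a in range(2) for b in range(2))

def sample_chain(n, rng):
    """Sample a length-n path started from the stationary law."""
    x = [0 if rng.random() < pi[0] else 1]
    for _ in range(n - 1):
        x.append(0 if rng.random() < P[x[-1]][0] else 1)
    return x

def neg_log_prob(block):
    """Exact -log P(cylinder) of a block under the stationary chain."""
    lp = math.log(pi[block[0]])
    for a, b in zip(block, block[1:]):
        lp += math.log(P[a][b])
    return -lp

def normalized_sum(x, num_blocks):
    """(1/N) * sum of -log P(block) over ~num_blocks fixed-length blocks."""
    n = len(x)
    L = max(1, n // num_blocks)
    return sum(neg_log_prob(x[i:i + L]) for i in range(0, n, L)) / n

N = 100_000
x = sample_chain(N, random.Random(1))
# ~sqrt(N) blocks (sublinear): tracks the entropy rate h_P.
print(normalized_sum(x, int(math.isqrt(N))), "vs h =", h)
# N length-1 blocks (linear): drifts up to H(pi) > h_P instead.
print(normalized_sum(x, N), "vs H(pi) =", -sum(p * math.log(p) for p in pi))
```

Each block restart replaces one conditional probability with a stationary one, so the error is bounded per boundary; with o(N) boundaries it vanishes after dividing by N, which is exactly the mechanism the theorem makes precise.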

Load-bearing premise

The parsing must produce a number of blocks that is sublinear in the sequence length N almost surely.

What would settle it

A concrete shift-invariant measure P together with a parsing that uses o(N) blocks almost surely, yet for which the normalized sum of negative log block probabilities fails to converge to h_P.

read the original abstract

We establish a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, we show that the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, we obtain an approximate factorization of cylinder probabilities under arbitrary sublinear parsings. We further show that the stability result persists under subextensive perturbations of the parsing blocks, and that sublinearity of the block count is the sharp threshold for validity at this level of generality, via a direct counterexample.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript establishes a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing with sublinear block count in N almost surely, the normalized sum of negative log-likelihoods of the blocks converges almost surely and in L^1 to the entropy rate h_P. It also provides an approximate factorization of cylinder probabilities, shows the result holds under subextensive perturbations of the blocks, and demonstrates that sublinearity is the sharp threshold by constructing a counterexample for linear block counts.

Significance. The result is significant because it generalizes the classical SMB theorem to a wide class of adaptive parsings that arise in practical information-theoretic settings such as compression and sequence modeling. The identification of sublinearity as the sharp condition, supported by an explicit counterexample, clarifies the limits of the stability and strengthens the theoretical contribution. The use of ergodic theory to handle the data-dependent nature without additional assumptions is a positive aspect, and the L^1 convergence provides a stronger mode of convergence than a.s. alone.

minor comments (2)
  1. [Abstract] The term 'subextensive perturbations' is introduced without a precise definition or reference; a short explanation in the abstract or introduction would enhance accessibility.
  2. [Counterexample section] The construction of the counterexample for linear block counts is key to the sharpness claim; ensuring it is fully detailed and self-contained would strengthen the paper.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. The assessment correctly captures the main contributions, including the almost-sure and L^1 convergence under sublinear parsings, the approximate factorization, the persistence under subextensive perturbations, and the sharpness result via counterexample. No major comments requiring point-by-point rebuttal were listed in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation applies the ergodic theorem directly to the normalized sum of negative log-likelihoods of parsing blocks, with the sublinear block-count condition (o(N) a.s.) ensuring boundary effects vanish in the limit. This is a standard extension of the Shannon-McMillan-Breiman theorem on shift-invariant measures; the proof does not reduce any claimed convergence to a fitted parameter, a self-referential definition, or a load-bearing self-citation. The sharpness counterexample for linear block counts is constructed explicitly and independently. No enumerated circularity pattern appears in the argument chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard ergodic-theory assumptions for stationary processes on finite alphabets; no free parameters, new entities, or ad-hoc axioms are introduced beyond the sublinearity condition that is proved sharp.

axioms (2)
  • domain assumption P is a shift-invariant probability measure on the one-sided finite shift space
    Invoked throughout the abstract as the setting for the SMB theorem.
  • domain assumption The parsing is data-dependent with block count sublinear in N almost surely
    Central hypothesis whose sharpness is demonstrated by counterexample.

pith-pipeline@v0.9.0 · 5411 in / 1273 out tokens · 38186 ms · 2026-05-10T13:02:18.328707+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 9 canonical work pages

  1. [1]

    On the Ziv--Merhav theorem beyond Markovianity I

Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity I. Canadian Journal of Mathematics, 77(3):891--915, 2025. doi: 10.4153/S0008414X24000178

  2. [2]

    On the Ziv--Merhav theorem beyond Markovianity II: Leveraging the Thermodynamic Formalism

Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity II: Leveraging the Thermodynamic Formalism. To appear in Annals of Applied Probability, 2026.

  3. [3]

Andrew R. Barron. The Strong Ergodic Theorem for Densities: Generalized Shannon--McMillan--Breiman Theorem. Annals of Probability, 13(4):1292--1303, 1985

  4. [4]

    The individual ergodic theorem of information theory

Leo Breiman. The individual ergodic theorem of information theory. Annals of Mathematical Statistics, 28(3):809--811, 1957. doi: 10.1214/aoms/1177706678

  5. [5]

A correction to "The individual ergodic theorem of information theory"

Leo Breiman. A correction to ``The individual ergodic theorem of information theory''. Annals of Mathematical Statistics, 31(3):809--810, 1960. doi: 10.1214/aoms/1177705812

  6. [6]

    Contributions to information theory for abstract alphabets

Alexandra Ionescu Tulcea. Contributions to information theory for abstract alphabets. Arkiv för Matematik, 4:235--247, 1961. doi: 10.1007/BF02592011

  7. [7]

Elements of Information Theory

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2nd edition. John Wiley & Sons, 2006. doi: 10.1002/047174882X

  8. [8]

A Universal Algorithm for Sequential Data Compression

Jacob Ziv and Abraham Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3):337--343, 1977. doi: 10.1109/TIT.1977.1055714

  9. [9]

    Compression of Individual Sequences via Variable-Rate Coding

Jacob Ziv and Abraham Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, 24(5):530--536, 1978. doi: 10.1109/TIT.1978.1055934

  10. [10]

The basic theorems of information theory

Brockway McMillan. The basic theorems of information theory. Annals of Mathematical Statistics, 24(2):196--219, 1953. doi: 10.1214/aoms/1177729028

  11. [11]

    A measure of relative entropy between individual sequences with application to universal classification

Neri Merhav and Jacob Ziv. A measure of relative entropy between individual sequences with application to universal classification. IEEE Transactions on Information Theory, 39(4):1270--1279, 1993. doi: 10.1109/18.243437

  12. [12]

Théorie asymptotique des processus aléatoires faiblement dépendants

Emmanuel Rio. Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques & Applications, Vol. 31. Springer, 2000

  13. [13]

    Real and Complex Analysis

Walter Rudin. Real and Complex Analysis. 3rd edition. McGraw-Hill, New York, 1987

  14. [14]

Paul C. Shields. The Ergodic Theory of Discrete Sample Paths. Graduate Studies in Mathematics, Vol. 13. American Mathematical Society, 1996