Stability of the Shannon--McMillan--Breiman Theorem under Sublinear Parsings
Pith reviewed 2026-05-10 13:02 UTC · model grok-4.3
The pith
Sublinear data-dependent parsings preserve almost-sure convergence of the normalized sums of block negative log-likelihoods to the entropy rate for any invariant measure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, cylinder probabilities admit an approximate factorization under arbitrary sublinear parsings. The result persists under subextensive perturbations of the blocks, while sublinearity of the block count is the sharp threshold, as shown by an explicit counterexample for linear block counts.
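In symbols (notation mine, inferred from the abstract): writing the parsing of x_1^N as blocks B_1, ..., B_{c(N)} with c(N) = o(N) almost surely, the claim reads

```latex
% Hedged restatement; B_1,...,B_{c(N)} and c(N) are my notation, not the paper's.
\[
  \frac{1}{N} \sum_{k=1}^{c(N)} \bigl( -\log P([B_k]) \bigr)
  \;\xrightarrow[N \to \infty]{}\; h_P
  \qquad \text{$P$-a.s.\ and in } L^1(P),
\]
% equivalently, an approximate factorization of cylinder probabilities:
\[
  P\bigl([x_1^N]\bigr) \;=\; e^{o(N)} \prod_{k=1}^{c(N)} P\bigl([B_k]\bigr)
  \qquad \text{$P$-a.s.}
\]
```

The second display follows from the first together with the classical Shannon--McMillan--Breiman theorem applied to the full cylinder [x_1^N], since the two normalized log-probabilities then differ by o(1) per symbol.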
What carries the argument
A data-dependent parsing whose number of blocks is o(N) almost surely, applied to a shift-invariant measure P on the one-sided finite shift space, with the normalized sum of block negative log-likelihoods.
If this is right
- The convergence holds both almost surely and in L^1(P) under the stated conditions.
- The result remains valid when the parsing blocks undergo subextensive perturbations.
- Cylinder probabilities factorize approximately under any such sublinear parsing.
- Sublinearity is a sharp threshold: linear block counts admit counterexamples where the convergence fails.
Where Pith is reading between the lines
- Flexible adaptive compression schemes could employ data-driven parsings with o(N) blocks while retaining asymptotic entropy-rate performance.
- Analogous stability statements may hold for other ergodic theorems when the underlying division of the sequence is sublinear.
- Numerical checks on Markov sources with controlled block-count growth rates could map the precise transition between convergent and non-convergent regimes.
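A numerical check of the last kind is easy to sketch. The following is my illustration, not the paper's: a two-state Markov source with made-up transition probabilities, parsed by the data-independent sublinear parsing with block lengths 1, 2, 3, ..., so the block count is c(N) ~ sqrt(2N) = o(N). The normalized sum of block negative log-probabilities should land near the entropy rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov source (parameters are mine, not the paper's).
Q = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationary distribution pi solves pi Q = pi (left Perron eigenvector).
evals, evecs = np.linalg.eig(Q.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Entropy rate h_P = -sum_{i,j} pi_i Q_ij log Q_ij, in nats.
h = -np.sum(pi[:, None] * Q * np.log(Q))

# Sample a stationary trajectory of length N.
N = 100_000
x = np.empty(N, dtype=int)
x[0] = rng.choice(2, p=pi)
for t in range(1, N):
    x[t] = rng.choice(2, p=Q[x[t - 1]])

def neg_log_prob(block):
    # -log P of the cylinder [block] under the stationary Markov measure.
    s = -np.log(pi[block[0]])
    for a, b in zip(block[:-1], block[1:]):
        s -= np.log(Q[a, b])
    return s

# Sublinear parsing with block lengths 1, 2, 3, ...: c(N) ~ sqrt(2N) = o(N).
total, start, length = 0.0, 0, 1
while start < N:
    total += neg_log_prob(x[start:start + length])
    start += length
    length += 1

print(total / N, h)  # the normalized sum should be close to h_P
```

Here the boundary error is transparent: each of the ~sqrt(2N) block starts replaces one transition probability by a stationary probability, contributing O(sqrt(N)/N) = o(1) to the normalized sum. Mapping the transition to linear block counts would require letting the number of blocks grow proportionally to N instead.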
Load-bearing premise
The parsing must produce a number of blocks that is sublinear in the sequence length N almost surely.
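A concrete instance of this premise (my illustration, not from the paper): the LZ78 incremental parsing is data-dependent, and on any sequence over a finite alphabet its block count grows like O(N / log N), hence is o(N). A minimal sketch:

```python
import random

def lz78_block_count(seq):
    # Ziv--Lempel incremental parsing: repeatedly emit the shortest
    # prefix of the remaining sequence that has not been seen before.
    seen, count, start = set(), 0, 0
    for i in range(len(seq)):
        phrase = tuple(seq[start:i + 1])
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            start = i + 1
    if start < len(seq):  # trailing block that was already seen
        count += 1
    return count

random.seed(1)
ratios = {}
for N in (10_000, 100_000):
    s = random.choices([0, 1], k=N)
    ratios[N] = lz78_block_count(s) / N

print(ratios)  # c(N)/N shrinks as N grows, consistent with c(N) = o(N)
```

On uniform binary data the count is close to N / log2 N, so the ratio c(N)/N visibly decays as N increases, which is exactly the sublinearity the theorem requires.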
What would settle it
A concrete shift-invariant measure P together with a parsing that uses o(N) blocks almost surely, yet for which the normalized sum of negative log block probabilities fails to converge to h_P.
Original abstract
We establish a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing whose number of blocks is sublinear in N almost surely, we show that the normalized sum of the negative log-likelihoods of the parsing blocks converges almost surely and in L^1(P) to the entropy-rate function h_P. Equivalently, we obtain an approximate factorization of cylinder probabilities under arbitrary sublinear parsings. We further show that the stability result persists under subextensive perturbations of the parsing blocks, and that sublinearity of the block count is the sharp threshold for validity at this level of generality, via a direct counterexample.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes a stability result for the Shannon-McMillan-Breiman theorem on the one-sided finite shift space. For any shift-invariant probability measure P and any data-dependent parsing with sublinear block count in N almost surely, the normalized sum of negative log-likelihoods of the blocks converges almost surely and in L^1 to the entropy rate h_P. It also provides an approximate factorization of cylinder probabilities, shows the result holds under subextensive perturbations of the blocks, and demonstrates that sublinearity is the sharp threshold by constructing a counterexample for linear block counts.
Significance. The result is significant because it generalizes the classical SMB theorem to a wide class of adaptive parsings that arise in practical information-theoretic settings such as compression and sequence modeling. The identification of sublinearity as the sharp condition, supported by an explicit counterexample, clarifies the limits of the stability and strengthens the theoretical contribution. The use of ergodic theory to handle the data-dependent nature without additional assumptions is a positive aspect, and the L^1 convergence provides a stronger mode of convergence than a.s. alone.
minor comments (2)
- [Abstract] The term 'subextensive perturbations' is introduced without a precise definition or reference; a short explanation in the abstract or introduction would enhance accessibility.
- [Counterexample section] The construction of the counterexample for linear block counts is key to the sharpness claim; ensuring it is fully detailed and self-contained would strengthen the paper.
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation of minor revision. The assessment correctly captures the main contributions, including the almost-sure and L1 convergence under sublinear parsings, the approximate factorization, the persistence under subextensive perturbations, and the sharpness result via counterexample. No major comments requiring point-by-point rebuttal were listed in the report.
Circularity Check
No significant circularity
full rationale
The derivation applies the ergodic theorem directly to the normalized sum of negative log-likelihoods of parsing blocks, with the sublinear block-count condition (o(N) a.s.) ensuring boundary effects vanish in the limit. This is a standard extension of the Shannon-McMillan-Breiman theorem on shift-invariant measures; the proof does not reduce any claimed convergence to a fitted parameter, a self-referential definition, or a load-bearing self-citation. The sharpness counterexample for linear block counts is constructed explicitly and independently. No enumerated circularity pattern appears in the argument chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: P is a shift-invariant probability measure on the one-sided finite shift space
- domain assumption: the parsing is data-dependent with block count sublinear in N almost surely
Reference graph
Works this paper leans on
- [1] Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity I. Canadian Journal of Mathematics, 77(3):891--915, 2025. doi: 10.4153/S0008414X24000178
- [2] Nicholas Barnfield, Raphaël Grondin, Gaia Pozzoli, and Renaud Raquépas. On the Ziv--Merhav theorem beyond Markovianity II: Leveraging the Thermodynamic Formalism. To appear in Annals of Applied Probability, 2026.
- [3] Andrew R. Barron. The Strong Ergodic Theorem for Densities: Generalized Shannon--McMillan--Breiman Theorem. Annals of Probability, 13(4):1292--1303, 1985.
- [4] Leo Breiman. The individual ergodic theorem of information theory. Annals of Mathematical Statistics, 28(3):809--811, 1957. doi: 10.1214/aoms/1177706678
- [5] Leo Breiman. A correction to "The individual ergodic theorem of information theory". Annals of Mathematical Statistics, 31(3):809--810, 1960. doi: 10.1214/aoms/1177705812
- [6] Alexandra Ionescu Tulcea. Contributions to information theory for abstract alphabets. Arkiv för Matematik, 4:235--247, 1961. doi: 10.1007/BF02592011
- [7] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2nd edition. John Wiley & Sons, 2006. doi: 10.1002/047174882X
- [8] Jacob Ziv and Abraham Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3):337--343, 1977. doi: 10.1109/TIT.1977.1055714
- [9] Jacob Ziv and Abraham Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, 24(5):530--536, 1978. doi: 10.1109/TIT.1978.1055934
- [10] Brockway McMillan. The basic theorems of information theory. Annals of Mathematical Statistics, 24(2):196--219, 1953. doi: 10.1214/aoms/1177729028
- [11] Jacob Ziv and Neri Merhav. A measure of relative entropy between individual sequences with application to universal classification. IEEE Transactions on Information Theory, 39(4):1270--1279, 1993. doi: 10.1109/18.243437
- [12] Emmanuel Rio. Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques & Applications, Vol. 31. Springer, 2000.
- [13] Walter Rudin. Real and Complex Analysis. 3rd edition. McGraw--Hill, New York, 1987.
- [14] Paul C. Shields. The Ergodic Theory of Discrete Sample Paths. Graduate Studies in Mathematics, Vol. 13. American Mathematical Society, 1996.