Entropy-Rate Selection for Partially Observed Processes
Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3
The pith
The entropy-rate maximizer among hidden stationary laws consistent with a visible law is the Markov extension of the visible block statistics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the finite-state finite-memory setting, retained visible constraints determine a feasible class of stationary (r+1)-block laws, and the entropy maximizer is defined as the entropy-rate maximizer on this class. Existence and uniqueness are proved, the latter under a fixed-context-marginal hypothesis or via a strict-concavity characterization by row proportionality. Two global regimes are central: a fixed one-point marginal yields the i.i.d. maximizer, and a fixed r-block law yields the (r-1)-step Markov extension. The gap functional equals a conditional mutual information and vanishes exactly at the maximizing completion.
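The fixed r-block regime can be illustrated with a classical chain of inequalities. This is a sketch under assumed notation, not the paper's own derivation: $X_1^n$ denotes the block $(X_1,\dots,X_n)$ of a stationary finite-alphabet process, $h(X)$ its entropy rate, and the paper's gap functional may be defined on a different object (for example on hidden completions).

```latex
\begin{align*}
h(X) &= \lim_{n\to\infty} H(X_n \mid X_1^{n-1})
      \;\le\; H(X_{r+1} \mid X_1^{r})
      \;\le\; H(X_{r+1} \mid X_2^{r})
      \;=\; H(X_r \mid X_1^{r-1}), \\
H(X_{r+1} \mid X_2^{r}) - H(X_{r+1} \mid X_1^{r})
     &= I(X_1 ; X_{r+1} \mid X_2^{r}) \;\ge\; 0 .
\end{align*}
```

The last term of the first line depends only on the r-block law, so it bounds the entropy rate of every stationary process sharing that law. Equality throughout holds exactly when the process is Markov of order r-1, which is where the conditional mutual information in the second line vanishes. For r = 1 the bound is $H(X_1)$, attained by the i.i.d. process.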
What carries the argument
The entropy-rate maximizer on the observational fiber of hidden stationary laws consistent with a given visible law.
If this is right
- When only the single-symbol marginal is fixed, the maximizer is the i.i.d. process with those marginals.
- When the entire visible r-block law is fixed, the maximizer is the (r-1)-step Markov process whose r-block statistics match the visible law.
- The entropy-rate difference between any other feasible hidden law and the maximizer equals the conditional mutual information between the current hidden state and the next visible symbol given the visible history.
- A latent random-mapping realization of the maximizer exists that leaves the visible process unchanged.
- The maximizer satisfies a local empirical consistency property with respect to visible samples.
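The first two bullets above can be checked numerically in the smallest case, r = 2 on a binary alphabet. The following is a minimal sketch, not the paper's construction: the 2-block law `p2` is a made-up example, the function names are hypothetical, and entropies are in nats.

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a probability vector, skipping zero entries."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def first_marginal(p2):
    """One-point marginal of a 2-block law p2[a][b] = P(X1=a, X2=b)."""
    return [sum(row) for row in p2]

def iid_rate(p2):
    """Entropy rate of the i.i.d. process with the one-point marginal of p2."""
    return entropy(first_marginal(p2))

def markov_extension_rate(p2):
    """Entropy rate of the 1-step Markov chain whose 2-block law is p2.

    Uses H(X2 | X1) = H(X1, X2) - H(X1).
    """
    joint = [p for row in p2 for p in row]
    return entropy(joint) - entropy(first_marginal(p2))

# Hypothetical stationary 2-block law (row sums equal column sums).
p2 = [[0.4, 0.1],
      [0.1, 0.4]]

h_iid = iid_rate(p2)                    # only the one-point marginal is fixed
h_markov = markov_extension_rate(p2)    # the whole 2-block law is fixed
print(f"i.i.d. rate: {h_iid:.6f}  Markov-extension rate: {h_markov:.6f}")
```

Fixing more of the visible law shrinks the feasible class, so the Markov-extension rate cannot exceed the i.i.d. rate computed from the same marginal.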
Where Pith is reading between the lines
- The selection rule offers a variational principle for constructing minimal hidden generators from coarse-grained time series without having to enumerate full transition matrices.
- It may extend to online settings where visible statistics are learned incrementally, yielding adaptive hidden models whose entropy rate tracks the empirical maximum.
- One could test the framework by taking a known hidden Markov chain, applying an aliasing observation map, recovering the maximizer from visible data alone, and checking whether its entropy rate matches the original chain.
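The test proposed in the last bullet can be sketched on a toy example. Everything here is an assumption made for illustration: the 3-state transition matrix `P`, the aliasing map `obs` (which merges hidden states 1 and 2), and the choice r = 2. The sketch uses the exact visible 2-block law rather than sampled data, and it takes the Markov extension of that law as a stand-in for the maximizer, which may differ from the paper's definition on the hidden fiber.

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a probability vector, skipping zero entries."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def hidden_rate(P, pi):
    """Entropy rate of a stationary Markov chain: sum_i pi_i * H(P[i])."""
    return sum(pi[i] * entropy(P[i]) for i in range(len(P)))

def visible_two_block(P, pi, obs, n_vis):
    """Exact 2-block law of the observed process Y_t = obs[X_t]."""
    p2 = [[0.0] * n_vis for _ in range(n_vis)]
    for i, row in enumerate(P):
        for j, pij in enumerate(row):
            p2[obs[i]][obs[j]] += pi[i] * pij
    return p2

def markov_extension_rate(p2):
    """H(Y2 | Y1) = H(Y1, Y2) - H(Y1) for the 2-block law p2."""
    joint = [p for row in p2 for p in row]
    first = [sum(row) for row in p2]
    return entropy(joint) - entropy(first)

# Hypothetical hidden chain and aliasing observation map.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
obs = [0, 1, 1]  # hidden states 1 and 2 share one visible symbol

pi = stationary(P)
p2 = visible_two_block(P, pi, obs, n_vis=2)
h_hidden = hidden_rate(P, pi)
h_visible = markov_extension_rate(p2)
print(f"hidden chain rate: {h_hidden:.6f}")
print(f"visible Markov-extension rate: {h_visible:.6f}")
```

The two rates need not coincide: the hidden chain's rate is at least the true rate of the visible process, and so is the Markov-extension rate, but neither bound orders the two against each other in general.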
Load-bearing premise
The visible stationary law is generated by at least one hidden stationary law belonging to the finite-state finite-memory class, so that the feasible set of (r+1)-block distributions is non-empty.
What would settle it
A concrete small-state example in which two distinct hidden stationary laws generate the same visible r-block law yet achieve the same entropy rate without being related by row proportionality.
Original abstract
I formulate an entropy-rate maximization problem at the observable level for stochastic processes observed through an information-reducing observation map. For a visible stationary law, the map determines an observational fiber of hidden stationary laws generating that law. In the finite-state finite-memory setting, retained visible constraints determine a feasible class of stationary $(r+1)$-block laws, and the entropy maximizer is defined as the entropy-rate maximizer on this class. The paper formulates entropy-rate maximization on feasible classes induced by partial observability and develops a structural theory for the resulting maximizer. I prove existence and uniqueness of the maximizer, with uniqueness under a fixed-context-marginal hypothesis and, more generally, via a strict-concavity characterization by row proportionality. Two global characterization regimes are central: a fixed one-point marginal yields the i.i.d. maximizer, and a fixed $r$-block law yields the $(r-1)$-step Markov extension. The gap functional equals a conditional mutual information and vanishes exactly at the maximizing completion. I also derive optimality conditions, local geometry of the maximizer, a latent random-mapping realization that leaves the visible law unchanged, and a local empirical consistency theorem, and illustrate the framework by an aliased hidden-state example.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates an entropy-rate maximization problem for stochastic processes observed through an information-reducing map. For a given visible stationary law, it defines an observational fiber of hidden stationary laws and a feasible class of stationary (r+1)-block laws induced by retained visible constraints and stationarity. The central results are existence and uniqueness of the entropy-rate maximizer on this class, with uniqueness obtained either under a fixed-context-marginal hypothesis or from a strict-concavity characterization by row proportionality. The paper also establishes two global characterization regimes: a fixed one-point marginal yields the i.i.d. maximizer, and a fixed r-block law yields the (r-1)-step Markov extension. It identifies the gap functional as a conditional mutual information that vanishes at the maximizer. Further results include optimality conditions, local geometry, a latent random-mapping realization preserving the visible law, a local empirical consistency theorem, and an aliased hidden-state illustration.
Significance. If the derivations hold, the work supplies a structural theory for entropy-rate selection under partial observability that extends classical maximum-entropy results to hidden processes. The explicit characterizations, the conditional-mutual-information gap identity, the latent random-mapping construction, and the empirical consistency theorem are concrete strengths that could inform both theoretical analysis and algorithmic design in information theory and hidden Markov modeling.
minor comments (2)
- The abstract is information-dense; consider breaking the list of contributions into shorter sentences or a bulleted summary for improved readability.
- Notation for the feasible class of (r+1)-block laws and the observational fiber should be introduced with a dedicated preliminary section or table to aid readers unfamiliar with the partial-observability setup.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work on entropy-rate maximization for partially observed processes, as well as the recommendation for minor revision. We are pleased that the structural contributions (existence and uniqueness results, global characterizations, the conditional mutual information gap identity, the latent random-mapping construction, and the local empirical consistency theorem) are viewed as concrete strengths.
Circularity Check
No significant circularity detected
Full rationale
The paper defines a feasible set of stationary (r+1)-block laws from visible constraints and stationarity consistency, then applies standard compactness/continuity/concavity arguments to establish existence of the entropy-rate maximizer. Uniqueness follows from strict-concavity characterizations (row proportionality or fixed-context-marginal hypothesis) that are derived directly from the geometry of the entropy functional on the probability simplex; these are independent mathematical facts, not reductions to fitted inputs. The identity that the gap functional equals conditional mutual information is obtained by algebraic expansion of definitions and vanishes at the maximizer by the definition of the optimization problem itself. No load-bearing self-citations, ansatzes smuggled via prior work, or renamings of known results appear in the derivation chain. The results are self-contained against external benchmarks such as classical entropy maximization for finite-alphabet processes.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Visible and hidden processes are stationary.
- domain assumption: Finite-state finite-memory setting.