Entropy-Rate Selection for Partially Observed Processes

Oleg Kiriukhin

arXiv: 2604.10752 · v1 · submitted 2026-04-12 · 💻 cs.IT · econ.EM · math.IT · math.PR · math.ST · stat.TH


Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3

classification 💻 cs.IT · econ.EM · math.IT · math.PR · math.ST · stat.TH

keywords entropy rate maximization · partially observed processes · stationary hidden laws · observational fiber · Markov extension · conditional mutual information · finite memory processes · hidden Markov models


The pith

The entropy-rate maximizer among hidden stationary laws consistent with a visible law is the Markov extension of the visible block statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that, given only the statistics visible through a partial observation map, there exists a unique hidden stationary law that maximizes the entropy rate while reproducing exactly those visible statistics. This maximizer is obtained by completing the visible constraints to a full (r+1)-block probability distribution of highest entropy rate. A reader cares because the construction supplies a canonical, least-committal choice of hidden dynamics whenever only partial observations are available, directly controlling the remaining uncertainty for prediction or coding. Two explicit cases are settled: a fixed one-point marginal selects the i.i.d. process, while a fixed r-block law selects the (r-1)-step Markov process whose blocks match the visible law. The entropy-rate gap between any other feasible hidden law and this maximizer equals a conditional mutual information that vanishes precisely at the optimum.
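A minimal Python sketch of the two regimes in the fully observed case, assuming block laws are stored as dicts from tuples to probabilities (our representation, not the paper's). It builds the (r-1)-step Markov completion P(x_0..x_r) = p_r(x_0..x_{r-1}) p_r(x_1..x_r) / p_{r-1}(x_1..x_{r-1}) of a stationary r-block law, checks that both r-block marginals of the completion reproduce the input, and reads off its entropy rate. The helpers `markov_extension`, `marginal`, `entropy`, and `entropy_rate` are hypothetical; with r = 1 the context is empty and the same formula returns the i.i.d. product law.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy in bits of a law given as a dict: outcome -> probability."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def marginal(p, keep):
    """Marginalise a block law (dict: tuple -> prob) onto the positions selected by a slice."""
    out = defaultdict(float)
    for block, prob in p.items():
        out[block[keep]] += prob
    return dict(out)


def markov_extension(p_r):
    """(r-1)-step Markov completion of a stationary r-block law p_r, for r >= 1.

    P(x_0..x_r) = p_r(x_0..x_{r-1}) * p_r(x_1..x_r) / p_{r-1}(x_1..x_{r-1}).
    For r = 1 the context is the empty tuple and this is the i.i.d. product law.
    """
    # By stationarity the leading and trailing (r-1)-block marginals agree.
    p_ctx = marginal(p_r, slice(None, -1))
    ext = {}
    for left, pl in p_r.items():          # left  = (x_0, ..., x_{r-1})
        for right, pr in p_r.items():     # right = (x_1, ..., x_r)
            if left[1:] == right[:-1] and pl > 0 and pr > 0:
                ext[left + right[-1:]] = pl * pr / p_ctx[right[:-1]]
    return ext


def entropy_rate(p_block):
    """Entropy rate of a stationary (r+1)-block law read as an order-r chain:
    H(X_r | X_0..X_{r-1}) = H(P_{r+1}) - H(P_r)."""
    return entropy(p_block) - entropy(marginal(p_block, slice(None, -1)))


# Hypothetical fixed 2-block law (symmetric, hence stationary).
p2 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p3 = markov_extension(p2)                                        # 1-step Markov completion
print(marginal(p3, slice(None, -1)), marginal(p3, slice(1, None)))  # both reproduce p2
print(entropy_rate(p3))                                          # equals H(X_2 | X_1) under p2
```

Passing a one-point marginal such as `{(0,): 0.7, (1,): 0.3}` exercises the i.i.d. regime: the completion is the product law and its entropy rate is the entropy of the marginal.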

Core claim

In the finite-state finite-memory setting, retained visible constraints determine a feasible class of stationary (r+1)-block laws, and the entropy maximizer is defined as the entropy-rate maximizer on this class. Existence and uniqueness are proved, the latter under a fixed-context-marginal hypothesis or via a strict-concavity characterization by row proportionality. Two global regimes are central: a fixed one-point marginal yields the i.i.d. maximizer, and a fixed r-block law yields the (r-1)-step Markov extension. The gap functional equals a conditional mutual information and vanishes exactly at the maximizing completion.
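The gap identity can be checked numerically in the simplest fully observed instance: a fixed stationary 2-block law whose feasible completions are stationary 3-block laws. Assuming the entropy rate of such a law Q is read as H(X_2 | X_0, X_1), the gap to the Markov completion P* is I_Q(X_0; X_2 | X_1). This sketch covers that special case only; the paper's gap in the partially observed setting may condition on different variables. The perturbation `perturb` is a hypothetical construction that keeps both 2-block marginals fixed.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy in bits of a law given as a dict: outcome -> probability."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def marginal(p, keep):
    """Marginalise a block law onto the positions selected by the slice `keep`."""
    out = defaultdict(float)
    for block, prob in p.items():
        out[block[keep]] += prob
    return dict(out)


def entropy_rate(p3):
    """H(X_2 | X_0, X_1) = H(X_0, X_1, X_2) - H(X_0, X_1) for a stationary 3-block law."""
    return entropy(p3) - entropy(marginal(p3, slice(0, 2)))


def cond_mutual_info(p3):
    """I(X_0; X_2 | X_1) = H(X_0,X_1) + H(X_1,X_2) - H(X_1) - H(X_0,X_1,X_2), in bits."""
    return (entropy(marginal(p3, slice(0, 2))) + entropy(marginal(p3, slice(1, 3)))
            - entropy(marginal(p3, slice(1, 2))) - entropy(p3))


def perturb(p3, eps):
    """Add eps * (-1)**(x0 + x2) to every entry (binary alphabet).

    Summing over x0 or over x2 cancels the perturbation, so both 2-block marginals
    are unchanged and the result is another stationary completion of the same 2-block law."""
    return {(a, b, c): q + eps * (-1) ** (a + c) for (a, b, c), q in p3.items()}


# Markov completion of p2 = {00: .4, 01: .1, 10: .1, 11: .4}, worked out by hand as
# P*(a, b, c) = p2(a, b) * p2(b, c) / p1(b) with p1 = (0.5, 0.5).
p_star = {(0, 0, 0): .32, (0, 0, 1): .08, (0, 1, 0): .02, (0, 1, 1): .08,
          (1, 0, 0): .08, (1, 0, 1): .02, (1, 1, 0): .08, (1, 1, 1): .32}

q = perturb(p_star, 0.01)          # another feasible completion; all entries stay positive
gap = entropy_rate(p_star) - entropy_rate(q)
print(gap, cond_mutual_info(q))    # the two numbers agree up to rounding; the gap is positive
```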

What carries the argument

the entropy-rate maximizer on the observational fiber of hidden stationary laws consistent with a given visible law

If this is right

When only the single-symbol marginal is fixed, the maximizer is the i.i.d. process with those marginals.
When the entire visible r-block law is fixed, the maximizer is the (r-1)-step Markov process whose r-block statistics match the visible law.
The entropy-rate difference between any other feasible hidden law and the maximizer equals the conditional mutual information between the current hidden state and the next visible symbol given the visible history.
A latent random-mapping realization of the maximizer exists that leaves the visible process unchanged.
The maximizer satisfies a local empirical consistency property with respect to visible samples.
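For the empirical side, a plug-in sketch: estimate the visible r-block law from a single sample path and take the entropy rate of its Markov completion, H(p̂_r) − H(p̂_{r−1}). Counting blocks cyclically (wrapping around the end of the sample) is our own choice; it makes the estimated block law exactly stationary. The helpers are hypothetical, and this does not reproduce the paper's local empirical consistency theorem, only the kind of estimator such a result concerns.

```python
import math
from collections import Counter


def entropy(p):
    """Shannon entropy in bits of a law given as a dict: outcome -> probability."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def empirical_block_law(seq, r):
    """Cyclic empirical r-block law of a finite sample path.

    Each of the len(seq) starting positions contributes one block, wrapping past
    the end, so the leading and trailing (r-1)-block marginals coincide exactly."""
    n = len(seq)
    counts = Counter(tuple(seq[(i + j) % n] for j in range(r)) for i in range(n))
    return {block: c / n for block, c in counts.items()}


def plugin_markov_rate(seq, r):
    """Entropy rate (bits/symbol) of the (r-1)-step Markov completion of the
    empirical r-block law: H(p_r) - H(p_{r-1}). Requires r >= 1."""
    return entropy(empirical_block_law(seq, r)) - entropy(empirical_block_law(seq, r - 1))


sample = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]   # hypothetical visible sample
for r in (1, 2, 3):
    print(r, plugin_markov_rate(sample, r))
```

Because the cyclic block laws are the block laws of one stationary process (the sample repeated with a uniformly random phase), retaining a longer block can only lower the plug-in rate: it is non-increasing in r for a fixed sample.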

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The selection rule offers a variational principle for constructing minimal hidden generators from coarse-grained time series without having to enumerate full transition matrices.
It may extend to online settings where visible statistics are learned incrementally, yielding adaptive hidden models whose entropy rate tracks the empirical maximum.
One could test the framework by taking a known hidden Markov chain, applying an aliasing observation map, recovering the maximizer from visible data alone, and checking whether its entropy rate matches the original chain.
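The first half of that test can be sketched under simplifying assumptions of our own: the hidden process is a stationary first-order Markov chain (pi, T), the aliasing map `phi` merges hidden states, and the retained visible constraint is the visible 2-block law. The code computes that visible law, the hidden chain's entropy rate, and, as a reference point only, the entropy rate of the 1-step Markov extension at the visible level. It does not compute the paper's hidden-level maximizer; all function names and the example chain are hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy in bits of a law given as a dict: outcome -> probability."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def visible_two_block_law(pi, T, phi):
    """Push a stationary hidden Markov chain (pi, T) through the aliasing map phi."""
    n = len(pi)
    # pi must be stationary for T; otherwise the pair law below is not stationary.
    for j in range(n):
        assert abs(sum(pi[i] * T[i][j] for i in range(n)) - pi[j]) < 1e-12
    p2 = defaultdict(float)
    for i in range(n):
        for j in range(n):
            p2[(phi[i], phi[j])] += pi[i] * T[i][j]
    return dict(p2)


def hidden_rate(pi, T):
    """Entropy rate H(X_1 | X_0) of the hidden chain, in bits."""
    return sum(pi[i] * entropy(dict(enumerate(T[i]))) for i in range(len(pi)))


def visible_markov_rate(p2):
    """H(Y_1 | Y_0) of the 1-step Markov extension of the visible 2-block law."""
    p1 = defaultdict(float)
    for (u, _), q in p2.items():
        p1[u] += q
    return entropy(p2) - entropy(p1)


# Hypothetical example: a deterministic 3-cycle 0 -> 1 -> 2 -> 0, states 0 and 1 aliased to 'a'.
T = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
pi = [1 / 3, 1 / 3, 1 / 3]
phi = ['a', 'a', 'b']
p2 = visible_two_block_law(pi, T, phi)
print(p2)
print(hidden_rate(pi, T), visible_markov_rate(p2))
```

In this example the hidden rate is 0 (the cycle is deterministic) while the visible-level Markov rate is 2/3 bit per symbol, so any comparison of entropy rates has to say which level, hidden or visible, is being compared.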

Load-bearing premise

The visible stationary law is generated by at least one hidden stationary law belonging to the finite-state finite-memory class, so that the feasible set of (r+1)-block distributions is non-empty.

What would settle it

A concrete small-state example in which two distinct hidden stationary laws generate the same visible r-block law and both attain the maximal entropy rate without being related by row proportionality.

Original abstract

I formulate an entropy-rate maximization problem at the observable level for stochastic processes observed through an information-reducing observation map. For a visible stationary law, the map determines an observational fiber of hidden stationary laws generating that law. In the finite-state finite-memory setting, retained visible constraints determine a feasible class of stationary $(r+1)$-block laws, and the entropy maximizer is defined as the entropy-rate maximizer on this class. The paper formulates entropy-rate maximization on feasible classes induced by partial observability and develops a structural theory for the resulting maximizer. I prove existence and uniqueness of the maximizer, with uniqueness under a fixed-context-marginal hypothesis and, more generally, via a strict-concavity characterization by row proportionality. Two global characterization regimes are central: a fixed one-point marginal yields the i.i.d. maximizer, and a fixed $r$-block law yields the $(r-1)$-step Markov extension. The gap functional equals a conditional mutual information and vanishes exactly at the maximizing completion. I also derive optimality conditions, local geometry of the maximizer, a latent random-mapping realization that leaves the visible law unchanged, and a local empirical consistency theorem, and illustrate the framework by an aliased hidden-state example.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames entropy-rate maximization over hidden completions consistent with a visible law, with clean global characterizations that look standard but are useful.


The main thing to know is that this work moves entropy-rate selection to the observable level for processes seen through an information-reducing map. It defines the problem over the fiber of hidden stationary laws that match a given visible law, then works in the finite-state finite-memory case where visible constraints fix a compact class of (r+1)-block laws. Existence of a maximizer follows from the usual compactness and continuity arguments. Uniqueness holds under a fixed-context-marginal condition or, more generally, when the maximizer is the unique row-proportional matrix in the class.
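The text above does not spell out "row proportionality". One reading consistent with a strict-concavity argument (our assumption, not a quotation from the paper): the entropy rate H(X_r | X_0..X_{r-1}) is concave in the (r+1)-block law and is affine along a segment exactly when the two endpoint laws have proportional rows, where rows are indexed by the context x_0..x_{r-1} and columns by x_r. Assuming the feasible class is convex, as it is when the constraints are linear in the block law, two distinct maximizers would therefore have to be row-proportional, and a fixed context marginal then forces them to coincide. The checker `rows_proportional` and the example laws are hypothetical.

```python
from itertools import combinations


def rows_proportional(P, Q, tol=1e-12):
    """Return True if P and Q (dicts: (r+1)-tuple -> prob) have proportional rows.

    Rows are indexed by the context x_0..x_{r-1} (all but the last symbol) and
    columns by the last symbol x_r, so proportional rows mean the two laws share
    the same next-symbol conditionals wherever both contexts have positive mass."""
    contexts = {b[:-1] for b in P} | {b[:-1] for b in Q}
    symbols = sorted({b[-1] for b in P} | {b[-1] for b in Q})
    for ctx in contexts:
        p_row = [P.get(ctx + (s,), 0.0) for s in symbols]
        q_row = [Q.get(ctx + (s,), 0.0) for s in symbols]
        # Two vectors are proportional iff every 2x2 cross product vanishes.
        for i, j in combinations(range(len(symbols)), 2):
            if abs(p_row[i] * q_row[j] - p_row[j] * q_row[i]) > tol:
                return False
    return True


# Hypothetical 3-block law: the Markov completion of p2 = {00: .4, 01: .1, 10: .1, 11: .4}.
P = {(0, 0, 0): .32, (0, 0, 1): .08, (0, 1, 0): .02, (0, 1, 1): .08,
     (1, 0, 0): .08, (1, 0, 1): .02, (1, 1, 0): .08, (1, 1, 1): .32}

# Same next-symbol conditionals, but context (0, 0) gets twice the weight before renormalising.
w = {b: q * (2.0 if b[:-1] == (0, 0) else 1.0) for b, q in P.items()}
total = sum(w.values())
Q = {b: q / total for b, q in w.items()}

# Changes the conditionals inside each context.
R = {(a, b, c): q + 0.01 * (-1) ** (a + c) for (a, b, c), q in P.items()}

print(rows_proportional(P, Q), rows_proportional(P, R))
```

Here Q is row-proportional to P yet differs from it through the context weights alone (it need not be feasible); that is exactly the kind of competitor a fixed-context-marginal hypothesis excludes.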

Referee Report

0 major / 2 minor

Summary. The paper formulates an entropy-rate maximization problem for stochastic processes observed through an information-reducing map. For a given visible stationary law, it defines an observational fiber of hidden stationary laws and a feasible class of stationary (r+1)-block laws induced by retained visible constraints and stationarity. The central results are proofs of existence and uniqueness of the entropy-rate maximizer on this class (uniqueness via a fixed-context-marginal hypothesis or a strict-concavity characterization by row proportionality); two global characterization regimes (a fixed one-point marginal yields the i.i.d. maximizer; a fixed r-block law yields the (r-1)-step Markov extension); and the identification of the gap functional as a conditional mutual information that vanishes at the maximizer. The paper also gives optimality conditions, the local geometry of the maximizer, a latent random-mapping realization preserving the visible law, a local empirical consistency theorem, and an aliased hidden-state illustration.

Significance. If the derivations hold, the work supplies a structural theory for entropy-rate selection under partial observability that extends classical maximum-entropy results to hidden processes. The explicit characterizations, the conditional-mutual-information gap identity, the latent random-mapping construction, and the empirical consistency theorem are concrete strengths that could inform both theoretical analysis and algorithmic design in information theory and hidden Markov modeling.

minor comments (2)

The abstract is information-dense; consider breaking the list of contributions into shorter sentences or a bulleted summary for improved readability.
Notation for the feasible class of (r+1)-block laws and the observational fiber should be introduced with a dedicated preliminary section or table to aid readers unfamiliar with the partial-observability setup.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work on entropy-rate maximization for partially observed processes, as well as the recommendation for minor revision. We are pleased that the structural contributions—existence and uniqueness results, global characterizations, the conditional mutual information gap identity, the latent random-mapping construction, and the local empirical consistency theorem—are viewed as concrete strengths.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a feasible set of stationary (r+1)-block laws from visible constraints and stationarity consistency, then applies standard compactness/continuity/concavity arguments to establish existence of the entropy-rate maximizer. Uniqueness follows from strict-concavity characterizations (row proportionality or fixed-context-marginal hypothesis) that are derived directly from the geometry of the entropy functional on the probability simplex; these are independent mathematical facts, not reductions to fitted inputs. The identity that the gap functional equals conditional mutual information is obtained by algebraic expansion of definitions and vanishes at the maximizer by the definition of the optimization problem itself. No load-bearing self-citations, ansatzes smuggled via prior work, or renamings of known results appear in the derivation chain. The derivation is self-contained and can be checked against external benchmarks such as classical entropy maximization for finite-alphabet processes.
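In the fully observed fixed r-block regime, the algebraic expansion behind the gap identity fits in a few lines. This is a standard calculation under two assumptions: Q is a stationary (r+1)-block law whose two r-block marginals both equal p_r, and its entropy rate is read as H_Q(X_r | X_0^{r-1}). The paper's general gap functional for hidden laws is not reproduced here.

```latex
\begin{aligned}
h(Q) &= H_Q(X_r \mid X_0^{r-1}) = H(Q) - H(p_r), \\
P^\star(x_0^r) &= \frac{p_r(x_0^{r-1})\, p_r(x_1^r)}{p_{r-1}(x_1^{r-1})},
  \qquad h(P^\star) = H(X_r \mid X_1^{r-1}) = H(p_r) - H(p_{r-1}), \\
h(P^\star) - h(Q) &= H_Q(X_r \mid X_1^{r-1}) - H_Q(X_r \mid X_0^{r-1})
  = I_Q(X_0 ; X_r \mid X_1^{r-1}) \;\ge\; 0 .
\end{aligned}
```

The middle step uses that Q and P* share the r-block law on positions 1..r, so H_Q(X_r | X_1^{r-1}) = H(p_r) − H(p_{r-1}). Equality holds exactly when X_0 and X_r are conditionally independent given X_1^{r-1} under Q, which together with the fixed r-block marginals forces Q = P*.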

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard domain assumptions of stationarity and finite memory; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: visible and hidden processes are stationary.
Invoked throughout the formulation of observational fibers and feasible classes of block laws.
domain assumption: finite-state finite-memory setting.
Used to define the feasible class of (r+1)-block laws and the entropy-rate maximizer.

pith-pipeline@v0.9.0 · 5520 in / 1380 out tokens · 29197 ms · 2026-05-10T15:28:17.811390+00:00


Reference graph

Works this paper leans on

5 extracted references

[1] D. Blackwell, Comparison of Experiments, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 93–102.

[2] J. C. C. McKinsey, A. W. Marshall, and M. K. Gardner, A simple proof of Blackwell’s ‘Comparison of Experiments’ theorem, Journal of Economic Theory 27 (1982), 439–443.

[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed., John Wiley & Sons, 2006.

[4] A. W. van der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.

[5] P. Billingsley, Probability and Measure, 3rd ed., John Wiley & Sons, 1995.
