The Geometry of Statistical Data and Information: A Large Deviation Perspective

Hong Qian; Viswa Virinchi Muppirala

arXiv: 2501.01556 · v3 · submitted 2025-01-02 · cs.IT · math.IT

Pith reviewed 2026-05-23 06:20 UTC · model grok-4.3

classification: cs.IT · math.IT

keywords: information geometry, large deviation theory, empirical means, information projection, Kolmogorov probability, i.i.d. assumption, Markovian assumption

The pith

The information projection from divergence minimization coincides with the projection in Kolmogorov probability theory under both i.i.d. and Markov assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the manifold of empirical mean values carries a geometry whose shape is set by the probability measure through large-deviation entropy functions. These functions are used to describe the space of data rather than the space of distributions. The central result is that the information projection obtained by minimizing divergence matches the information projection defined in Kolmogorov probability theory. The match is proved for sequences that are independent and identically distributed and for sequences that obey a Markov chain.

Core claim

The information projection defined in information geometry as divergence minimization coincides with the information projection in Kolmogorov's probability theory under both i.i.d. and Markovian assumptions.
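
A minimal numerical sketch of this claim in the i.i.d. case, under stated assumptions. It reads the "Kolmogorov" side as the conditional expectation of the empirical frequencies given the empirical mean, and the "information geometry" side as the minimizer of S(ν|p) under the same mean constraint. The alphabet (0, 1, 2), the distribution p, the target mean 1.2, and the function names are all illustrative choices, not taken from the paper.

```python
import math

def tilt(p, x, lam):
    """Tilted law nu_i proportional to p_i * exp(lam * x_i)."""
    w = [pi * math.exp(lam * xi) for pi, xi in zip(p, x)]
    z = sum(w)
    return [wi / z for wi in w]

def mean(nu, x):
    return sum(ni * xi for ni, xi in zip(nu, x))

def i_projection(p, x, target, lo=-50.0, hi=50.0, iters=200):
    """Minimizer of S(nu|p) subject to sum_i nu_i x_i = target.

    The minimizer is an exponential tilt of p; its mean increases with lam,
    so bisection on lam finds it.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(p, x, mid), x) < target:
            lo = mid
        else:
            hi = mid
    return tilt(p, x, 0.5 * (lo + hi))

def conditional_frequencies(p, n_draws, total):
    """E[empirical frequencies | sum of draws = total] for i.i.d. draws
    from p on the values (0, 1, 2), by exact multinomial enumeration."""
    log_w, counts = [], []
    for n2 in range(n_draws + 1):          # n2 = number of draws equal to 2
        n1 = total - 2 * n2                # number of draws equal to 1
        n0 = n_draws - n1 - n2             # number of draws equal to 0
        if n0 < 0 or n1 < 0:
            continue
        c = (n0, n1, n2)
        lw = math.lgamma(n_draws + 1) + sum(
            ck * math.log(pk) - math.lgamma(ck + 1) for ck, pk in zip(c, p))
        log_w.append(lw)
        counts.append(c)
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    z = sum(w)
    return [sum(wi * c[k] for wi, c in zip(w, counts)) / (z * n_draws)
            for k in range(3)]

# Illustrative values (assumptions, not from the paper).
p = [0.5, 0.3, 0.2]
x = [0.0, 1.0, 2.0]
nu_star = i_projection(p, x, target=1.2)
nu_cond = conditional_frequencies(p, n_draws=600, total=720)  # 720 / 600 = 1.2
print("I-projection:         ", [round(v, 4) for v in nu_star])
print("conditional (N = 600):", [round(v, 4) for v in nu_cond])
```

The two vectors should agree up to a finite-sample correction that shrinks as the number of draws grows; the enumeration is exact only because the alphabet has three points, so the constraint leaves a single free count.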

What carries the argument

Large-deviation rate functions (entropy functions) that equip the manifold of empirical means with a Riemannian geometry.

If this is right

Fisher-Rao spherical geometry appears only for singleton frequencies in the i.i.d. case and fails for pairwise statistics.
The governing probability measure itself curves the space of empirical data.
The identification places information geometry inside the measure-theoretic foundations of probability.
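
The spherical picture in the first consequence can be checked numerically. The sketch below uses the standard square-root embedding, which sends a probability vector p to the point 2√p on a sphere of radius 2, and compares the resulting great-circle distance with the Fisher quadratic form ∑ dp_i²/p_i for a small displacement. The function names and the sample vectors are illustrative assumptions.

```python
import math

def sphere_embed(p):
    """Map a probability vector to the point 2*sqrt(p) on the sphere of radius 2."""
    return [2.0 * math.sqrt(pi) for pi in p]

def fisher_rao_distance(p, q):
    """Great-circle distance on the radius-2 sphere: 2 * arccos(sum_i sqrt(p_i q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))

def fisher_quadratic_form(p, dp):
    """Fisher information quadratic form sum_i dp_i^2 / p_i for a tangent vector dp."""
    return sum(d * d / pi for d, pi in zip(dp, p))

# Illustrative values (assumptions, not from the paper).
p = [0.5, 0.3, 0.2]
dp = [1.0, -1.0, 0.0]          # tangent direction, components sum to zero
eps = 1e-3
q = [pi + eps * d for pi, d in zip(p, dp)]

radius_sq = sum(c * c for c in sphere_embed(p))
local_ratio = fisher_rao_distance(p, q) ** 2 / eps ** 2
print("squared radius of embedded point:", radius_sq)
print("distance^2 / eps^2:", local_ratio)
print("Fisher quadratic form:", fisher_quadratic_form(p, dp))
```

For small eps the squared distance divided by eps² approaches the Fisher quadratic form, which is the sense in which the simplex with this metric is a piece of a sphere.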

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same entropy-based geometry could be examined for stationary processes that are neither i.i.d. nor Markov.
If the projections remain identical, the construction supplies a concrete Riemannian metric on empirical-mean manifolds for any ergodic source.

Load-bearing premise

Large-deviation rate functions can be interpreted directly as defining a Riemannian geometry on the manifold of empirical means without extra regularity conditions on the probability measure.

What would settle it

An explicit pair of distributions and a Markov chain for which the divergence-minimizing projection differs from the Kolmogorov information projection.

Original abstract

The manifold of empirical mean values of statistical data ad infinitum has a geometric shape that depends on the probability measure that governs the generating model. Large deviation theory produces entropy functions that depend on both the probability measure and the statistical data; we use entropy to study the geometry of the data space rather than that of the space of probability distributions. It is well known, since Rao's work, that the Fisher-Rao metric makes the probability simplex into a sphere. From our perspective, that result translates to the space of empirical singleton counting frequencies under an i.i.d. assumption. Following our ideas and going beyond i.i.d., the choice of measure curves the space. When we study the pairwise statistics, the spherical geometry breaks down entirely. We show that the information projection, defined in information geometry as divergence minimization, coincides with the information projection in Kolmogorov's probability theory. This identification holds under both i.i.d. and Markovian assumptions and connects information geometry to the foundations of probability theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shifts focus to geometry on empirical data manifolds via large deviations and claims the I-projection matches Kolmogorov's version under i.i.d. and Markov assumptions, with the pairwise breakdown as the clearest addition.

The main thing here is the reframing: instead of the usual probability simplex, the authors use large deviation rate functions to put geometry on the manifold of empirical means, and they say this makes the information projection from divergence minimization line up with the one from classical probability theory. That identification is stated for both i.i.d. and Markov cases. The spherical geometry from Rao's Fisher-Rao metric only holds for singleton frequencies under independence and breaks for pairwise statistics, which is a direct observation from the setup. The Markov extension applies the same minimization to the empirical pair measure once the large deviation principle is in place.

This is internally consistent and avoids circularity by resting on external definitions rather than fitted parameters. The paper does a clean job of connecting the frameworks without extra machinery. The i.i.d. part is basically the standard Sanov identification, so that part is not new. The Markov claim follows from known LDP results for pairs, which makes the novelty more about the perspective than a fresh derivation. Without explicit steps or a worked example in the visible material, it is hard to judge how much the geometry change actually requires new regularity conditions.

This is for readers who already work at the overlap of large deviations, information geometry, and probability foundations. Someone looking for conceptual links or a different angle on data manifolds would get value; a reader wanting sharp new theorems or numerical checks would find it thin. It deserves peer review because the central identification is coherent on its own terms and the Markov angle adds a modest but clear point worth checking in detail.

Referee Report

0 major / 3 minor

Summary. The paper claims that the manifold of empirical mean values of statistical data has a geometric shape determined by the governing probability measure. Large deviation theory yields entropy functions depending on both the measure and data, which are used to study the geometry of the data space. It asserts that the information projection defined via divergence minimization in information geometry coincides with the information projection in Kolmogorov's probability theory, holding under both i.i.d. and Markovian assumptions. This connects information geometry to probability foundations, with the Fisher-Rao metric yielding spherical geometry for i.i.d. singleton frequencies but breaking down for pairwise statistics under Markov assumptions.

Significance. If the identification holds, the manuscript bridges information geometry and the foundations of probability by reinterpreting large-deviation rate functions as defining geometry on empirical data manifolds rather than probability distributions. The Markovian extension, showing breakdown of spherical geometry, is a substantive step beyond standard i.i.d. results such as Sanov's theorem. The work explicitly builds on Rao's Fisher-Rao metric without introducing free parameters, ad-hoc axioms, or invented entities. The potential concern about regularity conditions on the probability measure for a Riemannian interpretation does not appear to undermine the central claim, as the large-deviation principle is taken as granted and the identification remains internally consistent.

minor comments (3)

[Abstract] The phrase 'ad infinitum' is imprecise; replace with a clearer description of the limiting regime for empirical means.
[Abstract] The claim that spherical geometry 'breaks down entirely' for pairwise statistics would be strengthened by a brief concrete indication of the deviation (e.g., a specific rate-function property) in the Markov case.
The connection to Kolmogorov's probability theory would benefit from citing one or two specific foundational results or theorems rather than a general reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of the manuscript. We are pleased that the significance of connecting large-deviation rate functions to the geometry of empirical data manifolds, and the identification of information projections under both i.i.d. and Markov assumptions, has been recognized. We will incorporate any minor revisions as appropriate.

Circularity Check

0 steps flagged

No significant circularity

The paper's core claim equates the I-projection from information geometry (KL minimization) with a Kolmogorov-style projection via large-deviation rate functions. This identification for the i.i.d. case follows directly from the standard Sanov theorem, an external result. The Markovian extension applies the same rate-function minimization to pairwise empirical measures once the LDP for the pair measure is granted. No quoted equations reduce a prediction to a fitted parameter, no self-citation chain bears the central load, and no ansatz is smuggled in. The derivation is self-contained against external benchmarks.
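
To make the "pairwise empirical measure" step concrete, the sketch below evaluates the textbook rate function for the pair empirical measure of a finite Markov chain, I(ν) = ∑_ij ν_ij log(ν_ij / (ν̄_i P_ij)), where ν̄_i is the row marginal of ν. Whether the paper uses exactly this form is an assumption here; the transition matrix, the test measures, and the function names are illustrative. The last lines check that the expression reduces to the Sanov rate S(a|p) when the chain is i.i.d. and ν is a product measure.

```python
import math

def stationary(P, iters=2000):
    """Stationary distribution of an ergodic transition matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def relative_entropy(nu, p):
    """S(nu|p) = sum_i nu_i log(nu_i / p_i)."""
    return sum(v * math.log(v / q) for v, q in zip(nu, p) if v > 0)

def pair_rate(nu, P):
    """sum_ij nu_ij log(nu_ij / (row_marginal_i * P_ij)) for a pair measure nu."""
    total = 0.0
    for i, row in enumerate(nu):
        marginal = sum(row)
        for j, v in enumerate(row):
            if v > 0:
                total += v * math.log(v / (marginal * P[i][j]))
    return total

# Illustrative two-state chain (an assumption, not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]
pi = stationary(P)
nu_typical = [[pi[i] * P[i][j] for j in range(2)] for i in range(2)]
nu_atypical = [[0.5, 0.1], [0.1, 0.3]]   # row sums equal column sums

print("rate at stationary pair measure:", pair_rate(nu_typical, P))
print("rate at an atypical pair measure:", pair_rate(nu_atypical, P))

# i.i.d. reduction: every row of P equals p, and nu is the product a x a.
p = [0.5, 0.3, 0.2]
a = [0.2, 0.3, 0.5]
P_iid = [p[:] for _ in p]
nu_product = [[ai * aj for aj in a] for ai in a]
print("pair rate (i.i.d.):", pair_rate(nu_product, P_iid))
print("Sanov rate S(a|p): ", relative_entropy(a, p))
```

The rate vanishes at the stationary pair measure π_i P_ij and is positive elsewhere, which is what makes it usable as a divergence to minimize under constraints.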

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from large deviation theory and information geometry; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (2)

domain assumption: Large deviation principle holds for the empirical measures under the stated i.i.d. and Markov assumptions
Invoked to produce the entropy functions that define the geometry
standard math: The Fisher-Rao metric on the probability simplex is known to be spherical (Rao)
Used as the baseline that translates to singleton i.i.d. data

pith-pipeline@v0.9.0 · 5699 in / 1390 out tokens · 24068 ms · 2026-05-23T06:20:23.012996+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Theorem 7 (Information Projection). ... E[ν | F_x] = arg inf_{ν ∈ ri(Δ_n)} { S(ν|p) | ∑_i ν_i x_i = x }

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · echoes
S(ν|p) = ∑_i ν_i log(ν_i/p_i) ... rate function of Sanov's theorem

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_high_calibrated_iff · refines
refines: Relation between the paper passage and the cited Recognition theorem.
Legendre-Fenchel transform ... free energy F(μ|p) = log ∑_i p_i e^{μ_i}
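
The Legendre-Fenchel pairing quoted in the last entry can be checked numerically. The sketch below assumes the transform is taken over all real vectors μ, so that S(ν|p) = sup_μ { ∑_i μ_i ν_i − F(μ|p) } with the supremum attained at μ_i = log(ν_i/p_i). The vectors p and ν, the sampling range, and the function names are illustrative assumptions.

```python
import math
import random

def free_energy(mu, p):
    """F(mu|p) = log sum_i p_i exp(mu_i)."""
    return math.log(sum(pi * math.exp(mi) for pi, mi in zip(p, mu)))

def relative_entropy(nu, p):
    """S(nu|p) = sum_i nu_i log(nu_i / p_i)."""
    return sum(ni * math.log(ni / pi) for ni, pi in zip(nu, p) if ni > 0)

def dual_objective(mu, nu, p):
    """<mu, nu> - F(mu|p), the quantity maximized in the Legendre-Fenchel transform."""
    return sum(mi * ni for mi, ni in zip(mu, nu)) - free_energy(mu, p)

# Illustrative values (assumptions, not from the paper).
p = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]

mu_star = [math.log(ni / pi) for ni, pi in zip(nu, p)]   # candidate maximizer
best = dual_objective(mu_star, nu, p)

rng = random.Random(0)
trials = [dual_objective([rng.uniform(-3.0, 3.0) for _ in p], nu, p)
          for _ in range(1000)]

print("dual objective at mu*:", best)
print("S(nu|p):              ", relative_entropy(nu, p))
print("largest random trial: ", max(trials))
```

The value at μ* matches S(ν|p), and no randomly drawn μ exceeds it, which is consistent with the rate function and the free energy being convex conjugates.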

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

[1] A. N. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. New York: Springer, 1933.
[2] Y. Choquet-Bruhat, C. DeWitt-Morette, and M. Dillard-Bleick, Analysis, Manifolds and Physics, Part I: Basics, 2nd ed. Amsterdam: Elsevier, 1982.
[3] S.-I. Amari, Information Geometry and Its Applications. New York: Springer, 2016.
[4] S.-I. Amari, Differential-Geometrical Methods in Statistics, ser. Lecture Notes in Statistics. New York: Springer, 1990.
[5] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical parameters,” in Breakthroughs in Statistics, ser. Springer Series in Statistics, S. Kotz and N. L. Johnson, Eds. New York: Springer, 1992, pp. 235–247.
[6] G. Khan and J. Zhang, “When optimal transport meets information geometry,” Information Geometry, vol. 5, pp. 47–78, 2022.
[7] J. Dickey, N. T. Gridgeman, M. C. S. Kingsley, I. J. Good, J. E. Carlson, D. Gianola, M. H. Kutner, and S. Selvin, “Letters to the editor,” The American Statistician, vol. 29, no. 3, pp. 131–134, 1975.
[8] S. Selvin, M. Bloxham, A. I. Khuri, M. Moore, R. Coleman, G. R. Bryce, J. A. Hagans, T. C. Chalmers, E. A. Maxwell, and G. N. Smith, “Letters to the editor,” The American Statistician, vol. 29, no. 1, pp. 67–71, 1975.
[9] H. Freudenthal, “Formulering van het ‘som-en-product’-probleem,” Nieuw Archief voor Wiskunde, vol. 17, no. 3, p. 152, 1969.
[10] M. Gardner, “Pride of problems, including one that is virtually impossible,” Scientific American, vol. 241, no. 6, p. 22, 1979.
[11] K. Baclawski, Introduction to Probability with R. Chapman and Hall/CRC, 2008.
[12] H. Qian, “Internal energy, fundamental thermodynamic relation, and Gibbs’ ensemble theory as emergent laws of statistical counting,” Entropy, vol. 26, p. 1091, 2024.
[13] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows, 2nd ed., ser. Lectures in Mathematics. ETH Zürich. Basel, Switzerland: Birkhäuser Verlag AG, Dec. 2008.
[14] C. Villani, Optimal Transport, ser. Grundlehren der mathematischen Wissenschaften. Berlin, Germany: Springer, Dec. 2009.
[15] S.-I. Amari, “Information geometry of the EM and em algorithms for neural networks,” Neural Networks, vol. 8, no. 9, pp. 1379–1408, 1995.
[16] E. T. Jaynes, Probability Theory: The Logic of Science. London, U.K.: Cambridge University Press, 2003.
[17] S. H. Strogatz, Nonlinear Dynamics and Chaos With Applications to Physics, Biology, Chemistry, and Engineering, 2nd ed. Boca Raton: CRC Press, 2015.
[18] A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes. New York: Springer, 1996.
[19] H. Qian and H. Ge, Stochastic Chemical Reaction Systems in Biology. Cham, Switzerland: Springer Nature, 2021.
[20] H. B. Callen, Thermodynamics and an Introduction to Thermostatistics, 2nd ed. New York: Wiley, 1991.
[21] B. Miao, H. Qian, and Y.-S. Wu, “On thermodynamic information,” arXiv:2312.03454, 2023.
[22] P. W. Anderson, “More is different: Broken symmetry and the nature of the hierarchical structure of science,” Science, vol. 177, pp. 393–396, 1972.
[23] H. Qian and Y.-C. Cheng, “Counting single cells and computing their heterogeneity: From phenotypic frequencies to mean value of a quantitative biomarker,” Quant. Biol., vol. 8, pp. 172–176, 2020.
[24] E. Angelini and H. Qian, “Statistical analysis of random motion and energetic behavior of counting: Gibbs’ theory revisited,” J. Phys. Chem. B, vol. 127, pp. 2552–2564, 2023.
[25] L. Boltzmann, “Further studies on the thermal equilibrium of gas molecules,” in The Kinetic Theory of Gases: An Anthology of Classic Papers with Historical Commentary, S. G. Brush and N. S. Hall, Eds. Singapore: World Scientific, 2003, pp. 262–349.
[26] M. Planck, The Theory of Heat Radiation. Blakiston, 1914.
[27] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[28] A. Y. Khinchin, Mathematical Foundations of Information Theory. Dover, 1957.
[29] J. Shore and R. Johnson, “Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy,” IEEE Transactions on Information Theory, vol. 26, no. 1, pp. 26–37, 1980.
[30] I. N. Sanov, “On the probability of large deviations of random variables,” Selected Translations in Mathematical Statistics and Probability, vol. 1, pp. 213–244, 1961.
[31] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. New York: Springer, 1998.
[32] D. A. Kappos, Probability algebras and stochastic spaces. Academic Press, 2014, vol. 7.
[33] G.-C. Rota, “Twelve problems in probability no one likes to bring up,” in Algebraic Combinatorics and Computer Science: A Tribute to Gian-Carlo Rota. Springer, 2001, pp. 57–93.
[34] A. Rényi, Probability Theory. Courier Corporation, 2007.
[35] R. Durrett, Probability: Theory and Examples. Cambridge University Press, 2019.
[36] R. Van Handel, “Stochastic calculus, filtering, and stochastic control,” Course notes, URL http://www.princeton.edu/rvan/acm217/ACM217.pdf, vol. 14, 2007.
[37] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information. Cambridge: Cambridge University Press, 2010.
[38] G. Baym, Lectures On Quantum Mechanics. CRC Press, 1969.
[39] D. Siegmund, “Importance Sampling in the Monte Carlo Study of Sequential Tests,” The Annals of Statistics, vol. 4, no. 4, pp. 673–684, 1976.
[40] T. L. Hill, Statistical Mechanics: Principles and Selected Applications. New York: McGraw-Hill, 1956.
[41] R. T. Rockafellar, Convex Analysis. Princeton: Princeton University Press, 1970.
[42] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, no. 58, pp. 1705–1749, 2005.
[43] J. W. Gibbs, The Collected Works of J. Willard Gibbs. New Haven, CT: Yale Univ. Press, 1948.
[44] L. Szilard, “Über die Ausdehnung der phänomenologischen Thermodynamik auf die Schwankungserscheinungen,” Zeitschrift für Physik, vol. 32, pp. 753–788, 1925.
[45] J. M. Lee, Introduction to Smooth Manifolds, ser. Graduate Texts in Mathematics. New York: Springer, 2002.
[46] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley-Interscience, 2006.
[47] B. Grünbaum, V. Klee, M. A. Perles, and G. C. Shephard, Convex polytopes. Springer, 1967, vol. 16.
[48] C. W. Lee and F. Santos, “Subdivisions and triangulations of polytopes,” in Handbook of discrete and computational geometry. Chapman and Hall/CRC, 2017, pp. 415–447.
[49] J. Gallier and J. Quaintance, “Aspects of convex geometry polyhedra, linear programming, shellings, voronoi diagrams, delaunay triangulations,” Department of Computer and Information Science, University of Pennsylvania, vol. 219104, pp. 31–235, 2017.
[50] J. M. Lee, Introduction to Topological Manifolds, 2nd ed., ser. Graduate Texts in Mathematics. New York: Springer, 2010.

APPENDIX
We continue our discussion from section III-D. Since {q_1, ..., q_{n−k}} are all endpoints of the simplex U, we relate Q and the random variable X as XQ = [Xq_1 ... Xq_{n−k}] = [x ... x] = x 1ᵀ_{n−k} or, (21) X − x1T n Q...
