
arxiv: 2501.01556 · v3 · submitted 2025-01-02 · 💻 cs.IT · math.IT

The Geometry of Statistical Data and Information: A Large Deviation Perspective

Pith reviewed 2026-05-23 06:20 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords information geometry · large deviation theory · empirical means · information projection · Kolmogorov probability · i.i.d. assumption · Markovian assumption

The pith

The information projection from divergence minimization coincides with the projection in Kolmogorov probability theory under both i.i.d. and Markov assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the manifold of empirical mean values carries a geometry whose shape is set by the probability measure through large-deviation entropy functions. These functions are used to describe the space of data rather than the space of distributions. The central result is that the information projection obtained by minimizing divergence matches the information projection defined in Kolmogorov probability theory. The match is proved for sequences that are independent and identically distributed and for sequences that obey a Markov chain.

Core claim

The information projection defined in information geometry as divergence minimization coincides with the information projection in Kolmogorov's probability theory under both i.i.d. and Markovian assumptions.
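
As an illustration of divergence minimization (not taken from the paper), the sketch below computes the I-projection of a reference distribution `p` onto the set of distributions with a prescribed mean. It relies on the standard fact that the minimizer of D(q || p) under a linear mean constraint is an exponential tilt of `p`. The fair-die example, the function names, and the bisection bounds are assumptions chosen for illustration.

```python
import math

def kl(q, p):
    """Kullback-Leibler divergence D(q || p) in nats for finite distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def tilt(p, values, theta):
    """Exponentially tilted distribution: q_i proportional to p_i * exp(theta * x_i)."""
    weights = [pi * math.exp(theta * x) for pi, x in zip(p, values)]
    z = sum(weights)
    return [w / z for w in weights]

def i_projection(p, values, target, lo=-50.0, hi=50.0, iters=200):
    """Minimise D(q || p) subject to sum_i q_i * x_i = target.

    The mean of the tilted distribution is increasing in theta, so bisection
    on theta works whenever target lies strictly between min and max of values.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(q * x for q, x in zip(tilt(p, values, mid), values))
        if mean < target:
            lo = mid
        else:
            hi = mid
    return tilt(p, values, 0.5 * (lo + hi))

# Hypothetical example: a fair six-sided die constrained to have mean 4.5.
p = [1 / 6] * 6
faces = [1, 2, 3, 4, 5, 6]
q_star = i_projection(p, faces, 4.5)
print([round(q, 4) for q in q_star], round(kl(q_star, p), 4))
```

Under the i.i.d. assumption, the same tilted distribution is the limiting law of a single observation conditioned on the empirical mean (the conditional limit theorem). That is one standard instance of a divergence-minimizing projection agreeing with a conditioning operation in probability theory, which appears to be the kind of identification the core claim describes.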

What carries the argument

Large-deviation rate functions (entropy functions) that equip the manifold of empirical means with a Riemannian geometry.

If this is right

  • Fisher-Rao spherical geometry appears only for singleton frequencies under the i.i.d. case and fails for pairwise statistics.
  • The governing probability measure itself curves the space of empirical data.
  • The identification places information geometry inside the measure-theoretic foundations of probability.
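
The spherical geometry mentioned in the first bullet can be checked numerically. The sketch below uses the standard square-root embedding, under which the Fisher-Rao metric on the probability simplex becomes the round metric on a sphere of radius 2. The example distributions and function names are illustrative assumptions, and the code covers only the classical simplex picture, not the paper's data-space construction.

```python
import math

def to_sphere(p):
    """Map a probability vector to the sphere of radius 2 via p_i -> 2 * sqrt(p_i)."""
    return [2.0 * math.sqrt(pi) for pi in p]

def fisher_rao_distance(p, q):
    """Geodesic Fisher-Rao distance: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to 1.0 to guard against rounding slightly above the arccos domain.
    return 2.0 * math.acos(min(1.0, bc))

# Hypothetical distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

# Every embedded point has squared norm 4 * sum(p) = 4.
radius_sq = sum(x * x for x in to_sphere(p))
print(radius_sq, fisher_rao_distance(p, q))
```

Two distinct vertices of the simplex map to orthogonal points on the sphere, so their Fisher-Rao distance is a quarter of a great circle of radius 2, that is, π.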

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same entropy-based geometry could be examined for stationary processes that are neither i.i.d. nor Markov.
  • If the projections remain identical, the construction supplies a concrete Riemannian metric on empirical-mean manifolds for any ergodic source.

Load-bearing premise

Large-deviation rate functions can be interpreted directly as defining a Riemannian geometry on the manifold of empirical means without extra regularity conditions on the probability measure.

What would settle it

An explicit pair of distributions and a Markov chain for which the divergence-minimizing projection differs from the Kolmogorov information projection.

Original abstract

The manifold of empirical mean values of statistical data ad infinitum has a geometric shape that depends on the probability measure that governs the generating model. Large deviation theory produces entropy functions that depend on both the probability measure and the statistical data; we use entropy to study the geometry of the data space rather than that of the space of probability distributions. It is well known, since Rao's work, that the Fisher-Rao metric makes the probability simplex into a sphere. From our perspective, that result translates to the space of empirical singleton counting frequencies under an i.i.d. assumption. Following our ideas and going beyond i.i.d., the choice of measure curves the space. When we study the pairwise statistics, the spherical geometry breaks down entirely. We show that the information projection, defined in information geometry as divergence minimization, coincides with the information projection in Kolmogorov's probability theory. This identification holds under both i.i.d. and Markovian assumptions and connects information geometry to the foundations of probability theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that the manifold of empirical mean values of statistical data has a geometric shape determined by the governing probability measure. Large deviation theory yields entropy functions depending on both the measure and data, which are used to study the geometry of the data space. It asserts that the information projection defined via divergence minimization in information geometry coincides with the information projection in Kolmogorov's probability theory, holding under both i.i.d. and Markovian assumptions. This connects information geometry to probability foundations, with the Fisher-Rao metric yielding spherical geometry for i.i.d. singleton frequencies but breaking down for pairwise statistics under Markov assumptions.

Significance. If the identification holds, the manuscript bridges information geometry and the foundations of probability by reinterpreting large-deviation rate functions as defining geometry on empirical data manifolds rather than probability distributions. The Markovian extension, showing breakdown of spherical geometry, is a substantive step beyond standard i.i.d. results such as Sanov's theorem. The work explicitly builds on Rao's Fisher-Rao metric without introducing free parameters, ad-hoc axioms, or invented entities. The potential concern about regularity conditions on the probability measure for a Riemannian interpretation does not appear to undermine the central claim, as the large-deviation principle is taken as granted and the identification remains internally consistent.

minor comments (3)
  1. [Abstract] The phrase 'ad infinitum' is imprecise; replace it with a clearer description of the limiting regime for empirical means.
  2. [Abstract] The claim that spherical geometry 'breaks down entirely' for pairwise statistics would be strengthened by a brief, concrete indication of the deviation (e.g., a specific rate-function property) in the Markov case.
  3. The connection to Kolmogorov's probability theory would benefit from citing one or two specific foundational results or theorems rather than a general reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of the manuscript. We are pleased that the significance of connecting large-deviation rate functions to the geometry of empirical data manifolds, and the identification of information projections under both i.i.d. and Markov assumptions, has been recognized. We will incorporate any minor revisions as appropriate.

Circularity Check

0 steps flagged

No significant circularity


The paper's core claim equates the I-projection from information geometry (KL minimization) with a Kolmogorov-style projection via large-deviation rate functions. This identification for the i.i.d. case follows directly from the standard Sanov theorem, an external result. The Markovian extension applies the same rate-function minimization to pairwise empirical measures once the LDP for the pair measure is granted. No quoted equations reduce a prediction to a fitted parameter, no self-citation chain bears the central load, and no ansatz is smuggled in. The derivation is self-contained against external benchmarks.
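
The role of Sanov's theorem in the i.i.d. case can be illustrated in the simplest setting. The sketch below compares the exact decay rate of the probability that n coin flips show a fraction `a` of heads with the divergence D(Bernoulli(a) || Bernoulli(p)). The parameter values are assumptions for illustration, and the check concerns the probability of a single type class rather than the general set-valued statement of the theorem.

```python
import math

def kl_bernoulli(a, p):
    """D(Bernoulli(a) || Bernoulli(p)) in nats, for 0 < a < 1 and 0 < p < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_binom_pmf(n, k, p):
    """Log-probability that n Bernoulli(p) trials give exactly k successes."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

# Hypothetical setting: fair coin, empirical frequency of heads fixed at 0.8.
p, a = 0.5, 0.8
target = kl_bernoulli(a, p)

rates = []
for n in (10, 100, 1000, 10000):
    k = round(a * n)  # number of heads matching the frequency a
    rates.append(-log_binom_pmf(n, k, p) / n)

print(target, rates)
```

The empirical rates approach the divergence from above, with a gap of order (log n)/n that comes from the polynomial prefactor in the binomial probability.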

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from large deviation theory and information geometry; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (2)
  • [domain assumption] A large deviation principle holds for the empirical measures under the stated i.i.d. and Markov assumptions.
    Invoked to produce the entropy functions that define the geometry.
  • [standard math] The Fisher-Rao metric on the probability simplex is known to be spherical (Rao).
    Used as the baseline that translates to singleton i.i.d. data.
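
For orientation, the large deviation principle assumed in the first axiom has a well-known explicit form in the Markov case. The statement below is the textbook rate function for the pair empirical measure of a finite-state chain with strictly positive transition matrix $P$. The notation ($\nu$, $\nu_1$, $I_2$) is an assumption made here and is not taken from the paper.

```latex
% Pair empirical measure of a trajectory Y_1, ..., Y_n (periodic convention Y_{n+1} = Y_1):
%   L_n^{(2)}(i,j) = (1/n) * #{ t : Y_t = i, Y_{t+1} = j }.
\[
  I_2(\nu) =
  \begin{cases}
    \displaystyle \sum_{i,j} \nu(i,j)\,\log\frac{\nu(i,j)}{\nu_1(i)\,P(i,j)},
      & \text{if } \sum_{j}\nu(i,j) = \sum_{j}\nu(j,i) \text{ for all } i,\\[2ex]
    +\infty, & \text{otherwise,}
  \end{cases}
  \qquad
  \nu_1(i) = \sum_{j}\nu(i,j).
\]
```

When the chain is in fact i.i.d. with law $\mu$, so that $P(i,j) = \mu(j)$, minimizing $I_2$ over pair measures with a fixed marginal $\nu_1$ recovers the relative entropy $D(\nu_1 \,\|\, \mu)$ of Sanov's theorem.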

pith-pipeline@v0.9.0 · 5699 in / 1390 out tokens · 24068 ms · 2026-05-23T06:20:23.012996+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

  1. [1] A. N. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. New York: Springer, 1933

  2. [2] Y. Choquet-Bruhat, C. DeWitt-Morette, and M. Dillard-Bleick, Analysis, Manifolds and Physics, Part I: Basics, 2nd ed. Amsterdam: Elsevier, 1982

  3. [3] S.-I. Amari, Information Geometry and Its Applications. New York: Springer, 2016

  4. [4] S.-I. Amari, Differential-Geometrical Methods in Statistics, ser. Lecture Notes in Statistics. New York: Springer, 1990

  5. [5] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical parameters,” in Breakthroughs in Statistics, ser. Springer Series in Statistics, S. Kotz and N. L. Johnson, Eds. New York: Springer, 1992, pp. 235–247

  6. [6] G. Khan and J. Zhang, “When optimal transport meets information geometry,” Information Geometry, vol. 5, pp. 47–78, 2022

  7. [7] J. Dickey, N. T. Gridgeman, M. C. S. Kingsley, I. J. Good, J. E. Carlson, D. Gianola, M. H. Kutner, and S. Selvin, “Letters to the editor,” The American Statistician, vol. 29, no. 3, pp. 131–134, 1975

  8. [8] S. Selvin, M. Bloxham, A. I. Khuri, M. Moore, R. Coleman, G. R. Bryce, J. A. Hagans, T. C. Chalmers, E. A. Maxwell, and G. N. Smith, “Letters to the editor,” The American Statistician, vol. 29, no. 1, pp. 67–71, 1975

  9. [9] H. Freudenthal, “Formulering van het ‘som-en-product’-probleem,” Nieuw Archief voor Wiskunde, vol. 17, no. 3, p. 152, 1969

  10. [10] M. Gardner, “Pride of problems, including one that is virtually impossible,” Scientific American, vol. 241, no. 6, p. 22, 1979

  11. [11] K. Baclawski, Introduction to Probability with R. Chapman and Hall/CRC, 2008

  12. [12] H. Qian, “Internal energy, fundamental thermodynamic relation, and Gibbs’ ensemble theory as emergent laws of statistical counting,” Entropy, vol. 26, p. 1091, 2024

  13. [13] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows, 2nd ed., ser. Lectures in Mathematics. ETH Zürich. Basel, Switzerland: Birkhäuser Verlag AG, Dec. 2008

  14. [14] C. Villani, Optimal Transport, ser. Grundlehren der mathematischen Wissenschaften. Berlin, Germany: Springer, Dec. 2009

  15. [15] S.-I. Amari, “Information geometry of the EM and em algorithms for neural networks,” Neural Networks, vol. 8, no. 9, pp. 1379–1408, 1995

  16. [16] E. T. Jaynes, Probability Theory: The Logic of Science. London, U.K.: Cambridge University Press, 2003

  17. [17] S. H. Strogatz, Nonlinear Dynamics and Chaos With Applications to Physics, Biology, Chemistry, and Engineering, 2nd ed. Boca Raton: CRC Press, 2015

  18. [18] A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes. New York: Springer, 1996

  19. [19] H. Qian and H. Ge, Stochastic Chemical Reaction Systems in Biology. Cham, Switzerland: Springer Nature, 2021

  20. [20] H. B. Callen, Thermodynamics and an Introduction to Thermostatistics, 2nd ed. New York: Wiley, 1991

  21. [21] B. Miao, H. Qian, and Y.-S. Wu, “On thermodynamic information,” arXiv:2312.03454, 2023

  22. [22] P. W. Anderson, “More is different: Broken symmetry and the nature of the hierarchical structure of science,” Science, vol. 177, pp. 393–396, 1972

  23. [23] H. Qian and Y.-C. Cheng, “Counting single cells and computing their heterogeneity: From phenotypic frequencies to mean value of a quantitative biomarker,” Quant. Biol., vol. 8, pp. 172–176, 2020

  24. [24] E. Angelini and H. Qian, “Statistical analysis of random motion and energetic behavior of counting: Gibbs’ theory revisited,” J. Phys. Chem. B, vol. 127, pp. 2552–2564, 2023

  25. [25] L. Boltzmann, “Further studies on the thermal equilibrium of gas molecules,” in The Kinetic Theory of Gases: An Anthology of Classic Papers with Historical Commentary, S. G. Brush and N. S. Hall, Eds. Singapore: World Scientific, 2003, pp. 262–349

  26. [26] M. Planck, The Theory of Heat Radiation. Blakiston, 1914

  27. [27] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948

  28. [28] A. Y. Khinchin, Mathematical Foundations of Information Theory. Dover, 1957

  29. [29] J. Shore and R. Johnson, “Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy,” IEEE Transactions on Information Theory, vol. 26, no. 1, pp. 26–37, 1980

  30. [30] I. N. Sanov, “On the probability of large deviations of random variables,” Selected Translations in Mathematical Statistics and Probability, vol. 1, pp. 213–244, 1961

  31. [31] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. New York: Springer, 1998

  32. [32] D. A. Kappos, Probability algebras and stochastic spaces. Academic Press, 2014, vol. 7

  33. [33] G.-C. Rota, “Twelve problems in probability no one likes to bring up,” in Algebraic Combinatorics and Computer Science: A Tribute to Gian-Carlo Rota. Springer, 2001, pp. 57–93

  34. [34] A. Rényi, Probability Theory. Courier Corporation, 2007

  35. [35] R. Durrett, Probability: Theory and Examples. Cambridge University Press, 2019

  36. [36] R. Van Handel, “Stochastic calculus, filtering, and stochastic control,” Course notes, URL http://www.princeton.edu/rvan/acm217/ACM217.pdf, vol. 14, 2007

  37. [37] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information. Cambridge: Cambridge University Press, 2010

  38. [38] G. Baym, Lectures On Quantum Mechanics. CRC Press, 1969

  39. [39] D. Siegmund, “Importance Sampling in the Monte Carlo Study of Sequential Tests,” The Annals of Statistics, vol. 4, no. 4, pp. 673–684, 1976

  40. [40] T. L. Hill, Statistical Mechanics: Principles and Selected Applications. New York: McGraw-Hill, 1956

  41. [41] R. T. Rockafellar, Convex Analysis. Princeton: Princeton University Press, 1970

  42. [42] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, no. 58, pp. 1705–1749, 2005

  43. [43] J. W. Gibbs, The Collected Works of J. Willard Gibbs. New Haven, CT: Yale Univ. Press, 1948

  44. [44] L. Szilard, “Über die Ausdehnung der phänomenologischen Thermodynamik auf die Schwankungserscheinungen,” Zeitschrift für Physik, vol. 32, pp. 753–788, 1925

  45. [45] J. M. Lee, Introduction to Smooth Manifolds, ser. Graduate Texts in Mathematics. New York: Springer, 2002

  46. [46] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley-Interscience, 2006

  47. [47] B. Grünbaum, V. Klee, M. A. Perles, and G. C. Shephard, Convex polytopes. Springer, 1967, vol. 16

  48. [48] C. W. Lee and F. Santos, “Subdivisions and triangulations of polytopes,” in Handbook of discrete and computational geometry. Chapman and Hall/CRC, 2017, pp. 415–447

  49. [49] J. Gallier and J. Quaintance, “Aspects of convex geometry polyhedra, linear programming, shellings, voronoi diagrams, delaunay triangulations,” Department of Computer and Information Science, University of Pennsylvania, vol. 219104, pp. 31–235, 2017

  50. [50] J. M. Lee, Introduction to Topological Manifolds, 2nd ed., ser. Graduate Texts in Mathematics. New York: Springer, 2010

APPENDIX

We continue our discussion from section III-D. Since $\{q_1, \ldots, q_{n-k}\}$ are all endpoints of the simplex $U$, we relate $Q$ and the random variable $X$ as

$$
\begin{aligned}
XQ &= \begin{bmatrix} Xq_1 & \cdots & Xq_{n-k} \end{bmatrix}
    = \begin{bmatrix} | & & | \\ x & \cdots & x \\ | & & | \end{bmatrix}
    = x\mathbf{1}_{n-k}^{T} \quad \text{or,} \\
&\left(X - x\mathbf{1}_{n}^{T}\right) Q \;\ldots
\end{aligned}
\tag{21}
$$