
arXiv: 2606.21551 · v1 · pith:OQN6NXOEnew · submitted 2026-06-19 · 🧮 math.ST · cs.IT · math.CT · math.IT · math.PR · stat.TH

Reformulation Invariance and the Axiomatic Foundations of Inference

Pith reviewed 2026-06-26 12:34 UTC · model grok-4.3

classification 🧮 math.ST · cs.IT · math.CT · math.IT · math.PR · stat.TH
keywords reformulation invariance · inference axioms · Kullback-Leibler divergence · f-divergences · alpha-divergences · category of statistical models · preorder on measures · maximum entropy

The pith

Requiring an inference method to give the same answer to any reformulation of a problem forces it to minimize the Kullback-Leibler divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the single requirement of reformulation invariance is enough to force any inference procedure to minimize some classical divergence. Successive additional invariance conditions then narrow the family first to the α-divergences and finally to the Kullback-Leibler divergence alone. Inference is recast as selecting a least element under a preorder on positive measures rather than minimizing a numerical functional, with a divergence serving only as one possible numerical scale for that preorder. The reformulations are treated as the morphisms of a category of inference problems, and the invariance requirement is expressed by requiring the inference operator to be a covariant functor into Cencov's category of statistical models. The representation is proved first on finite spaces and then lifted to general measurable spaces by an elementary closure argument.

Core claim

Inference is the selection of a least element under a preorder on positive measures. Requiring that this selection be preserved under all reformulations of an inference problem, with reformulations modeled as the morphisms of a category and the inference operator as a covariant functor into the category of statistical models, forces the preorder to be represented by the Kullback-Leibler divergence. Along the way, the admissible divergences narrow from the broad f-divergences through the α-divergences to the single KL divergence.
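
The distinction between a preorder and a numerical scale for it can be made concrete with a small sketch. It assumes a finite space, a hypothetical reference measure, and a hypothetical finite set of candidate measures; none of these come from the paper. It shows that a strictly increasing rescaling of a divergence changes the numbers but not the ranking or the least element.

```python
import math

def generalized_kl(p, q):
    """Generalized KL divergence between strictly positive measures on a finite set."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))

def least_elements(candidates, scale):
    """Candidates whose scale value is minimal, i.e. least under the induced preorder."""
    values = [scale(c) for c in candidates]
    best = min(values)
    return [c for c, v in zip(candidates, values) if v == best]

# Hypothetical reference measure and feasible set, chosen only for illustration.
reference = (1.0, 1.0, 1.0)
candidates = [(2.0, 0.5, 0.5), (1.5, 1.0, 0.5), (1.2, 1.0, 0.8)]

def kl_scale(p):
    return generalized_kl(p, reference)

def rescaled(p):
    # Any strictly increasing transform of the scale induces the same preorder.
    return math.expm1(3.0 * kl_scale(p))

by_kl = least_elements(candidates, kl_scale)
by_rescaled = least_elements(candidates, rescaled)
ranking_kl = sorted(candidates, key=kl_scale)
ranking_rescaled = sorted(candidates, key=rescaled)
```

Both scales pick the same least element and the same ranking, which is the sense in which a divergence is only one representative of the underlying order.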

What carries the argument

Reformulation invariance expressed as a covariant functor from the category of inference problems into Cencov's category of statistical models.

If this is right

  • Inference reduces to minimization of a classical divergence.
  • Stronger reformulation conditions narrow the admissible family to the α-divergences.
  • Full invariance under all reformulations selects only the Kullback-Leibler divergence.
  • The representation holds for both discrete and continuous spaces.
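
As a numerical illustration of how the Kullback-Leibler divergence sits inside the α-family, the sketch below uses one common parametrization of the α-divergence on positive measures. The parametrization, the function names, and the test vectors are assumptions for illustration; the paper's own conventions may differ.

```python
import math

def generalized_kl(p, q):
    """sum_i p_i log(p_i / q_i) - p_i + q_i, for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))

def alpha_divergence(p, q, alpha):
    """One common alpha-divergence: (1 / (a (1 - a))) * sum(a p + (1 - a) q - p^a q^(1 - a))."""
    if alpha in (0.0, 1.0):
        raise ValueError("alpha = 0 and alpha = 1 are defined only as limits")
    total = sum(
        alpha * pi + (1.0 - alpha) * qi - pi**alpha * qi ** (1.0 - alpha)
        for pi, qi in zip(p, q)
    )
    return total / (alpha * (1.0 - alpha))

# Hypothetical test vectors.
p = (0.5, 0.3, 0.2)
q = (0.2, 0.3, 0.5)

near_one = alpha_divergence(p, q, 1.0 - 1e-6)   # approaches KL(p || q)
near_zero = alpha_divergence(p, q, 1e-6)        # approaches KL(q || p)
half = alpha_divergence(p, q, 0.5)              # equals 2 * sum((sqrt(p) - sqrt(q))^2)
hellinger_like = 2.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
```

In this parametrization the two KL directions appear as the limits α → 1 and α → 0, and α = 1/2 gives a symmetric, Hellinger-type quantity.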

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Different choices of the target category could yield other inference rules under the same invariance principle.
  • The preorder perspective may allow direct comparison of inference methods without first choosing a numerical divergence.

Load-bearing premise

The morphisms of the category of inference problems together with the covariant-functor condition into Cencov's category of statistical models are sufficient to characterize the divergence without additional hidden structure or restrictions on the class of allowed reformulations.

What would settle it

An explicit inference procedure that respects every reformulation invariance stated in the paper yet does not minimize the Kullback-Leibler divergence would falsify the central claim.
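
A test of this kind can be sketched as a small harness that runs a candidate procedure on a problem and on a reformulated copy, then compares the answers. The sketch below checks only relabeling, the simplest reformulation, for KL minimization under a mean constraint. The solver, the prior, the feature values, and the permutation are all hypothetical. Passing this check is necessary but far from sufficient: every f-divergence is also invariant under relabeling, so the check cannot by itself single out KL.

```python
import math

def min_kl_given_mean(q, f, target, lo=-50.0, hi=50.0, iters=200):
    """Minimize KL(p || q) over probability vectors p with sum(p_i * f_i) == target.

    The minimizer is an exponential tilt p_i proportional to q_i * exp(lam * f_i);
    lam is found by bisection, since the tilted mean is increasing in lam.
    """
    def tilt(lam):
        weights = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(pi * fi for pi, fi in zip(tilt(lam), f))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

# Hypothetical problem: prior q, feature f, required mean 2.5.
q = [0.1, 0.2, 0.3, 0.4]
f = [1.0, 2.0, 3.0, 4.0]
target = 2.5
p = min_kl_given_mean(q, f, target)

# Reformulate by renaming the outcomes, then solve again.
perm = [2, 0, 3, 1]
p_renamed = min_kl_given_mean([q[i] for i in perm], [f[i] for i in perm], target)

# Invariance: solving the renamed problem equals renaming the solution.
max_gap = max(abs(a - p[i]) for a, i in zip(p_renamed, perm))
```

A counterexample to the central claim would have to pass the corresponding check for every class of reformulation the paper admits, not just permutations.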

Figures

Figures reproduced from arXiv: 2606.21551 by Bert de Vries, Raphaël Trésor, Thijs van de Laar.

Figure 1. Guiding example: a building B of three floors {F1, F2, F3} holding 4, 3, and 2 rooms. Figure a shows the inferred counts P⋆, additive from rooms to floors to the building; figure b shows the three resolutions as a chain of σ-algebras, the room partition refining the floor partition refining {∅, B}. A coarse-resolution constraint such as P⋆(F1) = 5 or 7 ≤ P⋆(F2 ∪ F3) ≤ 12 lives on A_F and says nothing …
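
The additivity in this caption can be sketched in a few lines. The room layout (4, 3, and 2 rooms) and the two floor-level constraints are taken from the caption; the individual room counts are hypothetical numbers chosen only so that the constraints hold.

```python
# Rooms per floor, as stated in the caption.
rooms_per_floor = {"F1": 4, "F2": 3, "F3": 2}

# Hypothetical room-level counts (not from the paper).
room_counts = {
    "F1": [2.0, 1.0, 1.0, 1.0],
    "F2": [3.0, 2.0, 2.0],
    "F3": [1.0, 2.0],
}

# Coarser resolutions are obtained by summation: rooms -> floors -> building.
floor_counts = {floor: sum(counts) for floor, counts in room_counts.items()}
building_count = sum(floor_counts.values())

# The caption's constraints are statements about floor-level sums only.
f1_constraint_holds = floor_counts["F1"] == 5.0
f2_f3_constraint_holds = 7.0 <= floor_counts["F2"] + floor_counts["F3"] <= 12.0
```

Each constraint is evaluated from `floor_counts` alone, which is what it means for it to live at the floor resolution.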
Figure 2. Action of the inference operator T. A functor preserves the category structure between its source and target, encoding that the inference results transport consistently along reformulations of the underlying space that preserve the information. The inference operator T(Ω,A) sends each information (Ω, A), I of I to a family of measures (Ω, A), {Pθ} of M, and lifts each information morphism t# to a Markov k…
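
Morphisms between statistical models in Cencov-style categories are commonly taken to be Markov kernels. Assuming that reading, the finite sketch below represents a kernel as a row-stochastic matrix and checks two standard facts about KL: it is unchanged by an invertible relabeling and cannot increase under a coarse-graining. The kernels and distributions are hypothetical, and this is not the paper's construction of T.

```python
import math

def kl(p, q):
    """KL divergence between probability vectors; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def push(p, kernel):
    """Push a distribution through a Markov kernel given as a row-stochastic matrix."""
    n_out = len(kernel[0])
    return [sum(p[i] * kernel[i][j] for i in range(len(p))) for j in range(n_out)]

# Hypothetical kernels on a three-point space.
swap = [[0.0, 1.0, 0.0],    # invertible relabeling: exchanges outcomes 0 and 1
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0]]
merge = [[1.0, 0.0],        # coarse-graining: merges outcomes 0 and 1
         [1.0, 0.0],
         [0.0, 1.0]]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

kl_original = kl(p, q)
kl_relabeled = kl(push(p, swap), push(q, swap))   # equal to kl_original
kl_merged = kl(push(p, merge), push(q, merge))    # no larger than kl_original
```

The second comparison is the data-processing inequality; it holds for every f-divergence, so it illustrates compatibility with Markov kernels rather than uniqueness of KL.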
Original abstract

Maximum entropy, Bayesian updating, and exponential-family estimation are all instances of a common inference principle: selecting the measure or distribution that minimizes a divergence subject to the available constraints. Which divergence to use is usually decided by analytic convenience, by empirical performance, or by a set of axioms chosen to single it out, leaving open a basic question: why one divergence and not another? We answer it from a single requirement: an inference method should return the same answer whenever the same problem is presented in an equivalent form, for instance, after simply renaming its parts. This requirement alone forces inference to be the minimisation of a classical divergence, and each further reformulation it must respect tightens the admissible family one notch, narrowing the broad f-divergences to the α-divergences and finally to the single Kullback-Leibler (KL) divergence. Mathematically, inference is recast from minimising a numerical functional to selecting a least element under a preorder on positive measures, a divergence being merely one numerical scale that reproduces that preorder. The reformulations are the morphisms of a category of inference problems, and the invariance requirement says the inference operator is a covariant functor into the category of statistical models of Cencov, mirroring his characterisation of the Fisher metric. The representation is proved on finite spaces and lifted to general measurable spaces by an elementary closure, covering discrete and continuous spaces alike. Earlier axiomatisations, such as those of Shore-Johnson and Csiszár, postulate their consistency axioms directly and only on finite alphabets; here the axioms follow from reformulation invariance alone.
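
For orientation, the three nested families named in the abstract can be written as follows for positive measures P = (p_i) and Q = (q_i) on a finite set. These are standard textbook forms in one common normalization, given here as an assumption; the paper's own conventions may differ.

```latex
\begin{align*}
D_f(P \,\|\, Q) &= \sum_{i} q_i \, f\!\left(\frac{p_i}{q_i}\right),
  \qquad f \text{ convex},\ f(1) = 0, \\
D_\alpha(P \,\|\, Q) &= \frac{1}{\alpha(1-\alpha)} \sum_{i}
  \left( \alpha\, p_i + (1-\alpha)\, q_i - p_i^{\alpha} q_i^{1-\alpha} \right),
  \qquad \alpha \notin \{0, 1\}, \\
D_{\mathrm{KL}}(P \,\|\, Q) &= \lim_{\alpha \to 1} D_\alpha(P \,\|\, Q)
  = \sum_{i} \left( p_i \log\frac{p_i}{q_i} - p_i + q_i \right).
\end{align*}
```

Here the α-divergence is the f-divergence generated by f(t) = (αt + 1 − α − t^α) / (α(1 − α)), and KL corresponds to f(t) = t log t − t + 1, so each family contains the next.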

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that reformulation invariance alone forces any inference method to select a least element under a preorder on positive measures. The invariance is formalized by viewing reformulations as morphisms of a category of inference problems and requiring the inference operator to be a covariant functor into Cencov's category of statistical models. Selecting such a least element is equivalent to minimizing a classical divergence; successive invariance requirements narrow the admissible family from f-divergences to α-divergences and finally to the single Kullback-Leibler divergence. The representation theorem is proved on finite spaces and lifted to general measurable spaces by an elementary closure operation.

Significance. If the derivation is free of hidden restrictions on the allowed morphisms, the result supplies a unified, invariance-based foundation for maximum-entropy, Bayesian, and exponential-family methods that derives the divergence choice rather than postulating consistency axioms directly on finite alphabets. It extends Cencov's functorial characterization of the Fisher metric to the divergence level and covers both discrete and continuous settings.

major comments (2)
  1. [representation on finite spaces] The abstract states that the covariant-functor condition into Cencov's category derives the preorder without presupposing the divergence form, yet supplies no explicit construction of the morphisms of the inference-problem category or verification that the functoriality condition alone (rather than an implicit restriction on which reformulations count as morphisms) forces the preorder to be the one induced by KL. This is the load-bearing step for the narrowing claim.
  2. [extension to general measurable spaces] The lifting from finite to general spaces is described only as 'elementary closure.' It is unclear whether this operation preserves uniqueness of the preorder or introduces additional structure that could admit other divergences; a concrete statement of the closure operation and a check that it does not enlarge the admissible family is required.
minor comments (2)
  1. The abstract refers to 'each further reformulation it must respect' without listing the concrete reformulations that successively eliminate f-divergences and α-divergences; an explicit enumeration would improve readability.
  2. Notation for the preorder on positive measures and for the functor should be introduced with a short diagram or table relating the inference-problem category, the statistical-model category, and the induced preorder.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the two load-bearing steps in the argument. We address each comment below and will revise the manuscript to supply the requested explicit constructions and proofs.

point-by-point responses
  1. Referee: [representation on finite spaces] The abstract states that the covariant-functor condition into Cencov's category derives the preorder without presupposing the divergence form, yet supplies no explicit construction of the morphisms of the inference-problem category or verification that the functoriality condition alone (rather than an implicit restriction on which reformulations count as morphisms) forces the preorder to be the one induced by KL. This is the load-bearing step for the narrowing claim.

    Authors: Section 2 defines the category of inference problems with objects as pairs (X, C) where C is a convex set of positive measures on X, and morphisms as measurable maps that preserve the constraint sets (including relabelings, embeddings, and marginalizations). The proof in Section 3 proceeds by exhibiting a family of such morphisms that force any covariant functor to select the KL preorder; any other f-divergence violates covariance for at least one of these morphisms. We agree the presentation can be made more explicit and will add a new subsection enumerating the generating morphisms together with a self-contained lemma that isolates the functoriality step from any auxiliary assumptions. This will be included in the revision.
    revision: yes

  2. Referee: [extension to general measurable spaces] The lifting from finite to general spaces is described only as 'elementary closure.' It is unclear whether this operation preserves uniqueness of the preorder or introduces additional structure that could admit other divergences; a concrete statement of the closure operation and a check that it does not enlarge the admissible family is required.

    Authors: The closure operation is the smallest preorder on positive measures that agrees with the finite-support case and is closed under pointwise limits and under convex combinations with finite-support measures. We will expand the relevant section to state this definition formally and add a proposition proving that any preorder satisfying the functoriality condition on general spaces must restrict to the finite case, thereby excluding other divergences. The revised text will contain the explicit verification that the admissible family remains unchanged.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity: derivation rests on external Cencov category and direct proof

full rationale

The paper's central derivation recasts inference as a covariant functor from a category of inference problems (defined via reformulation morphisms) into Cencov's category of statistical models. This is presented as an independent construction whose functoriality condition forces the preorder on measures and narrows to KL, with an explicit proof on finite spaces followed by elementary closure. No step reduces by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation by the present authors; Cencov's prior work is external. The claim is therefore self-contained against the stated axioms and category, warranting score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the single invariance requirement together with the categorical structure taken from Cencov; no free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: An inference method must return the same answer whenever the same problem is presented in an equivalent form (reformulation invariance).
    This is the sole requirement used to derive the admissible family of divergences.

pith-pipeline@v0.9.1-grok · 5837 in / 1347 out tokens · 28108 ms · 2026-06-26T12:34:15.901911+00:00 · methodology

