Reformulation Invariance and the Axiomatic Foundations of Inference
Pith reviewed 2026-06-26 12:34 UTC · model grok-4.3
The pith
Requiring an inference method to give the same answer to any reformulation of a problem forces it to minimize the Kullback-Leibler divergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Inference is the selection of a least element under a preorder on positive measures. Requiring this selection to be preserved under every reformulation of an inference problem (reformulations being modeled as the morphisms of a category), and requiring the inference operator to be a covariant functor into the category of statistical models, forces the preorder to be represented by the Kullback-Leibler divergence. Successive invariance requirements narrow the admissible divergences from the broad f-divergences, through the α-divergences, to the single KL divergence.
What carries the argument
Reformulation invariance expressed as a covariant functor from the category of inference problems into Cencov's category of statistical models.
If this is right
- Inference reduces to minimization of a classical divergence.
- Stronger reformulation conditions narrow the admissible family to the α-divergences.
- Full invariance under all reformulations selects only the Kullback-Leibler divergence.
- The representation holds for both discrete and continuous spaces.
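To make the narrowing from f-divergences through α-divergences to KL concrete, here is a small Python sketch on finite positive measures. It assumes the common convention D_f(p‖q) = Σ q_i f(p_i/q_i), with f convex and f(1) = 0, and the α-family generator f_α(t) = (t^α − αt − (1 − α))/(α(α − 1)). Conventions for α differ across the literature (Amari, for instance, uses a reparametrised α), and the paper's own parametrisation is not stated here. All function names are illustrative.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(p || q) = sum_i q_i * f(p_i / q_i) for strictly positive measures."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kl(t: float) -> float:
    # Generator of the generalized Kullback-Leibler divergence on positive measures.
    return t * math.log(t) - t + 1.0


def f_alpha(alpha: float) -> Callable[[float], float]:
    # Generator of the alpha-divergence in one common convention (alpha != 0, 1).
    def f(t: float) -> float:
        return (t ** alpha - alpha * t - (1.0 - alpha)) / (alpha * (alpha - 1.0))
    return f


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    return f_divergence(p, q, f_kl)


def alpha_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    return f_divergence(p, q, f_alpha(alpha))


p = [1.0, 2.0, 0.5]  # positive measures; they need not be normalized
q = [0.8, 1.5, 1.0]
print("KL:", kl(p, q))
for a in (0.5, 0.9, 0.99, 0.999):
    # As alpha -> 1 the alpha-divergence approaches KL(p || q).
    print(a, alpha_divergence(p, q, a))
```

In this convention the limit α → 0 gives the reversed KL(q‖p), so the KL divergence appears as an endpoint of the α-family rather than an interior member.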
Where Pith is reading between the lines
- Different choices of the target category could yield other inference rules under the same invariance principle.
- The preorder perspective may allow direct comparison of inference methods without first choosing a numerical divergence.
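The preorder view can be illustrated without committing to a single numerical scale. In the Python sketch below, a preorder on a finite set of candidate measures is induced by a score, and the least elements are read off directly from the comparison relation. Two scales, KL(p‖prior) and the strictly increasing transform 1 − exp(−KL), induce the same preorder and therefore the same selection. The prior, the candidates, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Measure = Sequence[float]
Leq = Callable[[Measure, Measure], bool]


def kl(p: Measure, q: Measure) -> float:
    """Generalized KL divergence between strictly positive measures."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


def preorder_from_scale(score: Callable[[Measure], float]) -> Leq:
    """Induce the preorder a <= b iff score(a) <= score(b)."""
    return lambda a, b: score(a) <= score(b)


def least_elements(candidates: List[Measure], leq: Leq) -> List[Measure]:
    """Return every candidate that lies below all candidates in the preorder."""
    return [x for x in candidates if all(leq(x, y) for y in candidates)]


prior = (0.25, 0.25, 0.5)
candidates = [(0.2, 0.3, 0.5), (0.1, 0.4, 0.5), (0.3, 0.3, 0.4), (0.25, 0.35, 0.4)]

by_kl = preorder_from_scale(lambda p: kl(p, prior))
by_transform = preorder_from_scale(lambda p: 1.0 - math.exp(-kl(p, prior)))

print(least_elements(candidates, by_kl))
print(least_elements(candidates, by_transform))
```

Only the comparison relation matters for the selection; any strictly increasing rescaling of the divergence is an equally valid numerical representation of the same preorder.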
Load-bearing premise
The morphisms of the category of inference problems together with the covariant-functor condition into Cencov's category of statistical models are sufficient to characterize the divergence without additional hidden structure or restrictions on the class of allowed reformulations.
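A hedged illustration of the kind of morphisms at stake: the simulated rebuttal below lists relabelings, embeddings, and marginalizations. Assuming each of these acts on a finite measure by pushforward along a map (an assumption made only for this sketch), the Python code checks that the generalized KL divergence is unchanged under a relabeling or an embedding and cannot increase under a marginalization (the data-processing inequality). It shows how KL behaves under such maps; it does not reproduce the paper's characterization, and the measures and maps are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable

Measure = Dict[Hashable, float]


def pushforward(mu: Measure, f: Callable[[Hashable], Hashable]) -> Measure:
    """Image measure: the mass at y is the total mass of points x with f(x) == y."""
    out: Dict[Hashable, float] = defaultdict(float)
    for x, m in mu.items():
        out[f(x)] += m
    return dict(out)


def kl(p: Measure, q: Measure) -> float:
    """Generalized KL: sum of p log(p/q) - p + q over the union of supports."""
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        if px > 0.0:
            if qx == 0.0:
                return math.inf
            total += px * math.log(px / qx)
        total += qx - px
    return total


p = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

relabel = lambda xy: (xy[1], xy[0])   # bijection: swap the two coordinates
embed = lambda xy: (xy[0], xy[1], 0)  # injection into a larger space
marginal = lambda xy: xy[0]           # projection onto the first coordinate

base = kl(p, q)
print(base)
print(kl(pushforward(p, relabel), pushforward(q, relabel)))    # equals base
print(kl(pushforward(p, embed), pushforward(q, embed)))        # equals base
print(kl(pushforward(p, marginal), pushforward(q, marginal)))  # at most base
```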
What would settle it
An explicit inference procedure that respects every reformulation invariance stated in the paper yet does not minimize the Kullback-Leibler divergence would falsify the central claim.
Original abstract
Maximum entropy, Bayesian updating, and exponential-family estimation are all instances of a common inference principle: selecting the measure or distribution that minimizes a divergence subject to the available constraints. Which divergence to use is usually decided by analytic convenience, by empirical performance, or by a set of axioms chosen to single it out, leaving open a basic question: why one divergence and not another? We answer it from a single requirement: an inference method should return the same answer whenever the same problem is presented in an equivalent form, for instance, after simply renaming its parts. This requirement alone forces inference to be the minimisation of a classical divergence, and each further reformulation it must respect tightens the admissible family one notch, narrowing the broad f-divergences to the α-divergences and finally to the single Kullback-Leibler (KL) divergence. Mathematically, inference is recast from minimising a numerical functional to selecting a least element under a preorder on positive measures, a divergence being merely one numerical scale that reproduces that preorder. The reformulations are the morphisms of a category of inference problems, and the invariance requirement says the inference operator is a covariant functor into the category of statistical models of Cencov, mirroring his characterisation of the Fisher metric. The representation is proved on finite spaces and lifted to general measurable spaces by an elementary closure, covering discrete and continuous spaces alike. Earlier axiomatisations, such as those of Shore-Johnson and Csiszár, postulate their consistency axioms directly and only on finite alphabets; here the axioms follow from reformulation invariance alone.
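The abstract's motivating example, that renaming the parts of a problem should not change the answer, can be checked numerically for KL-based inference. The Python sketch below (an illustration, not the paper's construction) computes the minimum-KL update of a prior q on a finite space under a single expectation constraint Σ p_i g_i = c, using the standard exponential-tilting form p_i ∝ q_i exp(λ g_i) with λ found by bisection. Permuting the outcomes and then inferring gives the same result as inferring and then permuting. The prior, the statistic g, the target c, and the permutation are made-up values.

```python
import math
from typing import List, Sequence


def tilt(q: Sequence[float], g: Sequence[float], lam: float) -> List[float]:
    """Exponential tilt: p_i proportional to q_i * exp(lam * g_i), normalized."""
    w = [qi * math.exp(lam * gi) for qi, gi in zip(q, g)]
    z = sum(w)
    return [wi / z for wi in w]


def mean(p: Sequence[float], g: Sequence[float]) -> float:
    return sum(pi * gi for pi, gi in zip(p, g))


def min_kl_update(q, g, c, lo=-50.0, hi=50.0, iters=200):
    """Minimize KL(p || q) over probability vectors p with mean(p, g) == c.

    The tilted mean is nondecreasing in lam (its derivative is a variance),
    so bisection on lam finds the solution whenever min(g) < c < max(g).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, g, mid), g) < c:
            lo = mid
        else:
            hi = mid
    return tilt(q, g, 0.5 * (lo + hi))


q = [0.5, 0.3, 0.2]  # prior
g = [1.0, 2.0, 3.0]  # constrained statistic
c = 2.2              # required expectation
p = min_kl_update(q, g, c)

# Reformulation: outcome k of the renamed problem is outcome perm[k] of the original.
perm = [2, 0, 1]
p_renamed = min_kl_update([q[i] for i in perm], [g[i] for i in perm], c)
print(p)
print(p_renamed, [p[i] for i in perm])
```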
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that reformulation invariance alone—formalized by viewing reformulations as morphisms of a category of inference problems and requiring the inference operator to be a covariant functor into Cencov's category of statistical models—forces any inference method to select a least element under a preorder on positive measures. This is equivalent to minimizing a classical divergence; successive invariance requirements narrow the admissible family from f-divergences to α-divergences and finally to the single Kullback-Leibler divergence. The representation theorem is proved on finite spaces and lifted to general measurable spaces by an elementary closure operation.
Significance. If the derivation is free of hidden restrictions on the allowed morphisms, the result supplies a unified, invariance-based foundation for maximum-entropy, Bayesian, and exponential-family methods that derives the divergence choice rather than postulating consistency axioms directly on finite alphabets. It extends Cencov's functorial characterization of the Fisher metric to the divergence level and covers both discrete and continuous settings.
major comments (2)
- [representation on finite spaces] The abstract states that the covariant-functor condition into Cencov's category derives the preorder without presupposing the divergence form, yet supplies no explicit construction of the morphisms of the inference-problem category or verification that the functoriality condition alone (rather than an implicit restriction on which reformulations count as morphisms) forces the preorder to be the one induced by KL. This is the load-bearing step for the narrowing claim.
- [extension to general measurable spaces] The lifting from finite to general spaces is described only as 'elementary closure.' It is unclear whether this operation preserves uniqueness of the preorder or introduces additional structure that could admit other divergences; a concrete statement of the closure operation and a check that it does not enlarge the admissible family is required.
minor comments (2)
- The abstract refers to 'each further reformulation it must respect' without listing the concrete reformulations that successively eliminate f-divergences and α-divergences; an explicit enumeration would improve readability.
- Notation for the preorder on positive measures and for the functor should be introduced with a short diagram or table relating the inference-problem category, the statistical-model category, and the induced preorder.
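For intuition about how an additional requirement can separate KL from the rest of the α-family, the Python sketch below checks additivity over independent subsystems, a property close in spirit to the system-independence axiom among Shore and Johnson's consistency axioms. For product distributions, KL(p₁⊗p₂‖q₁⊗q₂) = KL(p₁‖q₁) + KL(p₂‖q₂), whereas the α = 1/2 member of the family (in the convention D_α = Σ(p^α q^(1−α) − αp − (1 − α)q)/(α(α − 1))) is not additive. This is not claimed to be one of the paper's reformulations, and the distributions are arbitrary examples.

```python
import math
from itertools import product
from typing import List, Sequence


def alpha_div(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Alpha-divergence between probability vectors (alpha != 0, 1)."""
    s = sum(pi ** alpha * qi ** (1 - alpha) - alpha * pi - (1 - alpha) * qi
            for pi, qi in zip(p, q))
    return s / (alpha * (alpha - 1))


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence between strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def joint(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Product distribution of two independent subsystems, flattened."""
    return [x * y for x, y in product(a, b)]


p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.6, 0.4], [0.3, 0.7]

kl_gap = kl(joint(p1, p2), joint(q1, q2)) - (kl(p1, q1) + kl(p2, q2))
alpha_gap = alpha_div(joint(p1, p2), joint(q1, q2), 0.5) - (
    alpha_div(p1, q1, 0.5) + alpha_div(p2, q2, 0.5))
print(kl_gap)     # zero up to rounding
print(alpha_gap)  # nonzero: not additive over independent subsystems
```

For α = 1/2 this divergence equals 4(1 − BC), where BC = Σ√(p_i q_i) is the Bhattacharyya coefficient; BC is multiplicative over products, which is why additivity fails.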
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the two load-bearing steps in the argument. We address each comment below and will revise the manuscript to supply the requested explicit constructions and proofs.
- Referee: [representation on finite spaces] The abstract states that the covariant-functor condition into Cencov's category derives the preorder without presupposing the divergence form, yet supplies no explicit construction of the morphisms of the inference-problem category or verification that the functoriality condition alone (rather than an implicit restriction on which reformulations count as morphisms) forces the preorder to be the one induced by KL. This is the load-bearing step for the narrowing claim.
  Authors: Section 2 defines the category of inference problems with objects as pairs (X, C) where C is a convex set of positive measures on X, and morphisms as measurable maps that preserve the constraint sets (including relabelings, embeddings, and marginalizations). The proof in Section 3 proceeds by exhibiting a family of such morphisms that force any covariant functor to select the KL preorder; any other f-divergence violates covariance for at least one of these morphisms. We agree the presentation can be made more explicit and will add a new subsection enumerating the generating morphisms together with a self-contained lemma that isolates the functoriality step from any auxiliary assumptions. This will be included in the revision.
  Revision: yes.
- Referee: [extension to general measurable spaces] The lifting from finite to general spaces is described only as 'elementary closure.' It is unclear whether this operation preserves uniqueness of the preorder or introduces additional structure that could admit other divergences; a concrete statement of the closure operation and a check that it does not enlarge the admissible family is required.
  Authors: The closure operation yields the smallest preorder on positive measures that agrees with the finite-support case and is closed under pointwise limits and under convex combinations with finite-support measures. We will expand the relevant section to state this definition formally and add a proposition proving that any preorder satisfying the functoriality condition on general spaces must restrict to the finite case, thereby excluding other divergences. The revised text will contain the explicit verification that the admissible family remains unchanged.
  Revision: yes.
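The finite-to-general lifting can be made tangible with one standard fact, which may or may not match the paper's "elementary closure": for probability measures on a general space, KL(P‖Q) equals the supremum of the finite-space KL over finite measurable partitions. The Python sketch below discretizes P = N(0, 1) and Q = N(1, 1) on nested partitions of the real line and shows the finite-space values increasing toward the closed-form value (μ_P − μ_Q)²/(2σ²) = 0.5. The bin range and resolutions are arbitrary choices.

```python
import math


def normal_mass(a: float, b: float, mu: float) -> float:
    """P(a < X <= b) for X ~ N(mu, 1), computed with erfc to limit cancellation."""
    za, zb = (a - mu) / math.sqrt(2.0), (b - mu) / math.sqrt(2.0)
    if za >= 0.0:
        # Interval in the right tail: difference of survival functions.
        return 0.5 * (math.erfc(za) - math.erfc(zb))
    # Otherwise: difference of CDFs, Phi(x) = 0.5 * erfc(-x / sqrt(2)).
    return 0.5 * (math.erfc(-zb) - math.erfc(-za))


def partition_kl(levels: int, half_width: float = 8.0) -> float:
    """Finite-space KL(P || Q) on a partition of the real line into two tails
    plus 2**levels equal bins on [-half_width, half_width]."""
    n = 2 ** levels
    h = 2.0 * half_width / n
    edges = [-math.inf] + [-half_width + k * h for k in range(n + 1)] + [math.inf]
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        p, q = normal_mass(a, b, 0.0), normal_mass(a, b, 1.0)
        if p > 0.0:
            total += p * math.log(p / q)
    return total


# Each level refines the previous partition, so the values are nondecreasing.
values = [partition_kl(k) for k in range(1, 11)]
for k, v in zip(range(1, 11), values):
    print(2 ** k, v)
print("closed form:", 0.5)
```

Because every partition here refines the previous one, the monotone increase is the data-processing inequality in action, and the limit recovers the continuous KL divergence.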
Circularity Check
No significant circularity: derivation rests on external Cencov category and direct proof
The paper's central derivation recasts inference as a covariant functor from a category of inference problems (defined via reformulation morphisms) into Cencov's category of statistical models. This is presented as an independent construction whose functoriality condition forces the preorder on measures and narrows to KL, with an explicit proof on finite spaces followed by elementary closure. No step reduces by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation by the present authors; Cencov's prior work is external. The claim is therefore self-contained against the stated axioms and category, warranting score 0.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: an inference method must return the same answer whenever the same problem is presented in an equivalent form (reformulation invariance).
Reference graph
Works this paper leans on
- [1] On the concept of entropy of a finite probabilistic scheme. Uspekhi Matematicheskikh Nauk, 1956.
- [2] A new theorem of information theory. Journal of Statistical Physics, 1969. doi:10.1007/BF01106578.
- [3] Perrone, Paolo. Markov Categories and Entropy.
- [4] Cox, Richard T. Algebra of … doi:10.56021/9780801869822.
- [5] Caticha, Ariel. Entropy, 2021.
- [6] A note on the inevitability of maximum entropy. International Journal of Approximate Reasoning, 1990. doi:10.1016/0888-613X(90)90020-3.
- [7] Knuth, Kevin H. Lattice duality: The origin of probability and entropy. 2005. doi:10.1016/j.neucom.2004.11.039.
- [8] Csiszár, Imre. Eine informationstheoretische … A Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei.
- [9] Csiszár, Imre. Entropy, 2008.
- [10] Nonadditive Entropies Yield Probability Distributions with Biases not Warranted by the Data. Phys. Rev. Lett., 2013. doi:10.1103/PhysRevLett.111.180604.
- [11] Can the maximum entropy principle be explained as a consistency requirement? Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 1995. doi:10.1016/1355-2198(95)00015-1.
- [12] Tsallis, Constantino. Entropy, 2015.
- [13] Rényi, Alfréd. On … Proceedings of the …
- [14] Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 1980. doi:10.1109/TIT.1980.1056144.
- [15] Skilling, John. The … Maximum-… 1988. doi:10.1007/978-94-009-3049-0_8.
- [16] Pressé, Steve; Ghosh, Kingshuk; Lee, Julian; Dill, Ken A. Entropy, 2015.
- [17] When Shannon and Khinchin meet Shore and Johnson: Equivalence of information theory and statistical inference axiomatics. Phys. Rev. E, 2020. doi:10.1103/PhysRevE.101.042126.
- [18] Maximum Entropy Principle in Statistical Inference: Case for Non-Shannonian Entropies. Phys. Rev. Lett., 2019. doi:10.1103/PhysRevLett.122.120601.
- [19] Amari, Shun-ichi. IEEE Trans. Inf. Theor., 2009. doi:10.1109/TIT.2009.2030485.
- [20] van Erven, Tim; Harremos, Peter. Rényi Divergence and Kullback-Leibler Divergence.
- [21] Information geometry. 2017.
- [22] Baez, John C.; Fritz, Tobias; Leinster, Tom. Entropy, 2011.
- [23] A Bayesian Characterization of Relative Entropy. 2014.
- [24] Gagné, Nicolas; Panangaden, Prakash. A Categorical Characterization of Relative Entropy on Standard Borel Spaces. 2018. doi:10.1016/j.entcs.2018.03.020.
- [25] Probability, … American Journal of Physics, 1946. doi:10.1119/1.1990764.
- [26] Constructing a logic of plausible inference: a guide to … International Journal of Approximate Reasoning, 2003. doi:10.1016/S0888-613X(03)00051-3.
- [27] General … IEEE Transactions on Information Theory, 1990. doi:10.1109/18.50370.
- [28] Klir, George J. Facets of … Entropy … 2003. doi:10.1007/978-3-540-36212-8_2.
- [29] Foundations of … Axioms, 2012. doi:10.3390/axioms1010038.
- [30] Amari, Shun-ichi. Information … doi:10.1007/978-4-431-55978-8.
- [31] Borwein, J. M.; Lewis, A. S. Duality … doi:10.1137/0329017.
- [32] Caticha, Ariel. Entropic …
- [33] Csiszár, Imre. Why … The Annals of Statistics. doi:10.1214/aos/1176348385.
- [34] Debreu, Gerard. Representation of a preference ordering by a numerical function. In Mathematical …, 1983. doi:10.1017/CCOL052123736X.007.
- [35] Topological … Cowles Foundation Discussion Papers.
- [36] D'Amicantonio, Giacomo; Bondarev, Egor; De With, Peter H. N. Automated … doi:10.48550/arXiv.2311.02598.
- [37] Jaynes, E. T. Information … Physical Review. doi:10.1103/PhysRev.106.620.
- [38] Jaynes, E. T. On the rationale of maximum-entropy methods. Proceedings of the IEEE 70(9), 939 (1982). doi:10.1109/PROC.1982.12425.
- [39] Liese, Friedrich; Vajda, Igor. On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory.
- [40] Jaynes, Edwin T. Prior Probabilities.
- [41] Kullback, S.; Leibler, R. A. The Annals of Mathematical Statistics, 1951.
- [42] Sikorski, Roman. Infinite joins and meets. In Boolean Algebras. doi:10.1007/978-3-642-85820-8_2.