arxiv: 2606.27349 · v1 · pith:KLREFKTOnew · submitted 2026-06-25 · 💻 cs.IT · math.IT · math.PR · math.ST · stat.ML · stat.TH

All you need is log

Pith reviewed 2026-06-26 01:59 UTC · model grok-4.3

classification 💻 cs.IT · math.IT · math.PR · math.ST · stat.ML · stat.TH
keywords multi-distribution divergences · Rényi divergences · data processing monotonicity · product additivity · coincidence divergences · multi-hypothesis testing · multi-population fairness · PAC-Bayes bounds

The pith

Every functional of W-tuples of distributions that is monotone under data processing and additive on independent products equals a positive integral of multi-way coincidence divergences over four strata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the two axioms of data-processing monotonicity and product additivity together force any such functional on multiple distributions to take the specific integral form given by the coincidence divergences. This matters because many statistical and learning tasks require comparing more than two distributions simultaneously, yet no canonical family was previously known. The characterization recovers the ordinary Rényi divergences when only two distributions are involved, and it shows by explicit counter-examples that each of the four strata is necessary and cannot be reproduced by the other three. Five separate derivations all lead to the same family, and a conditional extension is supplied.

Core claim

Every functional of W-tuples of distributions that is monotone under data processing and additive on independent products is a positive integral of multi-way coincidence divergences C_α(π1,…,πW) := −log∫π1^α1⋯πW^αW (with ∑αk=1) over a parameter space with four strata: the simplex interior; mixed-sign exponent cones; a tropical boundary at infinity carrying max-divergences; and pairwise Kullback-Leibler edges at the simplex vertices. Each stratum is necessary, shown by an explicit counter-example that the remaining strata cannot reproduce, and each stratum arises as a clean limit of simplex-interior atoms.

What carries the argument

The multi-way coincidence divergence C_α(π1,…,πW) = −log∫ π1^α1 ⋯ πW^αW (∑αk=1), integrated positively over the four-stratum parameter space.
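
For finite alphabets the integral becomes a sum, so the simplex-interior atom can be evaluated directly. The following is a minimal sketch, not the paper's code: the function names `coincidence_divergence` and `renyi_divergence` and the example distributions are assumptions made for illustration, and full support is assumed so every logarithm is finite. It also checks the W=2 case numerically, using the standard identity −log Σ p^a q^(1−a) = (1−a)·D_a(p‖q).

```python
import numpy as np

def coincidence_divergence(dists, alpha):
    """C_alpha = -log sum_x prod_k pi_k(x)^alpha_k on a finite alphabet.

    dists: array of shape (W, X), each row a full-support distribution.
    alpha: array of shape (W,), exponents summing to 1.
    """
    dists = np.asarray(dists, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if not np.isclose(alpha.sum(), 1.0):
        raise ValueError("exponents must sum to 1")
    # Weighted sum of log-densities per outcome, shape (X,).
    log_terms = alpha @ np.log(dists)
    return float(-np.log(np.exp(log_terms).sum()))

def renyi_divergence(p, q, a):
    """Classical Renyi divergence of order a (a != 1), natural log."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.log(np.sum(p**a * q**(1 - a))) / (a - 1))

# Hypothetical example distributions on a three-letter alphabet.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
r = np.array([0.1, 0.6, 0.3])

# A W=3 atom in the simplex interior.
c3 = coincidence_divergence([p, q, r], [0.5, 0.3, 0.2])

# W=2: the atom is a rescaled Renyi divergence of order a.
a = 0.4
c2 = coincidence_divergence([p, q], [a, 1 - a])
rescaled_renyi = (1 - a) * renyi_divergence(p, q, a)
```

In this sketch `c3` is strictly positive because the three distributions differ, and the atom vanishes when all arguments coincide, which is the sense in which it measures a failure of "coincidence".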

If this is right

  • The two-distribution case recovers the classical Rényi family exactly.
  • The same family is obtained from Kolmogorov-Nagumo means, classical entropy axioms, multi-hypothesis testing error exponents, and a multi-lottery betting interpretation.
  • A worked three-distribution case, numerical checks, and a conditional extension are supplied.
  • Each of the four strata is required and cannot be omitted without losing some monotone additive functional.
  • The strata arise as limits of the interior simplex atoms.
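
The limit that produces a pairwise KL edge can be illustrated numerically. The sketch below is an assumption-laden illustration, not the paper's experiment: it takes exponents (1−ε, ε) on two hypothetical distributions and checks that C/ε approaches KL(p‖q) with an error that shrinks roughly linearly in ε, which is what a first-order cumulant expansion of −log Σ p·(q/p)^ε predicts.

```python
import numpy as np

def coincidence_divergence(dists, alpha):
    """C_alpha = -log sum_x prod_k pi_k(x)^alpha_k (finite alphabet, full support)."""
    dists = np.asarray(dists, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(-np.log(np.exp(alpha @ np.log(dists)).sum()))

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), natural log."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

target = kl(p, q)
eps_grid = [1e-1, 1e-2, 1e-3]
errors = []
for eps in eps_grid:
    # Exponent vector approaching the vertex that puts all weight on p.
    c = coincidence_divergence([p, q], [1 - eps, eps])
    errors.append(abs(c / eps - target))
```

Each tenfold reduction in ε should cut the entries of `errors` by about a factor of ten, consistent with an O(ε) rate.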

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The betting interpretation suggests direct use in sequential multi-agent decision problems where agents place simultaneous bets on several hypotheses.
  • The characterization may supply new multi-prior generalization bounds in PAC-Bayes settings that were previously limited to pairwise comparisons.
  • The explicit counter-examples for each stratum offer concrete test cases for checking whether a candidate multi-distribution functional satisfies the axioms.
  • The tropical boundary suggests connections to max-entropy methods or robust optimization that operate at infinite orders.

Load-bearing premise

The two properties of monotonicity under data processing and additivity on independent products are jointly sufficient to characterize the entire family, with each of the four strata required by an explicit example the others cannot match.

What would settle it

A concrete functional on three or more distributions that satisfies data-processing monotonicity and product additivity yet lies outside every positive integral of the four-stratum coincidence divergences.

Figures

Figures reproduced from arXiv: 2606.27349 by Akshay Balsubramani.

Figure 1
Sym-orbit average vs closed-form symmetric-form representation (Section 7): the relative residual sits near machine precision (10^−16) across all 1,200 trials, every one passing the pre-registered 10^−10 threshold (dashed). G.2 Convergence-rate slopes for the KL and tropical limit identities. The vertex-derivative identity (9) predicts |C_{(1−ϵ)e_k+ϵe_l}(π)/ϵ − D_1(π_k‖π_l)| = O(ϵ) as ϵ ↓ 0, i.e. slope +1 on a log-l…
Figure 2
Fitted log-log slope versus theoretical slope per (W, X) cell. The KL stratum (blue) clusters tightly around +1 (the theoretical rate); the tropical stratum (vermillion) clusters around −0.97, just above the theoretical −1 due to the sub-exponential log(t)/t correction from the Laplace prefactor. … above −1 (median fitted slope −0.95 to −0.98) and are consistent with the predicted slope once the prefactor correction is …
Figure 3
Per-(W, X) cell agreement rate between the spectrum inequality spec and the construction-flag cat (whether π′ was drawn as Kπ). The dashed line marks the pre-registered 95% threshold; the pooled agreement rate is 95.6%, with no false negatives (zero “Kπ violates the inequality”) and a small number of false positives (random π′ accidentally passing the sampled grid). G.4 Choquet linearity holds on every…
Figure 4
Choquet-linearity sweep across six cells of the atom-family cone. Per-cell pass rates for joint DPI, additivity, and ground state. All three axioms pass in every cell; the pre-registered 99% threshold (dashed) is exceeded uniformly. The structural reading: cone-additivity is genuinely cell-uniform, not C-specific. … r⋆ = p⋆_{α⋆} ∝ ∏_k π_k^{α⋆_k} at the LHS argmax (the saddle-point form of Sion’s theorem), the…
Figure 5
Sion minimax identity residual vs grid resolution δ at W = 3. Median and maximum relative residual across 50 trials (10 per X ∈ {4, 5, 6, 8, 10}). A decade in δ yields roughly a decade in residual; V6’s coarser δ ≈ 1/30 residual of 2.4% is consistent with the extrapolation of this curve to δ = 3·10^−2.
Figure 6
Real-data axiom-stress on natural class-conditional distributions (UCI Adult, UCI Bank, MNIST, CIFAR-10, ImageNet-1K). Per-(dataset, axiom) Wilson 95% confidence interval on the passage rate at relative tolerance 10^−6. The dashed line marks the 99% pre-registered threshold. All fifteen cells achieve passage rate 1.0 with Wilson lower bound at least 0.99.
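
A Wilson lower bound of at least 0.99 at passage rate 1.0 constrains how many trials each cell needs. The sketch below implements the standard Wilson score interval to make that concrete; the helper name `wilson_interval` and the trial counts (1000 and 300) are hypothetical, since the per-cell trial counts are not stated here.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile

def wilson_interval(successes, trials, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    phat = successes / trials
    denom = 1 + z * z / trials
    centre = (phat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(phat * (1 - phat) / trials
                         + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

# With every trial passing, the lower bound reduces to n / (n + z^2).
lo_1000, hi_1000 = wilson_interval(1000, 1000)  # hypothetical count
lo_300, hi_300 = wilson_interval(300, 300)      # hypothetical count
```

Under these assumptions a perfect record over 1000 trials clears the 0.99 bound, while a perfect record over 300 trials does not, so the reported bound implies a few hundred trials per cell at minimum.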

Original abstract

Comparing two probability distributions is a basic building block of statistics and machine learning, and the right family is well understood: the Rényi divergences of order $\alpha\in[0,\infty]$ are the unique family monotone under data processing and additive on independent products. Many problems instead compare more than two distributions at once -- multi-population fairness, multi-prior PAC-Bayes bounds, multi-hypothesis testing -- and the right multi-distribution generalization of the Rényi family has been an open question. We characterize it. Every functional of $W$-tuples of distributions that is monotone under data processing and additive on independent products is a positive integral of multi-way coincidence divergences $C_{\alpha}(\pi_1,\dots,\pi_W) := -\log\int \pi_1^{\alpha_1}\cdots\pi_W^{\alpha_W}$ (with $\sum_k \alpha_k = 1$) over a parameter space with four strata: the simplex interior; mixed-sign exponent cones (the analogue of Rényi orders $>1$); a tropical boundary at infinity carrying max-divergences; and pairwise Kullback-Leibler edges at the simplex vertices. Each stratum is necessary -- the destination of an explicit data-processing-monotone, product-additive divergence the others cannot reproduce -- and each is a clean limit of simplex-interior atoms. The same family arises from five independent routes -- the structural axioms, Kolmogorov-Nagumo means with Rényi's entropy axiomatics, classical entropy characterizations, multi-hypothesis testing error exponents, and a multi-lottery betting interpretation -- structural evidence that this is the canonical multi-distribution Rényi calculus rather than an artefact of any one axiomatic input. The two-prior case recovers the standard Rényi result; a worked $W=3$ instance, numerical verification, and a conditional extension round out the treatment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper claims that every functional of W-tuples of distributions that is monotone under data processing and additive on independent products must be a positive integral of the multi-way coincidence divergences C_α(π1,…,πW) := −log∫π1^α1⋯πW^αW (∑αk=1) over a four-stratum parameter space (simplex interior, mixed-sign exponent cones, tropical boundary at infinity, and pairwise KL edges at vertices). It supplies five independent derivations (structural axioms, Kolmogorov-Nagumo, classical entropy, hypothesis-testing exponents, betting) plus explicit counter-examples establishing necessity of each stratum, with the W=2 case recovering the classical Rényi family exactly.

Significance. If the characterization holds, the result is significant: it supplies the canonical multi-distribution extension of the Rényi family, resolving an open question with applications to multi-population fairness, multi-prior PAC-Bayes, and multi-hypothesis testing. The five independent routes, explicit necessity counter-examples, exact W=2 recovery, and numerical/W=3 verification constitute strong structural evidence rather than an artefact of one axiomatization.

minor comments (1)
  1. The abstract states that each stratum is 'a clean limit of simplex-interior atoms,' but the provided summary does not tie the precise limiting argument (e.g., which sequence of α vectors) to a numbered equation or proposition; adding an explicit pointer would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment, recognition of the result's significance, and recommendation to accept. The five independent derivations, necessity counter-examples, and exact recovery of the classical Rényi case for W=2 are indeed the core of the contribution.

Circularity Check

0 steps flagged

No significant circularity: characterization from external axioms

full rationale

The central result is a characterization theorem: any functional of W-tuples satisfying the two external structural axioms (data-processing monotonicity and product additivity) must be a positive integral of the C_α over the four strata. The manuscript derives this via five independent routes (axiomatic, Kolmogorov-Nagumo, classical entropy, hypothesis-testing, betting) and supplies explicit counter-examples showing each stratum is required. The W=2 case recovers the known Rényi family from prior literature without fitting or self-referential reduction. No step reduces by construction to its inputs, no load-bearing self-citation chain appears, and the derivation remains self-contained against the stated axioms.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the two domain axioms of data-processing monotonicity and product additivity; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Monotonicity under data processing
    The functional does not increase when every distribution in the tuple is passed through the same channel.
  • domain assumption Additivity on independent products
    The functional evaluated on a tuple of product measures equals the sum of its values on the tuples of factors.
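
Both axioms can be spot-checked numerically for a simplex-interior atom on finite alphabets. The sketch below is an illustration under stated assumptions, not the paper's verification suite: the distributions, the channel `K`, and the exponent vector are random draws with a fixed seed, and the names are hypothetical. For nonnegative exponents summing to 1 the map (π_1,…,π_W) ↦ ∏_k π_k^{α_k} is jointly concave and homogeneous of degree 1, so pushing every distribution through the same channel cannot increase C_α.

```python
import numpy as np

def coincidence_divergence(dists, alpha):
    """C_alpha = -log sum_x prod_k pi_k(x)^alpha_k (finite alphabet, full support)."""
    dists = np.asarray(dists, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(-np.log(np.exp(alpha @ np.log(dists)).sum()))

rng = np.random.default_rng(0)
W, X, Y = 3, 5, 4

dists = rng.dirichlet(np.ones(X), size=W)   # shape (W, X)
alpha = rng.dirichlet(np.ones(W))           # interior of the simplex

# Channel with K[y, x] = P(y | x); each column sums to 1.
K = rng.dirichlet(np.ones(Y), size=X).T     # shape (Y, X)
pushed = dists @ K.T                        # every row through the same channel

# Data processing: the gap should be nonnegative.
dpi_gap = (coincidence_divergence(dists, alpha)
           - coincidence_divergence(pushed, alpha))

# Product additivity: build the tuple of product measures pi_k (x) sigma_k.
other = rng.dirichlet(np.ones(X), size=W)
product = np.stack([np.outer(a, b).ravel() for a, b in zip(dists, other)])
additivity_residual = (coincidence_divergence(product, alpha)
                       - coincidence_divergence(dists, alpha)
                       - coincidence_divergence(other, alpha))
```

A passing run shows `dpi_gap` nonnegative and `additivity_residual` at floating-point noise level; a single draw is a sanity check, not a proof.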

pith-pipeline@v0.9.1-grok · 5884 in / 1177 out tokens · 21327 ms · 2026-06-26T01:59:21.474896+00:00 · methodology
