Information from coincidences
Pith reviewed 2026-06-25 21:57 UTC · model grok-4.3
The pith
One algebraic identity shows the log of a mixed prior count equals a Boltzmann weight, normalizer, maximum-entropy value, and KL-barycenter optimum at once.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that log E_{x∼ν}[∏_{i=1}^W π_i^{α_i}(x)] is simultaneously a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. Specializing the same equality recovers Sanov-type decompositions, Chernoff information, Donsker-Varadhan and PAC-Bayes inequalities, Erdos-Renyi run lengths, rate-distortion thresholds, and birthday problems. The identity generalizes the classical Renyi variational formulas to a W-prior simplex and holds for unnormalized and continuum-indexed priors.
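The step from the log mixed count to its variational forms can be sketched with the standard Gibbs variational argument. The derivation below is a textbook sketch, not the paper's proof. It assumes $0 < Z < \infty$ and that $q$ is absolutely continuous with respect to $\nu$; the symbols $f$, $Z$, and $q^{*}$ are introduced here for illustration and need not match the paper's notation.

```latex
\begin{align*}
f(x) &:= \sum_{i=1}^{W} \alpha_i \log \pi_i(x), \qquad
Z := E_{x\sim\nu}\!\left[e^{f(x)}\right]
   = E_{x\sim\nu}\!\left[\prod_{i=1}^{W} \pi_i^{\alpha_i}(x)\right], \\
q^{*}(x) &:= \frac{\nu(x)\, e^{f(x)}}{Z}, \\
E_{q}[f] - \mathrm{KL}(q \,\|\, \nu)
  &= E_{q}\!\left[\log \frac{\nu\, e^{f}}{q}\right]
   = E_{q}\!\left[\log \frac{Z\, q^{*}}{q}\right]
   = \log Z - \mathrm{KL}(q \,\|\, q^{*}).
\end{align*}
```

Because $\mathrm{KL}(q \,\|\, q^{*}) \ge 0$ with equality only at $q = q^{*}$, the last line gives $\log Z = \max_q \{E_q[f] - \mathrm{KL}(q \,\|\, \nu)\}$, attained at the exponential-family member $q^{*}$, whose normalizer is $Z$.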
What carries the argument
The mixed count E_{x∼ν}[∏_{i=1}^W π_i^{α_i}(x)] whose logarithm is equated to the four listed variational objects.
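As a concrete reading of this object, the following Python sketch evaluates the log mixed count on a small finite alphabet and compares it with the Gibbs (Donsker-Varadhan) variational form, the maximum over q of E_q[Σ_i α_i log π_i] − KL(q‖ν). This is the standard duality, not the paper's own construction. The function names, the random test priors, and the assumption that every π_i and ν is strictly positive are illustrative choices made here.

```python
import numpy as np

def log_mixed_count(priors, alphas, nu):
    """log E_{x~nu}[prod_i pi_i(x)**alpha_i] on a finite alphabet.

    priors: (W, K) strictly positive weights (need not be normalized)
    alphas: (W,) real exponents
    nu:     (K,) strictly positive reference probabilities
    """
    f = alphas @ np.log(priors)      # f(x) = sum_i alpha_i * log pi_i(x)
    m = f.max()                      # shift for numerical stability
    return float(m + np.log(np.sum(nu * np.exp(f - m))))

def tilted_distribution(priors, alphas, nu):
    """q*(x) proportional to nu(x) * prod_i pi_i(x)**alpha_i."""
    log_w = np.log(nu) + alphas @ np.log(priors)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def gibbs_objective(q, priors, alphas, nu):
    """E_q[sum_i alpha_i log pi_i] - KL(q || nu), for strictly positive q."""
    f = alphas @ np.log(priors)
    return float(np.sum(q * f) - np.sum(q * np.log(q / nu)))

# Hypothetical test case: W = 3 priors on K = 5 symbols, mixed-sign exponents.
rng = np.random.default_rng(0)
priors = rng.dirichlet(np.ones(5), size=3)
nu = rng.dirichlet(np.ones(5))
alphas = np.array([0.7, -0.3, 1.1])

log_z = log_mixed_count(priors, alphas, nu)
q_star = tilted_distribution(priors, alphas, nu)
at_optimum = gibbs_objective(q_star, priors, alphas, nu)
random_values = [gibbs_objective(rng.dirichlet(np.ones(5)), priors, alphas, nu)
                 for _ in range(1000)]

print(log_z, at_optimum, max(random_values))
```

The objective evaluated at the tilted distribution should match the log mixed count up to floating-point error, and no randomly drawn q should exceed it.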
If this is right
- Sanov decompositions and Gibbs conditioning follow as direct special cases.
- Chernoff information and its multi-way version give hypothesis-testing error exponents.
- Donsker-Varadhan and PAC-Bayes change-of-measure inequalities are recovered uniformly.
- An exact multi-prior PAC-Bayes penalty subtracts an explicit coincidence bonus from the usual term.
- The asymptotic MAP error exponent for W-ary testing appears as an edge-restricted simplex optimum.
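For the two-prior case, the Chernoff specialization listed above can be illustrated numerically. The sketch below uses the standard definition C(P, Q) = −min over λ in [0, 1] of log Σ_x P(x)^λ Q(x)^(1−λ), which is the log mixed count with exponents (λ, 1−λ) under counting measure. The grid search, the function name, and the example distributions are assumptions of this sketch, and both distributions are taken to be strictly positive.

```python
import numpy as np

def chernoff_information(p, q, grid=2001):
    """Grid-search estimate of -min_{lam in [0,1]} log sum_x p^lam q^(1-lam)."""
    lams = np.linspace(0.0, 1.0, grid)
    log_p, log_q = np.log(p), np.log(q)
    # One row per lambda: log of the two-prior mixed count under counting measure.
    exponents = np.outer(lams, log_p) + np.outer(1.0 - lams, log_q)
    vals = np.log(np.exp(exponents).sum(axis=1))
    k = int(vals.argmin())
    return float(-vals[k]), float(lams[k])

# Hypothetical example distributions on three symbols.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.2, 0.7])

c_pq, lam_pq = chernoff_information(p, q)
c_qp, _ = chernoff_information(q, p)
c_pp, _ = chernoff_information(p, p)
bhattacharyya = float(-np.log(np.sum(np.sqrt(p * q))))  # the lam = 1/2 value

print(c_pq, lam_pq, bhattacharyya)
```

The estimate is symmetric in its arguments, vanishes when the two distributions coincide, and is at least the Bhattacharyya distance because λ = 1/2 is one of the grid points.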
Where Pith is reading between the lines
- The same identity may supply exact finite-sample versions of several bounds that are usually stated only asymptotically.
- Contrastive decoding in language models and sliding-window separation of genomic priors illustrate how the calculus applies at scale to sequential data.
- Other variational problems outside information theory that involve products of measures could admit analogous unifications.
Load-bearing premise
The algebraic identity holds exactly for arbitrary real exponents and for unnormalized or continuum-indexed priors.
What would settle it
A concrete counterexample computation, for chosen priors, exponents, and reference measure, in which the log mixed count fails to equal the claimed KL-barycenter optimum.
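A check of this kind can be sketched in Python for the simplest setting. The code assumes exponents on the probability simplex and counting measure as the reference, in which case the standard result is that the minimum over q of Σ_i α_i KL(q‖π_i) equals −log Σ_x ∏_i π_i(x)^{α_i}, attained at the normalized geometric mixture. The paper's claimed domain is broader and its sign convention may differ; the names and test values below are illustrative only.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for strictly positive probability vectors."""
    return float(np.sum(q * np.log(q / p)))

def barycenter_check(priors, alphas, trials=2000, seed=0):
    """Compare -log sum_x prod_i pi_i(x)**alpha_i with min_q sum_i alpha_i KL(q || pi_i).

    Assumes alphas lie on the probability simplex and the reference
    measure is counting measure (both are assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    log_g = alphas @ np.log(priors)          # log of the geometric mixture
    log_z = np.log(np.sum(np.exp(log_g)))    # log mixed count
    q_star = np.exp(log_g - log_z)           # normalized geometric mixture

    def objective(q):
        return sum(a * kl(q, p) for a, p in zip(alphas, priors))

    k = priors.shape[1]
    best_random = min(objective(rng.dirichlet(np.ones(k))) for _ in range(trials))
    return -float(log_z), objective(q_star), best_random

# Hypothetical test case: W = 4 priors on K = 6 symbols.
rng = np.random.default_rng(1)
priors = rng.dirichlet(np.ones(6), size=4)
alphas = np.array([0.1, 0.2, 0.3, 0.4])

claimed, at_q_star, best_random = barycenter_check(priors, alphas)
print(claimed, at_q_star, best_random)
```

A counterexample in this restricted setting would show up as a gap between the first two numbers or as a random q whose objective falls below the first. Testing the broader claim (negative exponents, unnormalized or continuum-indexed priors) would need a different harness.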
Original abstract
We prove a single algebraic mixed coincidence identity that unifies a broad swath of information-theoretic variational results. For any family of priors $\{\pi_i\}$ and real exponents $\{ \alpha_i \}$, the log of the mixed count $E_{x\sim\nu}\!\left[\prod_{i=1}^W \pi_i^{\alpha_i}(x)\right]$ is simultaneously a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. The identity yields a unified derivation of classical cornerstones of information theory: concentration of empirical distributions (Sanov-type decompositions and Gibbs conditioning), hypothesis-testing error exponents (Chernoff information and its multi-way analogue), change-of-measure inequalities (Donsker-Varadhan and PAC-Bayes), and laws governing rare-pattern coincidences (Erdos-Renyi run-length, iterative guesswork, rate-distortion, and birthday thresholds). Each is recovered as a specialization of the same algebraic equality. It strictly generalizes the classical Renyi entropy and divergence variational formulas (one and two priors respectively) to a $W$-prior simplex, and holds for unnormalized and continuum-indexed priors. Among its consequences are an exact multi-prior PAC-Bayes penalty that subtracts an explicit "coincidence bonus" from the usual single-prior posterior penalty, and the asymptotic MAP error exponent for $W$-ary hypothesis testing as an edge-restricted simplex optimum. We demonstrate the calculus at scale on two large alphabets encoding richly modeled sequential languages: on language-model next-token predictives where we recover contrastive decoding, and on human genomic regulatory sequence where it separates correlated from diverse prior families along a sliding-window trace.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to prove a single algebraic mixed coincidence identity: for any family of priors {π_i} and real exponents {α_i}, log E_{x∼ν}[∏_{i=1}^W π_i^{α_i}(x)] simultaneously equals a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. This identity is asserted to yield unified derivations of Sanov-type results, Chernoff information, Donsker-Varadhan and PAC-Bayes inequalities, Renyi generalizations to W priors, and several rare-pattern laws, while holding for unnormalized and continuum-indexed priors. Applications include an exact multi-prior PAC-Bayes penalty and demonstrations on language-model next-token prediction and genomic sequences.
Significance. If the central algebraic identity is rigorously established and holds over the claimed domain (arbitrary real α_i, unnormalized priors), the work would provide a notable unifying algebraic framework for many classical variational results in information theory. The explicit generalization of Renyi formulas to a W-prior simplex and the derived multi-prior PAC-Bayes form with a coincidence bonus would be useful contributions; the large-alphabet empirical examples add concrete illustration.
major comments (2)
- [abstract / main identity] Abstract and statement of the main identity: the manuscript asserts that the algebraic identity has been proved and that every listed result follows directly from it, yet supplies no expansion steps, lemmas, or explicit algebraic manipulations showing how log E[∏ π_i^{α_i}(x)] equals the KL-barycenter optimum or the exponential-family normalizer. This absence is load-bearing for the unification claim.
- [abstract] Abstract: the identity is claimed to hold for arbitrary real (including negative) α_i and for unnormalized or continuum-indexed priors, but no verification, domain restrictions, or handling of cases where ∏ π_i^{α_i}(x) becomes undefined or infinite (e.g., when some α_i < 0 and some π_i(x) = 0) is provided. Such cases directly affect whether the claimed interchange with Donsker-Varadhan or max-ent forms remains valid for the Sanov and Chernoff specializations.
minor comments (1)
- [abstract] Notation for the mixed count uses an implicit measure ν without explicit definition in the abstract; a brief clarification of the reference measure would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for identifying points where the presentation of the central identity can be strengthened. We agree that additional explicit steps and domain clarifications will improve accessibility and rigor. Below we respond point-by-point to the major comments and commit to revisions that address them directly.
-
Referee: [abstract / main identity] Abstract and statement of the main identity: the manuscript asserts that the algebraic identity has been proved and that every listed result follows directly from it, yet supplies no expansion steps, lemmas, or explicit algebraic manipulations showing how log E[∏ π_i^{α_i}(x)] equals the KL-barycenter optimum or the exponential-family normalizer. This absence is load-bearing for the unification claim.
Authors: We accept that the current compact statement of the identity, while algebraically direct from the definition of the mixed expectation, does not include intermediate lemmas or step-by-step expansions. The equivalences to the KL-barycenter optimum and exponential-family normalizer follow from standard convex duality and the definition of the log-moment generating function, but these connections were left implicit. In revision we will insert a new subsection (after the statement of the identity) containing two short lemmas: one deriving the KL-barycenter representation via the variational definition of KL divergence, and one recovering the exponential-family normalizer via the cumulant function. These lemmas will contain the explicit algebraic manipulations requested, making the unification claim self-contained.
Revision: yes
-
Referee: [abstract] Abstract: the identity is claimed to hold for arbitrary real (including negative) α_i and for unnormalized or continuum-indexed priors, but no verification, domain restrictions, or handling of cases where ∏ π_i^{α_i}(x) becomes undefined or infinite (e.g., when some α_i < 0 and some π_i(x) = 0) is provided. Such cases directly affect whether the claimed interchange with Donsker-Varadhan or max-ent forms remains valid for the Sanov and Chernoff specializations.
Authors: The manuscript asserts the identity for unnormalized and continuum-indexed priors, yet we agree that the domain statement is insufficiently precise. Negative exponents require that the support of each π_i be respected to keep the product finite and positive. In the revision we will add an explicit domain paragraph stating that the identity holds when either (i) all α_i ≥ 0, or (ii) α_i < 0 only for those i where π_i(x) > 0 almost everywhere under ν, with the convention 0^0 := 1 for measure-zero sets. We will also verify that the Sanov and Chernoff specializations remain valid under these restrictions because the classical statements already impose the necessary support conditions on the empirical measures. A short remark will note that the Donsker-Varadhan and max-ent interchanges continue to hold on this restricted domain.
Revision: yes
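The domain restriction proposed in this response can be made concrete on a finite alphabet. The Python sketch below is one possible reading of conditions (i) and (ii): a prior with a negative exponent must be strictly positive wherever ν puts mass. Restricting the sum to the support of ν is an interpretation adopted here for the measure-zero convention, and the function names and example values are hypothetical.

```python
import numpy as np

def mixed_count_is_well_defined(priors, alphas, nu):
    """True when every prior with a negative exponent is positive on supp(nu)."""
    support = nu > 0
    for pi, a in zip(priors, alphas):
        if a < 0 and np.any(pi[support] == 0):
            return False
    return True

def mixed_count(priors, alphas, nu):
    """E_{x~nu}[prod_i pi_i(x)**alpha_i], summed over the support of nu only."""
    if not mixed_count_is_well_defined(priors, alphas, nu):
        raise ValueError("a prior with a negative exponent vanishes on supp(nu)")
    support = nu > 0
    prod = np.ones(int(support.sum()))
    for pi, a in zip(priors, alphas):
        prod *= np.power(pi[support], a)   # np.power(0.0, 0.0) evaluates to 1.0
    return float(np.sum(nu[support] * prod))

# Hypothetical example: the first prior vanishes on the third symbol.
priors = np.array([[0.5, 0.5, 0.0],
                   [0.2, 0.3, 0.5]])
alphas = np.array([-1.0, 1.0])

nu_ok = np.array([0.5, 0.5, 0.0])         # no mass where the first prior is zero
nu_bad = np.array([1 / 3, 1 / 3, 1 / 3])  # mass on the zero of the first prior

value_ok = mixed_count(priors, alphas, nu_ok)
print(mixed_count_is_well_defined(priors, alphas, nu_ok), value_ok)
print(mixed_count_is_well_defined(priors, alphas, nu_bad))
```

Under `nu_ok` the product is finite everywhere that matters, while under `nu_bad` the negative exponent meets a zero of the first prior on a set of positive ν-mass and the mixed count is rejected as undefined.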
Circularity Check
No significant circularity; central claim is a direct algebraic identity
The paper states it proves an algebraic mixed coincidence identity by direct means, with the log-expectation of the product of powered priors serving as the unifying object that specializes to known variational results. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains; the derivation is presented as rearrangement and specialization of the identity itself. The paper is self-contained against external benchmarks for the algebraic step, yielding a score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- [ad hoc to paper] The mixed coincidence identity holds for arbitrary real exponents and for unnormalized or continuum-indexed priors.
Forward citations
Cited by 1 Pith paper
-
All you need is log
The unique family of multi-distribution Rényi functionals is the positive integral of coincidence divergences C_α over the simplex interior, mixed-sign cones, tropical boundary, and KL edges.