Change of measure through the Legendre transform
Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3
The pith
f-divergence change-of-measure inequalities derived from the Legendre transform extend PAC-Bayes bounds to new empirical risk conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study change-of-measure inequalities based on f-divergences, obtained by combining the Legendre transform of f with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.
What carries the argument
f-divergence change-of-measure inequality derived from the Legendre transform of f and the Fenchel-Young inequality
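The mechanism fits in three lines. Below is a sketch of the textbook argument. It assumes Q ≪ P with density r = dQ/dP, a convex generator f with f(1) = 0, and finite expectations. The paper's precise statements and integrability conditions may differ.

```latex
\begin{aligned}
f^*(y) &= \sup_{t \ge 0} \big( t\,y - f(t) \big), \\
r\,g &\le f(r) + f^*(g) \quad \text{pointwise (Fenchel-Young)}, \\
\mathbb{E}_Q[g] = \mathbb{E}_P[r\,g] &\le \mathbb{E}_P[f(r)] + \mathbb{E}_P[f^*(g)]
  = D_f(Q \,\|\, P) + \mathbb{E}_P[f^*(g)].
\end{aligned}
```

Roughly, taking g proportional to a risk gap and then controlling E_P[f*(g)] with a concentration argument under the reference measure is the template from which PAC-Bayes bounds are built. The choice of f decides which moment of g must be controlled.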
If this is right
- PAC-Bayes bounds become available when the empirical risk has finite moments of order p > 1 for suitable f.
- The method covers cases where the loss can take large values with small probability without requiring exponential integrability.
- Different f choices produce bounds adapted to sub-Gaussian, sub-exponential, or other tail behaviors.
- Generalization results apply to a larger family of posterior distributions in statistical learning.
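The following is a small numerical sanity check of the f-divergence change of measure E_Q[g] ≤ D_f(Q‖P) + E_P[f*(g)] on a three-point space. It compares three illustrative generators: KL, Pearson chi-square, and a hypothetical power generator. The distributions and g are made-up values, not taken from the paper. The interesting part is how the right-hand side depends on g. For KL it depends exponentially, for chi-square quadratically, and for the power generator through the q-th power of the positive part. This is the mechanism by which a different f can trade exponential-moment assumptions for polynomial ones.

```python
import math

# Each generator f is convex on t >= 0 with f(1) = 0; f_star is its Legendre
# transform f*(y) = sup_{t >= 0} (t*y - f(t)), written in closed form.

def kl_f(t):
    # f(t) = t log t (with 0 log 0 = 0) generates the Kullback-Leibler divergence
    return t * math.log(t) if t > 0 else 0.0

def kl_f_star(y):
    return math.exp(y - 1.0)

def chi2_f(t):
    # f(t) = (t - 1)^2 generates the Pearson chi-square divergence
    return (t - 1.0) ** 2

def chi2_f_star(y):
    # maximiser t = 1 + y/2 when y >= -2; otherwise the constraint t >= 0 binds at t = 0
    return y + y * y / 4.0 if y >= -2.0 else -1.0

def make_power(p):
    # hypothetical power generator f(t) = (t^p - 1)/p, p > 1, with 1/p + 1/q = 1
    q = p / (p - 1.0)

    def f(t):
        return (t ** p - 1.0) / p

    def f_star(y):
        return max(y, 0.0) ** q / q + 1.0 / p

    return f, f_star

def change_of_measure(f, f_star, q, p, g):
    """Return (E_Q[g], D_f(Q||P) + E_P[f*(g)]) on a finite space with all p_i > 0."""
    lhs = sum(qi * gi for qi, gi in zip(q, g))
    d_f = sum(pi * f(qi / pi) for qi, pi in zip(q, p))
    rhs = d_f + sum(pi * f_star(gi) for pi, gi in zip(p, g))
    return lhs, rhs

P = [0.5, 0.3, 0.2]  # reference ("prior") distribution
Q = [0.2, 0.3, 0.5]  # an arbitrary "posterior"
g = [0.1, 1.0, 3.0]  # hypothetical function of the outcome

results = {}
for name, (f, f_star) in {
    "KL": (kl_f, kl_f_star),
    "chi2": (chi2_f, chi2_f_star),
    "power p=3": make_power(3.0),
}.items():
    results[name] = change_of_measure(f, f_star, Q, P, g)
    print(f"{name}: E_Q[g] = {results[name][0]:.4f} <= {results[name][1]:.4f}")
```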
Where Pith is reading between the lines
- The same technique could generate concentration inequalities for other functionals in probability beyond PAC-Bayes.
- It might simplify proofs in robust statistics where moment conditions are natural.
- Numerical verification on synthetic data with heavy-tailed losses would test whether the new bounds are non-vacuous.
Load-bearing premise
The derived f-divergence inequalities must preserve the concentration properties without imposing extra conditions that would make the PAC-Bayes application invalid.
What would settle it
Exhibit a loss random variable with a finite p-th moment but an infinite exponential moment, and check whether the PAC-Bayes bound derived from the corresponding f still gives a non-trivial guarantee on the generalization gap.
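Here is a toy illustration of the kind of loss this test calls for. It assumes a Pareto distribution with a hypothetical shape α = 3 and scale 1, neither taken from the paper. The second moment is finite (it equals 3), so the chi-square conjugate f*(y) = y + y²/4 has a finite expectation. By contrast, every exponential moment E[exp(λX)] with λ > 0 is infinite, and the truncated integrals below keep growing with the cutoff. The sketch only checks the tail condition; it is not itself a PAC-Bayes bound.

```python
import math

ALPHA = 3.0  # hypothetical Pareto shape: moments of order k < ALPHA are finite
X_MIN = 1.0  # Pareto scale (support is [X_MIN, infinity))

def pareto_moment(k, alpha=ALPHA, x_min=X_MIN):
    """Closed-form E[X^k] for a Pareto(alpha, x_min) variable; infinite if k >= alpha."""
    if k >= alpha:
        return math.inf
    return alpha * x_min ** k / (alpha - k)

def truncated_exp_moment(lam, upper, alpha=ALPHA, x_min=X_MIN, n=100_000):
    """Midpoint-rule estimate of E[exp(lam * X) * 1{X <= upper}]."""
    h = (upper - x_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * h
        density = alpha * x_min ** alpha / x ** (alpha + 1)
        total += math.exp(lam * x) * density
    return total * h

# Chi-square conjugate f*(y) = y + y^2/4 (valid for y >= -2; here X >= 1):
# its expectation only needs the first two moments.
chi2_conjugate_mean = pareto_moment(1) + pareto_moment(2) / 4.0
print("E[X^2] =", pareto_moment(2))
print("E[X + X^2/4] =", chi2_conjugate_mean)

# Exponential moments: the truncated integrals keep growing with the cutoff.
lam = 0.5
truncated = [truncated_exp_moment(lam, upper) for upper in (10.0, 50.0, 100.0)]
print("truncated E[exp(0.5 X)]:", truncated)
```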
Original abstract
PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker-Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on f-divergences, obtained by combining the Legendre transform of f with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.
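As a concrete Donsker-Varadhan baseline, here is a minimal check on a hypothetical three-point space. The values are chosen for illustration and are not from the paper. The inequality E_Q[g] ≤ KL(Q‖P) + log E_P[exp(g)] holds for an arbitrary Q and becomes an equality for the Gibbs posterior Q* ∝ P·exp(g). The term log E_P[exp(g)] is the exponential moment that classical PAC-Bayes bounds must control.

```python
import math

def kl(q, p):
    # KL(Q || P) on a finite space (0 log 0 = 0; all p_i > 0 assumed)
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def mean(q, g):
    # E_Q[g]
    return sum(qi * gi for qi, gi in zip(q, g))

def dv_rhs(q, p, g):
    # Donsker-Varadhan right-hand side: KL(Q||P) + log E_P[exp(g)]
    return kl(q, p) + math.log(sum(pi * math.exp(gi) for pi, gi in zip(p, g)))

P = [0.5, 0.3, 0.2]  # hypothetical reference measure
g = [0.1, 1.0, 3.0]  # hypothetical function

Q = [0.2, 0.3, 0.5]  # arbitrary posterior: strict inequality expected
print(f"{mean(Q, g):.4f} <= {dv_rhs(Q, P, g):.4f}")

# Gibbs posterior Q* proportional to P * exp(g) attains equality
weights = [pi * math.exp(gi) for pi, gi in zip(P, g)]
Q_star = [w / sum(weights) for w in weights]
print(f"{mean(Q_star, g):.6f} == {dv_rhs(Q_star, P, g):.6f}")
```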
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives change-of-measure inequalities based on f-divergences by combining the Legendre transform of a convex function f with the Fenchel-Young inequality. These inequalities are then applied to obtain PAC-Bayes generalization bounds that rely on tailored assumptions on the empirical risk (rather than the bounded exponential moments required by the classical Donsker-Varadhan theorem), thereby extending the range of conditions under which PAC-Bayesian guarantees hold.
Significance. If the derivations are correct, the work provides a convex-analytic framework for generating families of change-of-measure inequalities indexed by f, which in turn yield PAC-Bayes bounds under correspondingly tailored risk assumptions. This could meaningfully broaden the applicability of PAC-Bayes theory to learning settings where exponential-moment conditions fail but other moment or tail conditions (matched to f) hold. The approach is parameter-free in the sense that it directly invokes standard convex duality without additional fitted quantities.
minor comments (3)
- Abstract, paragraph 2: the phrase 'tailored assumptions on the empirical risk' is used without an immediate concrete example; a one-sentence illustration (e.g., for f(t)=t log t or f(t)=t^2) would clarify the claim for readers.
- The manuscript should explicitly state whether the derived inequalities recover the classical Donsker-Varadhan bound as a special case for the Kullback-Leibler divergence, i.e. f(t) = t log t (whose Legendre transform is the exponential y ↦ exp(y - 1)), to make the extension transparent.
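One way to check this special case is sketched below, under the usual assumptions (Q ≪ P, finite expectations). With the KL generator, the f-divergence change of measure E_Q[g] ≤ D_f(Q‖P) + E_P[f*(g)] gives a bound that is weaker than Donsker-Varadhan. Optimising over an additive shift of g then recovers Donsker-Varadhan exactly. This is a standard argument, not necessarily the route taken in the paper.

```latex
\begin{aligned}
f(t) = t\log t,\ f^*(y) = e^{y-1}
  &\;\Longrightarrow\; \mathbb{E}_Q[g] \le \mathrm{KL}(Q\,\|\,P) + \mathbb{E}_P\big[e^{g-1}\big], \\
\text{apply to } g - c,\ c \in \mathbb{R}
  &\;\Longrightarrow\; \mathbb{E}_Q[g] \le \mathrm{KL}(Q\,\|\,P) + c + e^{-c-1}\,\mathbb{E}_P\big[e^{g}\big], \\
c^\star = \log \mathbb{E}_P\big[e^{g}\big] - 1
  &\;\Longrightarrow\; \mathbb{E}_Q[g] \le \mathrm{KL}(Q\,\|\,P) + \log \mathbb{E}_P\big[e^{g}\big].
\end{aligned}
```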
- Notation for the reference and posterior measures should be introduced once and used consistently; the current abstract alternates between 'reference measure' and 'posterior measures' without a single defining sentence.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work and for recommending minor revision. No specific major comments were listed in the report.
Circularity Check
No significant circularity identified
The derivation starts from the Legendre transform of the convex generator f of the f-divergence and combines it with the standard Fenchel-Young inequality to produce change-of-measure bounds. It then applies those bounds to PAC-Bayes under tailored empirical-risk assumptions. No step reduces by construction to a fitted parameter, a self-definition, or a load-bearing self-citation. The central inequalities follow directly from classical convex-analysis identities whose validity does not depend on the target PAC-Bayes result. The paper is therefore self-contained with respect to external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Standard math: the Fenchel-Young inequality holds for proper convex lower-semicontinuous functions.
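The Legendre transform f*(y) = sup_t (t·y - f(t)) on which the argument rests can be approximated numerically, which gives a quick way to double-check closed forms. The sketch below assumes f(t) = t log t on t ≥ 0 and a finite grid on [0, 20]; both are illustrative choices. The grid supremum matches exp(y - 1) up to discretisation error, and the Fenchel-Young inequality t·y ≤ f(t) + f*(y) holds on the sampled points.

```python
import math

def f(t):
    # KL generator f(t) = t log t on t >= 0, with 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0

def legendre_on_grid(func, y, grid):
    # f*(y) = sup_t (t*y - f(t)) restricted to a finite grid: a lower approximation of f*
    return max(t * y - func(t) for t in grid)

grid = [i / 1000.0 for i in range(20001)]  # t in [0, 20], step 0.001

conjugates = {}
for y in (-1.0, 0.0, 0.5, 1.5):
    numeric = legendre_on_grid(f, y, grid)
    exact = math.exp(y - 1.0)  # closed-form conjugate of t log t
    conjugates[y] = (numeric, exact)
    print(f"y = {y:+.1f}: grid sup = {numeric:.6f}, exp(y - 1) = {exact:.6f}")

# Fenchel-Young: t*y <= f(t) + f*(y) for all t >= 0 and real y (checked on a subgrid)
fenchel_young_ok = all(
    t * y <= f(t) + math.exp(y - 1.0) + 1e-12
    for t in grid[::100]
    for y in (-2.0, 0.0, 1.0, 2.0)
)
print("Fenchel-Young holds on samples:", fenchel_young_ok)
```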
Forward citations
Cited by 2 Pith papers
- Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities: A unified data-processing framework produces tighter change-of-measure inequalities that improve information-theoretic generalization bounds across learning theory and privacy.
- Density-Ratio Losses for Post-Hoc Learning to Defer: Post-hoc learning to defer is cast as density-ratio learning between model and expert ideal distributions, producing DR CPE losses that recover Chow's rule for KL-based ideals and support adjustable deferral via thresholding.