pith. machine review for the scientific record.

arxiv: 2605.12190 · v1 · submitted 2026-05-12 · 📊 stat.ML · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Information-Theoretic Generalization Bounds for Sequential Decision Making

Futoshi Futami, Masahiro Fujisawa

Pith reviewed 2026-05-13 03:50 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords generalization bounds · conditional mutual information · sequential decision making · online learning · multi-armed bandits · active learning · information theory

The pith

Under row-wise exchangeability, sequential conditional mutual information bounds the generalization gap in sequential decision making.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a sequential supersample framework to obtain information-theoretic generalization bounds in adaptive data settings. The framework separates the learner's filtration from a proof-side supersample enlargement used for ghost-sample comparisons. Under a row-wise exchangeability assumption, the sequential generalization gap is upper-bounded by the sequential CMI, the sum over rounds of selector-loss mutual information terms. A Bernstein-type refinement of the bound achieves faster rates when variance conditions hold. The bounds apply to online learning, streaming active learning with importance weighting, and stochastic multi-armed bandits.

Core claim

We develop a sequential supersample construction that decouples the learner's filtration from the proof-side ghost samples. Under the row-wise exchangeability assumption, the expected generalization gap is upper-bounded by the sequential CMI, which aggregates round-wise mutual information between the selector and the loss. We further derive a Bernstein-type bound that improves the rate when the variance of the loss is small.

What carries the argument

The sequential supersample framework, which separates the learner filtration from a proof-side enlargement used for ghost comparisons, and the sequential CMI, defined as the sum of round-wise selector-loss information terms.
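The abstract does not state the bound's exact form. Modeled on the batch CMI bound of Steinke and Zakynthinou, one plausible shape for the slow-rate sequential bound is the following sketch, where U_t is the round-t selector, L_t the loss on the supersample pair, and F_{t-1} the learner filtration; all notation here is assumed, not the paper's.

```latex
% Hypothetical shape of the slow-rate bound; notation assumed, not the paper's.
\mathrm{SCMI}(T) \;=\; \sum_{t=1}^{T} I\bigl(U_t;\, L_t \,\bigm|\, \mathcal{F}_{t-1}\bigr),
\qquad
\mathbb{E}\bigl[\mathrm{gen}(T)\bigr] \;\lesssim\; \sqrt{\frac{\mathrm{SCMI}(T)}{T}}\, .
```

The Bernstein-type refinement would then replace the square-root dependence with a variance-weighted term when the round-wise losses have small conditional variance.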

If this is right

  • The bounds apply to online learning algorithms.
  • They extend to streaming active learning with importance weighting.
  • The strategy bounds generalization in stochastic multi-armed bandits.
  • Bernstein refinement yields faster rates under low variance conditions.
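The construction details are not in the abstract, so the following is only a minimal sketch of what a sequential supersample for a two-armed stochastic bandit could look like: each round draws a pair of i.i.d. rewards for the chosen arm, an independent selector bit picks which coordinate the learner sees, and the other coordinate is the proof-side ghost. The epsilon-greedy learner and all names (`ghost`, `selector`, `arm_means`) are illustrative assumptions, not the paper's construction.

```python
import random

random.seed(0)

T = 1000
arm_means = [0.4, 0.6]  # fixed Bernoulli arms: rewards i.i.d. per arm

def pull(arm):
    """One Bernoulli reward from a fixed arm distribution."""
    return 1.0 if random.random() < arm_means[arm] else 0.0

# Sequential supersample: each round draws a PAIR of i.i.d. rewards for the
# pulled arm. The selector bit u picks which coordinate the learner sees; the
# other ("ghost") coordinate exists only for the proof-side comparison.
history = []
counts, sums = [0, 0], [0.0, 0.0]
for t in range(T):
    # epsilon-greedy learner, adapted to the realized filtration only
    if random.random() < 0.1 or 0 in counts:
        arm = random.randrange(2)
    else:
        arm = 0 if sums[0] / counts[0] > sums[1] / counts[1] else 1
    pair = (pull(arm), pull(arm))  # two i.i.d. coordinates for this round's row
    u = random.randrange(2)        # selector bit, independent of the data
    seen, ghost = pair[u], pair[1 - u]
    counts[arm] += 1
    sums[arm] += seen              # learner updates on the seen coordinate only
    history.append((arm, u, seen, ghost))

# Since seen and ghost rewards of the same arm share one marginal, their
# average gap should be small -- the quantity the bound controls in expectation.
gap = sum(h[2] - h[3] for h in history) / T
print(f"mean seen-ghost gap: {gap:+.3f}")
```

The decoupling matters: the ghost coordinate is drawn from the same arm marginal but never enters the learner's update, which is how the sketch keeps the enlargement independent of the adaptive choices.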

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The separation of filtrations may enable similar bounds for other adaptive algorithms like reinforcement learning.
  • Sequential CMI could serve as a diagnostic for generalization performance in real-world sequential systems.
  • This work opens paths to information-theoretic analysis of more general causal decision processes.

Load-bearing premise

The row-wise exchangeability assumption that allows the sequential generalization gap to be controlled by sequential CMI.
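A toy, single-row version of the premise under the simplest data model (the paper's rows may carry richer structure): with two i.i.d. coordinates and an independent uniform selector, swapping the seen and ghost coordinates leaves the row's joint law exactly unchanged. A small exact enumeration checks this.

```python
from itertools import product

def row_law(p):
    """Exact joint law of (seen, ghost) for one supersample row:
    two i.i.d. Bernoulli(p) coordinates and an independent uniform selector."""
    law = {}
    for z0, z1, u in product((0, 1), repeat=3):
        prob = (p if z0 else 1 - p) * (p if z1 else 1 - p) * 0.5
        seen, ghost = (z0, z1) if u == 0 else (z1, z0)
        law[(seen, ghost)] = law.get((seen, ghost), 0.0) + prob
    return law

law = row_law(0.3)
swapped = {(g, s): pr for (s, g), pr in law.items()}
# Row-wise exchangeability: swapping seen and ghost leaves the law unchanged.
print(all(abs(law[k] - swapped[k]) < 1e-12 for k in law))  # → True
```

The referee's worry is precisely that this invariance can fail once later rows are conditioned on an adaptive filtration, which is why the assumption is load-bearing.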

What would settle it

A counterexample where rows are not exchangeable and the generalization gap is larger than the sequential CMI predicts.

read the original abstract

Information-theoretic generalization bounds based on the supersample construction are a central tool for algorithm-dependent generalization analysis in the batch i.i.d. setting. However, existing supersample conditional mutual information (CMI) bounds do not directly apply to sequential decision-making problems such as online learning, streaming active learning, and bandits, where data are revealed adaptively and the learner evolves along a causal trajectory. To address this limitation, we develop a sequential supersample framework that separates the learner filtration from a proof-side enlargement used for ghost-coordinate comparisons. Under a row-wise exchangeability assumption, the sequential generalization gap is controlled by sequential CMI, a sum of roundwise selector-loss information terms. We also establish a Bernstein-type refinement that yields faster rates under suitable variance conditions. The selector-SCMI proof strategy applies to online learning, streaming active learning with importance weighting, and stochastic multi-armed bandits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a sequential supersample framework extending information-theoretic generalization bounds to adaptive sequential decision-making settings (online learning, streaming active learning, stochastic bandits). Under a row-wise exchangeability assumption on the supersample, the sequential generalization gap is bounded by sequential CMI (a sum of round-wise selector-loss conditional mutual information terms). A Bernstein-type refinement is also derived for faster rates under suitable variance conditions. The selector-SCMI strategy is claimed to apply directly to the listed applications.

Significance. If the row-wise exchangeability assumption can be rigorously justified or conditioned for adaptive policies, this would meaningfully extend CMI-based, algorithm-dependent generalization analysis beyond the batch i.i.d. case to causal, sequential trajectories. The Bernstein refinement is a constructive addition that could yield practical rate improvements when variance is controlled. The work addresses a genuine technical gap, but its significance hinges on whether the core assumption survives the adaptive dependence structures in the target domains.

major comments (2)
  1. [Abstract and the statement of the main theorem] The row-wise exchangeability assumption is load-bearing for the central claim (abstract: 'Under a row-wise exchangeability assumption, the sequential generalization gap is controlled by sequential CMI'). In stochastic bandits and online learning, the learner's adaptive action selection (or importance weights) conditions later rows on the filtration of prior outcomes, so the joint law of the supersample rows no longer matches the real trajectory. The manuscript must either prove that the assumption is preserved for the policies considered or delineate the precise additional conditions (e.g., on the reward distributions or policy class) under which the ghost-sample comparison remains valid.
  2. [Abstract and Section 3 (framework definition)] The separation of 'learner filtration from a proof-side enlargement' is invoked to handle causality, yet the abstract provides no derivation details or error analysis. Without the explicit construction of the sequential supersample (how the ghost coordinates are sampled and how the CMI terms are defined round-wise), it is impossible to verify that the bound is non-vacuous or that it avoids the circularity that would arise if the enlargement itself depended on the adaptive choices.
minor comments (1)
  1. The abstract is technically dense; a single additional sentence clarifying the difference between the sequential supersample and the standard batch supersample would improve accessibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our work extending information-theoretic generalization bounds to sequential decision-making. The feedback highlights key points about the role of the row-wise exchangeability assumption and the need for explicit details on the supersample construction. We address each major comment below and will revise the manuscript to incorporate clarifications.

read point-by-point responses
  1. Referee: [Abstract and the statement of the main theorem] The row-wise exchangeability assumption is load-bearing for the central claim (abstract: 'Under a row-wise exchangeability assumption, the sequential generalization gap is controlled by sequential CMI'). In stochastic bandits and online learning, the learner's adaptive action selection (or importance weights) conditions later rows on the filtration of prior outcomes, so the joint law of the supersample rows no longer matches the real trajectory. The manuscript must either prove that the assumption is preserved for the policies considered or delineate the precise additional conditions (e.g., on the reward distributions or policy class) under which the ghost-sample comparison remains valid.

    Authors: We agree that row-wise exchangeability is a central assumption and does not hold unconditionally under arbitrary adaptive dependence. The manuscript presents the bound as holding under this assumption (Section 3), without claiming it is automatically satisfied for all policies. For the listed applications, the assumption can be ensured by enlarging the probability space with independent ghost samples drawn from the same marginal distributions as the real data, decoupled from the learner's adaptive choices. This is feasible under standard i.i.d. reward or data assumptions (e.g., fixed arm distributions in bandits, or i.i.d. instances in online learning). We will add a dedicated subsection (new Section 4.4) that explicitly delineates these conditions for each application, including when the policy class or reward distributions allow the ghost coordinates to remain exchangeable row-wise conditional on the filtration. This revision will clarify the scope without overclaiming generality. revision: yes

  2. Referee: [Abstract and Section 3 (framework definition)] The separation of 'learner filtration from a proof-side enlargement' is invoked to handle causality, yet the abstract provides no derivation details or error analysis. Without the explicit construction of the sequential supersample (how the ghost coordinates are sampled and how the CMI terms are defined round-wise), it is impossible to verify that the bound is non-vacuous or that it avoids the circularity that would arise if the enlargement itself depended on the adaptive choices.

    Authors: The abstract is intentionally high-level, consistent with standard practice, and does not contain derivation details. The explicit construction appears in Section 3: the learner filtration is the sigma-algebra generated by the realized observations and actions, while the proof-side enlargement augments the space with ghost coordinates sampled independently from the same marginals (using auxiliary randomness independent of the policy). The round-wise SCMI terms are then I(selector_t; loss on ghost | past filtration), with the ghosts fixed across rounds for exchangeability. This decouples the enlargement from adaptive choices, avoiding circularity. To make verification easier, we will expand Section 3 with a step-by-step sampling procedure, add an appendix containing a filtration diagram, and include a short error analysis showing the bound remains non-vacuous when the CMI terms are finite (as is standard in the batch CMI literature). revision: yes
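To make one round-wise term concrete: each SCMI summand is a conditional mutual information between the selector and a loss. Below is a minimal plug-in estimator for a single unconditional discrete I(U; L) from samples, with the filtration conditioning dropped for brevity; the function name and the two toy joint laws are illustrative, not from the paper.

```python
import math
from collections import Counter

def plugin_mi(pairs):
    """Plug-in estimate of I(U; L) in nats from samples of the pair (u, l)."""
    n = len(pairs)
    joint = Counter(pairs)
    pu = Counter(u for u, _ in pairs)
    pl = Counter(l for _, l in pairs)
    mi = 0.0
    for (u, l), c in joint.items():
        p_ul = c / n
        # I(U;L) = sum p(u,l) * log( p(u,l) / (p(u) p(l)) )
        mi += p_ul * math.log(p_ul * n * n / (pu[u] * pl[l]))
    return mi

# Independent selector and loss: the round's information term vanishes.
indep = [(u, l) for u in (0, 1) for l in (0, 1)]  # uniform product law
# Fully dependent: the loss reveals the selector bit, giving I = log 2 nats.
dep = [(0, 0), (1, 1)]

print(round(plugin_mi(indep), 6), round(plugin_mi(dep), 6))  # → 0.0 0.693147
```

A bound built from such terms is non-vacuous exactly when each summand is finite, which is the point the authors promise to make explicit in the expanded Section 3.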

Circularity Check

0 steps flagged

No significant circularity; bound derived conditionally from exchangeability assumption

full rationale

The paper introduces a sequential supersample framework that separates learner filtration from ghost-sample enlargement and then states that, under the row-wise exchangeability assumption, the generalization gap is controlled by the sum of round-wise selector-loss CMI terms. This is a standard conditional derivation: the bound follows from the assumption plus the new construction rather than reducing to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation. No equations in the abstract or description exhibit self-definitional closure, ansatz smuggling, or renaming of known results. The result is self-contained relative to the stated assumption and the supersample construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Ledger populated from abstract only; central claim rests on one domain assumption and introduces two new conceptual entities without independent evidence.

axioms (1)
  • domain assumption row-wise exchangeability assumption
    Invoked to separate learner filtration from proof-side enlargement and to control the generalization gap by sequential CMI.
invented entities (2)
  • sequential supersample framework no independent evidence
    purpose: Separates the learner filtration from a proof-side enlargement used for ghost-coordinate comparisons in sequential settings.
    New construction introduced to extend supersample methods beyond i.i.d. batch data.
  • sequential CMI no independent evidence
    purpose: Bounds the sequential generalization gap as a sum of roundwise selector-loss information terms.
    New quantity defined for the adaptive case.

pith-pipeline@v0.9.0 · 5448 in / 1219 out tokens · 85356 ms · 2026-05-13T03:50:50.702424+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance weighted active learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56, 2009.
  2. Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
  3. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
  4. F. Futami and M. Fujisawa. Time-independent information-theoretic generalization bounds for SGLD. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  5. Peter Grünwald, Thomas Steinke, and Lydia Zakynthinou. PAC-Bayes, MAC-Bayes and conditional mutual information: fast rate bounds that handle general VC classes. In Conference on Learning Theory, pages 2217–2247. PMLR, 2021.
  6. Maxime Haddouche and Benjamin Guedj. Online PAC-Bayes learning. Advances in Neural Information Processing Systems, 35:25725–25738, 2022.
  7. Maxime Haddouche and Benjamin Guedj. PAC-Bayes generalisation bounds for heavy-tailed losses through supermartingales. Transactions on Machine Learning Research, 2023. URL https://openreview.net/forum?id=qxrwt6F3sf.
  8. Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, and Gintare Karolina Dziugaite. Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms. Advances in Neural Information Processing Systems, 33:9925–9935, 2020.
  9. H. Harutyunyan, M. Raginsky, G. Ver Steeg, and A. Galstyan. Information-theoretic generalization bounds for black-box learning algorithms. In Advances in Neural Information Processing Systems, pages 24670–24682, 2021.
  10. F. Hellström and G. Durisi. A new family of generalization bounds using samplewise evaluated CMI. In Advances in Neural Information Processing Systems, pages 20648–20660, 2022.
  11. Gerhard Kramer. Directed Information for Channels with Feedback, volume 11. Hartung-Gorre, Germany, 1998.
  12. James Massey et al. Causality, feedback and directed information. In Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), volume 2, page 1, 1990.
  13. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. The MIT Press, 2nd edition, 2018.
  14. Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning via sequential complexities. The Journal of Machine Learning Research, 16(1):155–186, 2015.
  15. D. Russo and J. Zou. Controlling bias in adaptive data analysis using information theory. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51, pages 1232–1240, 2016.
  16. Milad Sefidgaran, Abdellatif Zaidi, and Piotr Krasnowski. Minimum description length and generalization guarantees for representation learning. Advances in Neural Information Processing Systems, 36:1489–1525, 2023.
  17. Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter Auer, François Laviolette, and John Shawe-Taylor. PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits. In Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, pages 98–111. JMLR Workshop and Conference Proceedings, 2012.
  18. T. Steinke and L. Zakynthinou. Reasoning about generalization via conditional mutual information. In Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 3437–3452. PMLR, 2020.
  19. Sekhar Tatikonda and Sanjoy Mitter. The capacity of channels with feedback. IEEE Transactions on Information Theory, 55(1):323–349, 2008.
  20. H. Wang, R. Gao, and F. P. Calmon. Generalization bounds for noisy iterative algorithms using properties of additive noise channels. Journal of Machine Learning Research, 24(26):1–43, 2023.
  21. Z. Wang and Y. Mao. On the generalization of models trained with SGD: information-theoretic bounds and implications. In The Tenth International Conference on Learning Representations, 2022.
  22. Z. Wang and Y. Mao. Tighter information-theoretic generalization bounds from supersamples. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 36111–36137, 2023.
  23. A. Xu and M. Raginsky. Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems, volume 30, pages 2524–2533, 2017.