pith. machine review for the scientific record.

arxiv: 2605.07964 · v2 · submitted 2026-05-08 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

François Caron, Stefano Cortinovis, Valentin Kilian

Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords confidence sequences · test martingales · Bayesian predictive models · Wasserstein consistency · asymptotic log-optimality · bounded means · anytime-valid inference · sequential inference

The pith

A Bayesian predictive model yields asymptotically log-optimal confidence sequences for bounded means that stay valid under misspecification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayes-assisted method to build time-uniform confidence sequences for the mean of bounded independent observations. It uses a working predictive distribution to select, at each step and for each candidate mean, the valid martingale update that maximizes expected log-growth. Validity is guaranteed regardless of whether the predictive model or prior is correct. The central theorem states that if the predictive distribution is consistent in the Wasserstein metric, the resulting sequences achieve the same per-sample log-growth rate as an oracle procedure that knows the true distribution. This lets users incorporate prior information to tighten bounds and reduce sampling needs in sequential tasks while preserving exact coverage guarantees.
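The mechanics can be made concrete with the betting-style construction this line of work builds on (Waudby-Smith and Ramdas, cited in the abstract). A minimal sketch in Python, using the empirical measure of past observations as a stand-in Wasserstein-consistent predictive; function names, grids, and tolerances here are illustrative choices, not the paper's:

```python
import numpy as np

def best_lambda(samples, m, n_grid=101):
    # Among one-step factors 1 + lam*(x - m), nonnegativity for every
    # x in [0, 1] restricts lam to (-1/(1-m), 1/m); pick the lam that
    # maximizes expected log-growth under the predictive given by `samples`.
    lams = np.linspace(-1.0 / (1.0 - m) + 1e-4, 1.0 / m - 1e-4, n_grid)
    growth = np.log1p(np.outer(lams, samples - m)).mean(axis=1)
    return lams[np.argmax(growth)]

def confidence_interval(xs, alpha=0.05, n_means=49):
    # Anytime-valid interval: a candidate mean m stays in the set while
    # its wealth process remains below 1/alpha (Ville's inequality gives
    # time-uniform coverage, whatever predictive drives the bets).
    kept = []
    for m in np.linspace(0.02, 0.98, n_means):
        log_wealth = 0.0
        rejected = False
        for t, x in enumerate(xs):
            lam = 0.0 if t == 0 else best_lambda(xs[:t], m)
            log_wealth += np.log1p(lam * (x - m))
            if log_wealth >= np.log(1.0 / alpha):
                rejected = True   # wealth crossed 1/alpha: m is excluded
                break
        if not rejected:
            kept.append(m)
    return (kept[0], kept[-1]) if kept else None
```

Swapping the empirical measure for a Bayesian predictive (e.g. a Dirichlet-process mixture posterior) changes only what `best_lambda` averages over; coverage is untouched because each factor has expectation one at the true mean regardless of the predictive.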

Core claim

The authors prove that a Bayes-assisted construction of confidence sequences, which adaptively chooses among valid one-step martingale factors the update maximizing predictive expected log-growth for each candidate mean and time point, is asymptotically log-optimal whenever the working predictive distribution converges in Wasserstein distance to the true data-generating distribution. This optimality means the sequences match the per-sample log-growth of an oracle with access to the true distribution. The framework preserves exact validity for any prior or predictive model, relying only on the observations being IID and bounded.

What carries the argument

The adaptive selection, for each time and candidate mean, of the valid one-step martingale factor that maximizes expected log-growth under the Bayesian predictive distribution.
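In symbols (illustrative notation; the paper's own symbols may differ): with working predictive $\widehat{P}_{t-1}$ built from the first $t-1$ observations, the selected bet for candidate mean $m$ and the resulting wealth are

```latex
\lambda_t(m) \;=\; \operatorname*{arg\,max}_{\lambda \in \left(-\frac{1}{1-m},\, \frac{1}{m}\right)}
  \mathbb{E}_{X \sim \widehat{P}_{t-1}}\!\left[\log\bigl(1 + \lambda (X - m)\bigr)\right],
\qquad
K_t(m) \;=\; \prod_{s=1}^{t} \bigl(1 + \lambda_s(m)\,(X_s - m)\bigr),
```

and the confidence sequence keeps every $m$ whose wealth has stayed below $1/\alpha$. Since $\mathbb{E}[1 + \lambda(X - m)] = 1$ at the true mean for any predictable $\lambda$, $K_t$ is a nonnegative martingale there, and Ville's inequality delivers validity no matter how good or bad $\widehat{P}$ is.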

If this is right

  • Informative priors produce narrower confidence sequences than non-adaptive baselines.
  • The approach reduces the number of samples needed for tasks such as sequential best-arm identification.
  • It maintains anytime-valid coverage in prediction-powered inference settings.
  • Robust instantiations such as Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood yield practical implementations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar predictive-assisted selection could be applied to other parameters if suitable families of martingales exist.
  • The validity-under-misspecification property makes the method suitable for real data streams where the true distribution is unknown but bounded.
  • Extensions to dependent observations would require adjusted consistency conditions on the predictive model.

Load-bearing premise

The working predictive distribution must converge in Wasserstein distance to the true distribution for the asymptotic log-optimality result to hold.
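Concretely, in one dimension the premise can be stated via CDFs (notation illustrative): writing $P^\ast$ for the true distribution with CDF $F^\ast$ and $\widehat{P}_n$ for the working predictive with CDF $\widehat{F}_n$,

```latex
W_1\bigl(\widehat{P}_n, P^\ast\bigr) \;=\; \int_0^1 \bigl|\widehat{F}_n(x) - F^\ast(x)\bigr|\,dx
\;\xrightarrow[\;n\to\infty\;]{\text{a.s.}}\; 0 .
```

The empirical measure satisfies this by the Glivenko–Cantelli theorem, so it is the simplest predictive meeting the premise; informative Bayesian predictives must earn it through posterior consistency.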

What would settle it

Generate repeated samples from a known bounded distribution, feed a Wasserstein-consistent predictive such as the empirical measure into the procedure, and verify whether the average log-growth rate of the constructed sequences approaches the rate achieved by an oracle that uses the true distribution directly.
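A sketch of that check under stated assumptions (true distribution Beta(2,5) on [0,1], empirical measure of the past as the Wasserstein-consistent predictive, one false candidate mean m = 0.5; grids and sample sizes are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

def best_lambda(samples, m, n_grid=101):
    # Valid bets lie in (-1/(1-m), 1/m); pick the one maximizing expected
    # log-growth under the measure approximated by `samples`.
    lams = np.linspace(-1.0 / (1.0 - m) + 1e-4, 1.0 / m - 1e-4, n_grid)
    return lams[np.argmax(np.log1p(np.outer(lams, samples - m)).mean(axis=1))]

rng = np.random.default_rng(1)
xs = rng.beta(2.0, 5.0, size=800)         # bounded stream, true mean 2/7
m = 0.5                                   # false candidate: wealth should grow

# Oracle: a fixed bet tuned on a large independent sample from the truth.
oracle_lam = best_lambda(rng.beta(2.0, 5.0, size=20_000), m)
oracle_rate = np.log1p(oracle_lam * (xs - m)).mean()

# Plug-in: re-tune the bet at every step against the empirical past.
log_wealth = 0.0
for t in range(1, len(xs)):
    lam = best_lambda(xs[:t], m)
    log_wealth += np.log1p(lam * (xs[t] - m))
plugin_rate = log_wealth / (len(xs) - 1)

print(f"oracle {oracle_rate:.3f}  plug-in {plugin_rate:.3f}")
```

If the theorem's mechanism is right, the two rates should agree up to sampling noise; repeating the run with a deliberately inconsistent predictive (a fixed wrong distribution) should open a persistent gap.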

Original abstract

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes a Bayes-assisted framework for constructing time-uniform confidence sequences for the mean of bounded IID observations. A Bayesian working predictive distribution is used to select, at each time and candidate mean, the valid one-step martingale factor that maximizes predictive expected log-growth. Validity is preserved under misspecification of the prior or predictive model. The central theoretical result states that Wasserstein consistency of the predictive distribution implies asymptotic log-optimality, in the sense that the per-sample log-growth rate matches that of an oracle procedure with access to the true data-generating distribution. The framework is instantiated with Dirichlet-process mixture predictives and Bayesian exponentially tilted empirical likelihood; experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference illustrate reduced width and sampling effort.

Significance. If the asymptotic optimality result holds, the work provides a principled bridge between Bayesian predictive modeling and frequentist anytime-valid inference, allowing informative priors to improve efficiency without sacrificing coverage guarantees. The explicit use of Wasserstein consistency as the sufficient condition for matching oracle log-growth is a clean and falsifiable contribution. Practical instantiations with robust nonparametric predictives and the reported experiments on real-world sequential tasks strengthen the case for adoption. The manuscript ships a clear statement of the consistency assumption and demonstrates that validity does not require correctness of the working model.

major comments (1)
  1. [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.
minor comments (3)
  1. [§2] The definition of the one-step martingale factor selection criterion (predictive expected log-growth) would benefit from an explicit equation number and a short derivation showing why it remains a valid test martingale even under misspecification.
  2. [§5] In the experimental section, the synthetic data figures would be clearer if the oracle width were plotted alongside the Bayes-assisted and baseline sequences for direct visual comparison of the asymptotic gap.
  3. [§4] A brief remark on computational cost of the Dirichlet-process mixture predictive (e.g., number of particles or truncation level) would help readers assess practicality for large-scale sequential tasks.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and the detailed comment on the asymptotic optimality theorem. We address the concern below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.

    Authors: We appreciate the referee highlighting the need for greater precision in the proof of asymptotic log-optimality. The argument establishes almost-sure convergence of the per-sample log-growth rate to the oracle rate. Wasserstein consistency is assumed to hold almost surely (as is standard), and the per-sample limit is taken along the same almost-sure event. Because the observations are bounded in [0,1], all admissible one-step log-growth rates are uniformly bounded by a constant independent of the data and of the predictive distribution. This boundedness directly supplies the uniform integrability required to interchange the limit and the predictive expectation when selecting the martingale factor, without needing any additional moment conditions. No quantitative rate of Wasserstein convergence is imposed beyond the consistency assumption itself, because the result concerns the limsup of the average log-growth as n→∞. We will revise §3 to state these points explicitly, including a short paragraph on the role of boundedness in securing uniform integrability and confirming that the convergence holds almost surely. revision: yes
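The boundedness point in the response can be written out explicitly (illustrative form; the paper may parameterize the restriction differently): if bets are kept a factor $1-\varepsilon$ inside the valid range, every one-step log factor is uniformly bounded,

```latex
\sup_{x \in [0,1]} \bigl|\log\bigl(1 + \lambda(x - m)\bigr)\bigr|
\;\le\; \max\!\left\{ \log\tfrac{1}{\varepsilon},\;
  \log\!\Bigl(1 + \tfrac{1-\varepsilon}{\min(m,\,1-m)}\Bigr) \right\} \;<\; \infty
\qquad \text{for } \lambda \in \Bigl[-\tfrac{1-\varepsilon}{1-m},\; \tfrac{1-\varepsilon}{m}\Bigr],
```

which dominates the relevant integrands and lets limits pass through predictive expectations with no extra moment conditions.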

Circularity Check

0 steps flagged

No significant circularity; the derivation is self-contained under an external assumption

Full rationale

The central claim is a theorem establishing that Wasserstein consistency of any predictive distribution implies asymptotic per-sample log-optimality of the resulting confidence sequence (matching an oracle with the true distribution). This is an implication proved under an explicitly stated external condition on the predictive model, not a self-referential construction. Validity of the sequences holds independently of model correctness or misspecification. No load-bearing steps reduce by definition or by self-citation to the target result; the selection of martingale factors via predictive log-growth is a construction that preserves validity by design and whose optimality is derived conditionally on the consistency assumption rather than fitted or renamed from inputs. The framework does not rely on uniqueness theorems from the authors' prior work or smuggle ansatzes via citation. This is the normal case of a non-circular proof.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard properties of martingales and introduces a selection criterion based on predictive log-growth, with no new free parameters or invented entities.

axioms (2)
  • standard math Bounded IID observations allow construction of test martingales for confidence sequences
    This is a foundational assumption in the field of sequential analysis.
  • domain assumption The working predictive distribution is Wasserstein-consistent for the true data-generating distribution
    This is the key condition assumed for the main theoretical result; the implication to asymptotic log-optimality is what the paper proves.

pith-pipeline@v0.9.0 · 5499 in / 1349 out tokens · 59461 ms · 2026-05-12T03:17:10.090926+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 1 internal anchor
