Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means
Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3
The pith
A Bayesian predictive model yields asymptotically log-optimal confidence sequences for bounded means that stay valid under misspecification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors prove that a Bayes-assisted construction of confidence sequences is asymptotically log-optimal whenever the working predictive distribution converges in Wasserstein distance to the true data-generating distribution. For each candidate mean and time point, the construction chooses, among valid one-step martingale factors, the update that maximizes predictive expected log-growth. Log-optimality here means that the sequences match the per-sample log-growth of an oracle with access to the true distribution. The framework preserves exact validity for any prior or predictive model, relying only on the observations being IID and bounded.
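The validity claim can be made concrete with a short worked sketch. It assumes, as an illustration not confirmed by this page, that the one-step factors take the coin-betting form 1 + lambda (x - m) with observations in [0, 1] and candidate means m strictly between 0 and 1. The symbols W_t, lambda_i, alpha and C_t are notation introduced here, and each lambda_i(m) may depend only on observations before time i.

```latex
% Assumption: one-step factors of the betting form 1 + \lambda (x - m).
\[
  W_t(m) = \prod_{i=1}^{t} \bigl(1 + \lambda_i(m)\,(X_i - m)\bigr),
  \qquad
  \lambda_i(m) \in \Bigl[-\tfrac{1}{1-m},\ \tfrac{1}{m}\Bigr].
\]
% At the true mean \mu the factor has conditional expectation one,
% whatever rule (Bayesian or not) produced \lambda_i from the past.
\[
  \mathbb{E}\bigl[\,1 + \lambda_i(\mu)\,(X_i - \mu) \,\big|\, X_1,\dots,X_{i-1}\bigr]
  = 1 + \lambda_i(\mu)\,\bigl(\mathbb{E}[X_i] - \mu\bigr) = 1 .
\]
% Hence W_t(\mu) is a nonnegative martingale started at 1, and Ville's
% inequality gives time-uniform coverage for the set C_t below.
\[
  \Pr\bigl(\exists\, t \ge 1 : W_t(\mu) \ge 1/\alpha\bigr) \le \alpha,
  \qquad
  C_t = \bigl\{\, m \in (0,1) : W_t(m) < 1/\alpha \,\bigr\}.
\]
```

Under this reading the predictive model only influences how fast W_t(m) grows at wrong candidate means, which is why misspecification can cost width but not coverage.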
What carries the argument
The adaptive selection, for each time and candidate mean, of the valid one-step martingale factor that maximizes expected log-growth under the Bayesian predictive distribution.
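A minimal sketch of this selection step, assuming coin-betting factors of the form 1 + lam * (x - m) on observations in [0, 1] and a predictive distribution represented by weighted atoms. The function name `best_bet`, the grid search and the `eps` safety margin are illustrative choices, not the authors' implementation.

```python
import math

def best_bet(atoms, weights, m, grid_size=201, eps=1e-3):
    """Return the bet lam maximizing sum_i w_i * log(1 + lam * (x_i - m)).

    atoms, weights: a discrete predictive distribution on [0, 1]
                    (hypothetical representation chosen for this sketch).
    m: candidate mean, strictly between 0 and 1.
    """
    # Admissible bets keep 1 + lam * (x - m) positive for every x in [0, 1];
    # eps pulls the range slightly inside so the logarithm stays finite.
    lo = -(1.0 - eps) / (1.0 - m)
    hi = (1.0 - eps) / m
    best_lam, best_growth = 0.0, 0.0  # lam = 0 (no bet) has zero growth
    for k in range(grid_size):
        lam = lo + (hi - lo) * k / (grid_size - 1)
        growth = sum(w * math.log1p(lam * (x - m))
                     for x, w in zip(atoms, weights))
        if growth > best_growth:
            best_lam, best_growth = lam, growth
    return best_lam

# Predictive mass above the candidate mean leads to a positive bet,
# mass below it to a negative bet.
print(best_bet([0.9], [1.0], 0.5))
print(best_bet([0.0, 1.0], [0.7, 0.3], 0.5))
```

Because the objective is concave in lam, a grid search is only one of several workable optimizers; it is used here to keep the sketch dependency-free.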
If this is right
- Informative priors produce narrower confidence sequences than non-adaptive baselines.
- The approach reduces the number of samples needed for tasks such as sequential best-arm identification.
- It maintains anytime-valid coverage in prediction-powered inference settings.
- Robust instantiations such as Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood yield practical implementations.
Where Pith is reading between the lines
- Similar predictive-assisted selection could be applied to other parameters if suitable families of martingales exist.
- The validity-under-misspecification property makes the method suitable for real data streams where the true distribution is unknown but bounded.
- Extensions to dependent observations would require adjusted consistency conditions on the predictive model.
Load-bearing premise
The working predictive distribution must converge in Wasserstein distance to the true distribution for the asymptotic log-optimality result to hold.
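To illustrate what this premise asks of a predictive, the sketch below computes the exact 1-Wasserstein distance between an empirical measure and a known truth, and prints it for growing sample sizes. It assumes a Uniform(0, 1) truth purely for convenience; the function name `w1_to_uniform` is hypothetical, and the formula used is the one-dimensional identity that the distance equals the integral of the absolute difference between the two CDFs.

```python
import random

def w1_to_uniform(samples):
    """Exact 1-Wasserstein distance between the empirical measure of
    `samples` (values in [0, 1]) and Uniform(0, 1), computed as the
    integral of |F_n(x) - x| over [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b = knots[i], knots[i + 1]
        c = i / n  # the empirical CDF is constant at i/n on [a, b)
        if c <= a:
            total += (b - a) * ((a + b) / 2 - c)
        elif c >= b:
            total += (b - a) * (c - (a + b) / 2)
        else:  # the line y = x crosses the step inside the interval
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total

rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(w1_to_uniform(sample), 4))
```

A predictive is Wasserstein-consistent in the sense used here when this kind of distance to the true distribution tends to zero as more data arrive.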
What would settle it
Generate repeated samples from a known bounded distribution, feed a Wasserstein-consistent predictive such as the empirical measure into the procedure, and verify whether the average log-growth rate of the constructed sequences approaches the rate achieved by an oracle that uses the true distribution directly.
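The check described above can be sketched in a few lines. The sketch assumes coin-betting factors 1 + lam * (x - m), Bernoulli(p) data (a bounded distribution whose oracle bet has a closed form), a single candidate mean m different from the true mean, and the empirical measure of past observations as the predictive. The names `best_bet` and `average_log_growth` and all parameter values are illustrative.

```python
import math
import random
from collections import Counter

def best_bet(counts, m, grid_size=201, eps=1e-3):
    """Grid-search the bet maximizing the empirical sum of
    log(1 + lam * (x - m)) over the observations stored in `counts`."""
    lo, hi = -(1.0 - eps) / (1.0 - m), (1.0 - eps) / m
    best_lam, best_growth = 0.0, 0.0  # lam = 0 means "do not bet"
    for k in range(grid_size):
        lam = lo + (hi - lo) * k / (grid_size - 1)
        growth = sum(c * math.log1p(lam * (x - m)) for x, c in counts.items())
        if growth > best_growth:
            best_lam, best_growth = lam, growth
    return best_lam

def average_log_growth(n=2000, p=0.3, m=0.5, seed=0):
    """Return (plug-in, oracle) average log-growth at candidate mean m."""
    rng = random.Random(seed)
    lam_oracle = (p - m) / (m * (1.0 - m))  # log-optimal bet for Bernoulli(p)
    counts = Counter()  # empirical measure of the observations seen so far
    plug_in = oracle = 0.0
    for _ in range(n):
        lam_t = best_bet(counts, m)  # chosen before the next observation
        x = 1.0 if rng.random() < p else 0.0
        plug_in += math.log1p(lam_t * (x - m))
        oracle += math.log1p(lam_oracle * (x - m))
        counts[x] += 1
    return plug_in / n, oracle / n

plug_in_rate, oracle_rate = average_log_growth()
print(f"plug-in: {plug_in_rate:.4f}  oracle: {oracle_rate:.4f}")
```

For these parameter values the oracle's expected per-sample log-growth is the Kullback-Leibler divergence 0.3 ln(0.6) + 0.7 ln(1.4), about 0.082, so both printed rates should settle near that value as n grows if the claim holds in this toy setting.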
Original abstract
Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Bayes-assisted framework for constructing time-uniform confidence sequences for the mean of bounded IID observations. A Bayesian working predictive distribution is used to select, at each time and candidate mean, the valid one-step martingale factor that maximizes predictive expected log-growth. Validity is preserved under misspecification of the prior or predictive model. The central theoretical result states that Wasserstein consistency of the predictive distribution implies asymptotic log-optimality, in the sense that the per-sample log-growth rate matches that of an oracle procedure with access to the true data-generating distribution. The framework is instantiated with Dirichlet-process mixture predictives and Bayesian exponentially tilted empirical likelihood; experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference illustrate reduced width and sampling effort.
Significance. If the asymptotic optimality result holds, the work provides a principled bridge between Bayesian predictive modeling and frequentist anytime-valid inference, allowing informative priors to improve efficiency without sacrificing coverage guarantees. The explicit use of Wasserstein consistency as the sufficient condition for matching oracle log-growth is a clean and falsifiable contribution. Practical instantiations with robust nonparametric predictives and the reported experiments on real-world sequential tasks strengthen the case for adoption. The manuscript states the consistency assumption clearly and demonstrates that validity does not require correctness of the working model.
major comments (1)
- [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.
minor comments (3)
- [§2] The definition of the one-step martingale factor selection criterion (predictive expected log-growth) would benefit from an explicit equation number and a short derivation showing why it remains a valid test martingale even under misspecification.
- [§5] In the experimental section, the synthetic data figures would be clearer if the oracle width were plotted alongside the Bayes-assisted and baseline sequences for direct visual comparison of the asymptotic gap.
- [§4] A brief remark on computational cost of the Dirichlet-process mixture predictive (e.g., number of particles or truncation level) would help readers assess practicality for large-scale sequential tasks.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and the detailed comment on the asymptotic optimality theorem. We address the concern below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.
Authors: We appreciate the referee highlighting the need for greater precision in the proof of asymptotic log-optimality. The argument establishes almost-sure convergence of the per-sample log-growth rate to the oracle rate. Wasserstein consistency is assumed to hold almost surely (as is standard), and the per-sample limit is taken along the same almost-sure event. Because the observations are bounded in [0,1], all admissible one-step log-growth rates are uniformly bounded by a constant independent of the data and of the predictive distribution. This boundedness directly supplies the uniform integrability required to interchange the limit and the predictive expectation when selecting the martingale factor, without needing any additional moment conditions. No quantitative rate of Wasserstein convergence is imposed beyond the consistency assumption itself, because the result concerns the limsup of the average log-growth as n→∞. We will revise §3 to state these points explicitly, including a short paragraph on the role of boundedness in securing uniform integrability and confirming that the convergence holds almost surely.
Revision: yes
Circularity Check
No significant circularity; derivation is self-contained under external assumption
The central claim is a theorem establishing that Wasserstein consistency of any predictive distribution implies asymptotic per-sample log-optimality of the resulting confidence sequence (matching an oracle with the true distribution). This is an implication proved under an explicitly stated external condition on the predictive model, not a self-referential construction. Validity of the sequences holds independently of model correctness or misspecification. No load-bearing steps reduce by definition or by self-citation to the target result; the selection of martingale factors via predictive log-growth is a construction that preserves validity by design and whose optimality is derived conditionally on the consistency assumption rather than fitted or renamed from inputs. The framework does not rely on uniqueness theorems from the authors' prior work or smuggle ansatzes via citation. This is the normal case of a non-circular proof.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Bounded IID observations allow construction of test martingales for confidence sequences
- domain assumption: Wasserstein consistency of the predictive distribution leads to asymptotic log-optimality