pith. machine review for the scientific record.

arxiv: 2605.07964 · v2 · submitted 2026-05-08 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

François Caron, Stefano Cortinovis, Valentin Kilian

Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords confidence sequences · test martingales · Bayesian predictive models · Wasserstein consistency · asymptotic log-optimality · bounded means · anytime-valid inference · sequential inference

The pith

A Bayesian predictive model yields asymptotically log-optimal confidence sequences for bounded means that stay valid under misspecification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayes-assisted method to build time-uniform confidence sequences for the mean of bounded independent observations. It uses a working predictive distribution to select, at each step and for each candidate mean, the valid martingale update that maximizes expected log-growth. Validity is guaranteed regardless of whether the predictive model or prior is correct. The central theorem states that if the predictive distribution is consistent in the Wasserstein metric, the resulting sequences achieve the same per-sample log-growth rate as an oracle procedure that knows the true distribution. This lets users incorporate prior information to tighten bounds and reduce sampling needs in sequential tasks while preserving exact coverage guarantees.
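The mechanics can be made concrete with the betting-style construction this line of work builds on (Waudby-Smith and Ramdas, cited in the abstract). A minimal sketch in Python, using the empirical measure of past observations as a stand-in Wasserstein-consistent predictive; function names, grids, and tolerances here are illustrative choices, not the paper's:

```python
import numpy as np

def best_lambda(samples, m, n_grid=101):
    # Among one-step factors 1 + lam*(x - m), nonnegativity for every
    # x in [0, 1] restricts lam to (-1/(1-m), 1/m); pick the lam that
    # maximizes expected log-growth under the predictive given by `samples`.
    lams = np.linspace(-1.0 / (1.0 - m) + 1e-4, 1.0 / m - 1e-4, n_grid)
    growth = np.log1p(np.outer(lams, samples - m)).mean(axis=1)
    return lams[np.argmax(growth)]

def confidence_interval(xs, alpha=0.05, n_means=49):
    # Anytime-valid interval: a candidate mean m stays in the set while
    # its wealth process remains below 1/alpha (Ville's inequality gives
    # time-uniform coverage, whatever predictive drives the bets).
    kept = []
    for m in np.linspace(0.02, 0.98, n_means):
        log_wealth = 0.0
        rejected = False
        for t, x in enumerate(xs):
            lam = 0.0 if t == 0 else best_lambda(xs[:t], m)
            log_wealth += np.log1p(lam * (x - m))
            if log_wealth >= np.log(1.0 / alpha):
                rejected = True   # wealth crossed 1/alpha: m is excluded
                break
        if not rejected:
            kept.append(m)
    return (kept[0], kept[-1]) if kept else None
```

Swapping the empirical measure for a Bayesian predictive (e.g. a Dirichlet-process mixture posterior) changes only what `best_lambda` averages over; coverage is untouched because each factor has expectation one at the true mean regardless of the predictive.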

Core claim

The authors prove that a Bayes-assisted construction of confidence sequences, which adaptively chooses among valid one-step martingale factors the update maximizing predictive expected log-growth for each candidate mean and time point, is asymptotically log-optimal whenever the working predictive distribution converges in Wasserstein distance to the true data-generating distribution. This optimality means the sequences match the per-sample log-growth of an oracle with access to the true distribution. The framework preserves exact validity for any prior or predictive model, relying only on the observations being IID and bounded.

What carries the argument

The adaptive selection, for each time and candidate mean, of the valid one-step martingale factor that maximizes expected log-growth under the Bayesian predictive distribution.
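In symbols (illustrative notation; the paper's own symbols may differ): with working predictive $\widehat{P}_{t-1}$ built from the first $t-1$ observations, the selected bet for candidate mean $m$ and the resulting wealth are

```latex
\lambda_t(m) \;=\; \operatorname*{arg\,max}_{\lambda \in \left(-\frac{1}{1-m},\, \frac{1}{m}\right)}
  \mathbb{E}_{X \sim \widehat{P}_{t-1}}\!\left[\log\bigl(1 + \lambda (X - m)\bigr)\right],
\qquad
K_t(m) \;=\; \prod_{s=1}^{t} \bigl(1 + \lambda_s(m)\,(X_s - m)\bigr),
```

and the confidence sequence keeps every $m$ whose wealth has stayed below $1/\alpha$. Since $\mathbb{E}[1 + \lambda(X - m)] = 1$ at the true mean for any predictable $\lambda$, $K_t$ is a nonnegative martingale there, and Ville's inequality delivers validity no matter how good or bad $\widehat{P}$ is.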

If this is right

  • Informative priors produce narrower confidence sequences than non-adaptive baselines.
  • The approach reduces the number of samples needed for tasks such as sequential best-arm identification.
  • It maintains anytime-valid coverage in prediction-powered inference settings.
  • Robust instantiations such as Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood yield practical implementations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar predictive-assisted selection could be applied to other parameters if suitable families of martingales exist.
  • The validity-under-misspecification property makes the method suitable for real data streams where the true distribution is unknown but bounded.
  • Extensions to dependent observations would require adjusted consistency conditions on the predictive model.

Load-bearing premise

The working predictive distribution must converge in Wasserstein distance to the true distribution for the asymptotic log-optimality result to hold.
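Concretely, in one dimension the premise can be stated via CDFs (notation illustrative): writing $P^\ast$ for the true distribution with CDF $F^\ast$ and $\widehat{P}_n$ for the working predictive with CDF $\widehat{F}_n$,

```latex
W_1\bigl(\widehat{P}_n, P^\ast\bigr) \;=\; \int_0^1 \bigl|\widehat{F}_n(x) - F^\ast(x)\bigr|\,dx
\;\xrightarrow[\;n\to\infty\;]{\text{a.s.}}\; 0 .
```

The empirical measure satisfies this by the Glivenko–Cantelli theorem, so it is the simplest predictive meeting the premise; informative Bayesian predictives must earn it through posterior consistency.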

What would settle it

Generate repeated samples from a known bounded distribution, feed a Wasserstein-consistent predictive such as the empirical measure into the procedure, and verify whether the average log-growth rate of the constructed sequences approaches the rate achieved by an oracle that uses the true distribution directly.
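A sketch of that check under stated assumptions (true distribution Beta(2,5) on [0,1], empirical measure of the past as the Wasserstein-consistent predictive, one false candidate mean m = 0.5; grids and sample sizes are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

def best_lambda(samples, m, n_grid=101):
    # Valid bets lie in (-1/(1-m), 1/m); pick the one maximizing expected
    # log-growth under the measure approximated by `samples`.
    lams = np.linspace(-1.0 / (1.0 - m) + 1e-4, 1.0 / m - 1e-4, n_grid)
    return lams[np.argmax(np.log1p(np.outer(lams, samples - m)).mean(axis=1))]

rng = np.random.default_rng(1)
xs = rng.beta(2.0, 5.0, size=800)         # bounded stream, true mean 2/7
m = 0.5                                   # false candidate: wealth should grow

# Oracle: a fixed bet tuned on a large independent sample from the truth.
oracle_lam = best_lambda(rng.beta(2.0, 5.0, size=20_000), m)
oracle_rate = np.log1p(oracle_lam * (xs - m)).mean()

# Plug-in: re-tune the bet at every step against the empirical past.
log_wealth = 0.0
for t in range(1, len(xs)):
    lam = best_lambda(xs[:t], m)
    log_wealth += np.log1p(lam * (xs[t] - m))
plugin_rate = log_wealth / (len(xs) - 1)

print(f"oracle {oracle_rate:.3f}  plug-in {plugin_rate:.3f}")
```

If the theorem's mechanism is right, the two rates should agree up to sampling noise; repeating the run with a deliberately inconsistent predictive (a fixed wrong distribution) should open a persistent gap.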

Original abstract

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes a Bayes-assisted framework for constructing time-uniform confidence sequences for the mean of bounded IID observations. A Bayesian working predictive distribution is used to select, at each time and candidate mean, the valid one-step martingale factor that maximizes predictive expected log-growth. Validity is preserved under misspecification of the prior or predictive model. The central theoretical result states that Wasserstein consistency of the predictive distribution implies asymptotic log-optimality, in the sense that the per-sample log-growth rate matches that of an oracle procedure with access to the true data-generating distribution. The framework is instantiated with Dirichlet-process mixture predictives and Bayesian exponentially tilted empirical likelihood; experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference illustrate reduced width and sampling effort.

Significance. If the asymptotic optimality result holds, the work provides a principled bridge between Bayesian predictive modeling and frequentist anytime-valid inference, allowing informative priors to improve efficiency without sacrificing coverage guarantees. The explicit use of Wasserstein consistency as the sufficient condition for matching oracle log-growth is a clean and falsifiable contribution. Practical instantiations with robust nonparametric predictives and the reported experiments on real-world sequential tasks strengthen the case for adoption. The manuscript ships a clear statement of the consistency assumption and demonstrates that validity does not require correctness of the working model.

major comments (1)
  1. [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.
minor comments (3)
  1. [§2] The definition of the one-step martingale factor selection criterion (predictive expected log-growth) would benefit from an explicit equation number and a short derivation showing why it remains a valid test martingale even under misspecification.
  2. [§5] In the experimental section, the synthetic data figures would be clearer if the oracle width were plotted alongside the Bayes-assisted and baseline sequences for direct visual comparison of the asymptotic gap.
  3. [§4] A brief remark on computational cost of the Dirichlet-process mixture predictive (e.g., number of particles or truncation level) would help readers assess practicality for large-scale sequential tasks.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and the detailed comment on the asymptotic optimality theorem. We address the concern below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [§3] §3 (asymptotic optimality theorem): the proof sketch relies on Wasserstein consistency implying convergence of the selected log-growth rates to the oracle rate; it is not immediately clear whether the argument requires uniform integrability or a specific rate of convergence in Wasserstein distance to control the per-sample limit, or whether the result is only in probability rather than almost surely.

    Authors: We appreciate the referee highlighting the need for greater precision in the proof of asymptotic log-optimality. The argument establishes almost-sure convergence of the per-sample log-growth rate to the oracle rate. Wasserstein consistency is assumed to hold almost surely (as is standard), and the per-sample limit is taken along the same almost-sure event. Because the observations are bounded in [0,1], all admissible one-step log-growth rates are uniformly bounded by a constant independent of the data and of the predictive distribution. This boundedness directly supplies the uniform integrability required to interchange the limit and the predictive expectation when selecting the martingale factor, without needing any additional moment conditions. No quantitative rate of Wasserstein convergence is imposed beyond the consistency assumption itself, because the result concerns the limsup of the average log-growth as n→∞. We will revise §3 to state these points explicitly, including a short paragraph on the role of boundedness in securing uniform integrability and confirming that the convergence holds almost surely. revision: yes
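The boundedness point in the response can be written out explicitly (illustrative form; the paper may parameterize the restriction differently): if bets are kept a factor $1-\varepsilon$ inside the valid range, every one-step log factor is uniformly bounded,

```latex
\sup_{x \in [0,1]} \bigl|\log\bigl(1 + \lambda(x - m)\bigr)\bigr|
\;\le\; \max\!\left\{ \log\tfrac{1}{\varepsilon},\;
  \log\!\Bigl(1 + \tfrac{1-\varepsilon}{\min(m,\,1-m)}\Bigr) \right\} \;<\; \infty
\qquad \text{for } \lambda \in \Bigl[-\tfrac{1-\varepsilon}{1-m},\; \tfrac{1-\varepsilon}{m}\Bigr],
```

which dominates the relevant integrands and lets limits pass through predictive expectations with no extra moment conditions.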

Circularity Check

0 steps flagged

No significant circularity; the derivation is self-contained under an external assumption

Full rationale

The central claim is a theorem establishing that Wasserstein consistency of any predictive distribution implies asymptotic per-sample log-optimality of the resulting confidence sequence (matching an oracle with the true distribution). This is an implication proved under an explicitly stated external condition on the predictive model, not a self-referential construction. Validity of the sequences holds independently of model correctness or misspecification. No load-bearing steps reduce by definition or by self-citation to the target result; the selection of martingale factors via predictive log-growth is a construction that preserves validity by design and whose optimality is derived conditionally on the consistency assumption rather than fitted or renamed from inputs. The framework does not rely on uniqueness theorems from the authors' prior work or smuggle ansatzes via citation. This is the normal case of a non-circular proof.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard properties of martingales and introduces a selection criterion based on predictive log-growth, with no new free parameters or invented entities.

axioms (2)
  • standard math Bounded IID observations allow construction of test martingales for confidence sequences
    This is a foundational assumption in the field of sequential analysis.
  • domain assumption The working predictive distribution is Wasserstein-consistent for the true data-generating distribution
    This is the key condition assumed for the main theoretical result; the implication to asymptotic log-optimality is what the paper proves.

pith-pipeline@v0.9.0 · 5499 in / 1349 out tokens · 59461 ms · 2026-05-12T03:17:10.090926+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 1 internal anchor
