MINTS: Minimalist Thompson Sampling

Kaizheng Wang

arxiv: 2606.01655 · v1 · pith:UXWGNI5Znew · submitted 2026-06-01 · 🧮 math.OC · cs.AI· cs.LG· stat.ML

MINTS: Minimalist Thompson Sampling

Kaizheng Wang This is my paper

Pith reviewed 2026-06-28 13:46 UTC · model grok-4.3

classification 🧮 math.OC cs.AIcs.LGstat.ML

keywords multi-armed banditsThompson samplingprofile likelihoodregret boundsstructural constraintsasymptotic optimalityunimodal structure

0 comments

The pith

A minimalist prior placed only on the location of the optimum yields a generalized posterior that attains the Lai-Robbins regret constant in unstructured bandits and adapts to unimodal structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian framework for sequential decisions that avoids specifying full probabilistic models for every parameter. It places a prior solely on the identity of the optimal arm and removes the remaining parameters through profile likelihood, producing a generalized posterior. This posterior directly incorporates mean constraints such as unimodality. The resulting MINTS algorithm delivers near-optimal finite-time regret bounds together with sharp asymptotic characterizations. A reader would care because the construction recovers classical optimality results while automatically exploiting structure without additional modeling effort.

Core claim

By restricting the prior to the location of the optimum and eliminating nuisance parameters via profile likelihood, the generalized posterior accommodates structural constraints on arm means. The associated Thompson sampling procedure, MINTS, attains the Lai-Robbins constant in the unstructured multi-armed bandit setting and the sharp constant determined solely by the immediate neighbors of the optimal arm under unimodal structure, while also providing near-optimal non-asymptotic regret guarantees.

What carries the argument

The minimalist framework that places a prior only on the location of the optimum while eliminating nuisance parameters through profile likelihood to obtain a generalized posterior.

If this is right

In the absence of structure the algorithm recovers the information-theoretically optimal asymptotic regret rate.
Under unimodal mean constraints the regret rate depends only on the two neighboring arms of the optimum.
Near-optimal finite-time regret bounds hold uniformly over the class of problems with mean constraints.
The same construction applies to other structural constraints without redesigning the sampling rule.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same profile-likelihood reduction could be applied to other sequential decision problems whose constraints act only on the location of the optimum.
Non-asymptotic bounds derived for MINTS may supply concrete constants usable in practical hyper-parameter tuning for constrained bandits.
Extensions to continuous action spaces or contextual settings would require verifying that the profile likelihood step remains tractable.

Load-bearing premise

Profile likelihood applied after placing a prior solely on the optimum's location is sufficient to produce a generalized posterior that respects mean constraints without further modeling.

What would settle it

An explicit calculation or long-horizon simulation in which the almost-sure asymptotic regret rate of MINTS exceeds the Lai-Robbins constant in an unstructured instance would refute the central claim.

Figures

Figures reproduced from arXiv: 2606.01655 by Kaizheng Wang.

read the original abstract

The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the location of the optimum, while eliminating nuisance parameters through profile likelihood. This yields a generalized posterior that naturally accommodates structural constraints. As a direct instantiation, we develop MINimalist Thompson Sampling (MINTS). For multi-armed bandits with mean constraints, we establish near-optimal non-asymptotic regret guarantees and sharp almost-sure asymptotic regret characterizations. In particular, MINTS attains the classical Lai--Robbins constant in the unstructured setting and automatically adapts to unimodal structure, achieving the sharp constant determined only by the immediate neighbors of the optimal arm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MINTS claims sharp Lai-Robbins regret plus automatic unimodal adaptation through a profile-likelihood generalized posterior, but the rate preservation under constraints is the part that needs direct verification.

read the letter

MINTS is a minimalist take on Thompson sampling for multi-armed bandits with mean constraints. It puts a prior only on the argmax and profiles out the nuisance parameters to form a generalized posterior. This is presented as a way to handle structural constraints without a full model on every parameter.

The new part is this construction and the claim that it recovers the classical Lai-Robbins constant without constraints and then switches to the neighbor-only constant under unimodality. The non-asymptotic guarantees are also stated as near-optimal.

This framework has appeal because it keeps the Bayesian flavor while making constraints easier to incorporate. The automatic adaptation sounds useful if the math works out.

The soft spot is the asymptotic rate. The concern is that profile likelihood, being a point estimate, may not deliver the exact same large-deviation rate as a full posterior when the means are coupled by the constraint. The paper asserts both the standard and the adapted sharp constants, so it needs to verify that the generalized posterior sampling probabilities produce the right information rate. Without the derivation, it's difficult to see how the profile likelihood ensures the sampling prob has the precise KL to the optimal arm. In the unstructured case it should reduce to standard, but with unimodality the constraint set changes the effective rate function, and the paper must show it matches the known sharp constant exactly.

The paper is aimed at people doing theoretical work on constrained bandits. A reader focused on regret bounds would find the adaptation claim worth examining. It deserves a serious referee because the idea is distinct and the claims are testable.

I would recommend sending it out for review to get the details checked.

Referee Report

2 major / 1 minor

Summary. The paper introduces MINTS, a minimalist Bayesian Thompson Sampling algorithm that places a prior solely on the argmax location and eliminates nuisance parameters via profile likelihood to obtain a generalized posterior. This framework is instantiated for multi-armed bandits with mean constraints, with claims of near-optimal non-asymptotic regret bounds and sharp almost-sure asymptotic characterizations: it recovers the classical Lai-Robbins constant in the unstructured case and automatically adapts to unimodal structure to achieve the sharp constant determined only by the immediate neighbors of the optimal arm.

Significance. If the central claims on regret rates hold, the minimalist profile-likelihood approach offers a technically clean route to incorporating structural constraints without a full probabilistic model on all parameters, which could simplify analysis for constrained bandits while preserving sharp information-theoretic constants.

major comments (2)

[Abstract / theoretical analysis section] The abstract asserts that the generalized posterior produces sampling probabilities whose large-deviation rate exactly matches the KL divergence to the optimal arm (both for the unstructured Lai-Robbins constant and the neighbor-only unimodal constant). Because profile likelihood yields a point estimate rather than a marginal, an explicit lemma or derivation is required showing that the resulting rate function remains identical to the full-posterior case when the constraint set couples the means; without this step the almost-sure logarithmic coefficient is not guaranteed.
[Theoretical results section] The non-asymptotic regret guarantees are stated to be near-optimal, yet the manuscript supplies no derivation details, proof sketches, or verification steps for how the profile-likelihood posterior translates into the claimed finite-time bounds under mean constraints.

minor comments (1)

[Method section] Notation for the profile likelihood and the resulting generalized posterior should be introduced with an explicit contrast to the standard Bayesian posterior to prevent reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the theoretical presentation.

read point-by-point responses

Referee: [Abstract / theoretical analysis section] The abstract asserts that the generalized posterior produces sampling probabilities whose large-deviation rate exactly matches the KL divergence to the optimal arm (both for the unstructured Lai-Robbins constant and the neighbor-only unimodal constant). Because profile likelihood yields a point estimate rather than a marginal, an explicit lemma or derivation is required showing that the resulting rate function remains identical to the full-posterior case when the constraint set couples the means; without this step the almost-sure logarithmic coefficient is not guaranteed.

Authors: We agree that an explicit derivation is needed to confirm the large-deviation rate of the generalized posterior matches the KL divergence under mean constraints. The current manuscript states the asymptotic result but does not isolate this step as a standalone lemma. In the revision we will add a dedicated lemma in the theoretical analysis section deriving the rate function for the profile-likelihood posterior, showing equivalence to the full-posterior case even when constraints couple the means. This will directly support the claimed Lai-Robbins and neighbor-based constants. revision: yes
Referee: [Theoretical results section] The non-asymptotic regret guarantees are stated to be near-optimal, yet the manuscript supplies no derivation details, proof sketches, or verification steps for how the profile-likelihood posterior translates into the claimed finite-time bounds under mean constraints.

Authors: We acknowledge that the non-asymptotic bounds are asserted without accompanying proof sketches or key derivation steps. The manuscript focuses on the final statements rather than the intermediate arguments linking the generalized posterior to the finite-time regret. In the revision we will add a proof-sketch subsection (or appendix) outlining the main steps: concentration of the profile-likelihood posterior, the resulting sampling probabilities, and how these yield the near-optimal regret bound under the mean constraints. revision: yes

Circularity Check

0 steps flagged

No circularity: theoretical regret analysis stands independently of inputs

full rationale

The provided abstract and context describe a new minimalist Bayesian framework using profile likelihood to eliminate nuisance parameters, followed by derivation of non-asymptotic regret bounds and almost-sure asymptotic characterizations that match known Lai-Robbins and unimodal constants. No equations or steps are shown that reduce a claimed prediction or constant to a fitted quantity or self-citation by construction. The central results are presented as outcomes of theoretical analysis on the generalized posterior, with no load-bearing self-citation chains or ansatz smuggling visible. This is the common case of a self-contained theoretical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no identifiable free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5653 in / 1022 out tokens · 24044 ms · 2026-06-28T13:46:06.606129+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

85 extracted references · 5 canonical work pages

[1]

arXiv preprint arXiv:2601.21131 , year=

Thompson sampling: Precise arm-pull dynamics and adaptive inference , author=. arXiv preprint arXiv:2601.21131 , year=

work page arXiv
[2]

Operations Research , volume=

The fragility of optimized bandit algorithms , author=. Operations Research , volume=. 2025 , publisher=

2025
[3]

Ucb algorithms for multi-armed bandits: Precise regret and adaptive inference.arXiv preprint arXiv:2412.06126, 2024

UCB algorithms for multi-armed bandits: precise regret and adaptive inference , author=. arXiv preprint arXiv:2412.06126 , year=

work page arXiv
[4]

Advances in Neural Information Processing Systems , volume=

A closer look at the worst-case behavior of multi-armed bandit algorithms , author=. Advances in Neural Information Processing Systems , volume=
[5]

Journal of Computer and System Sciences , volume=

Combinatorial bandits , author=. Journal of Computer and System Sciences , volume=. 2012 , publisher=

2012
[6]

Conference on learning theory , pages=

Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo , author=. Conference on learning theory , pages=. 2017 , organization=

2017
[7]

Advances in Neural Information Processing Systems , volume=

Mirrored langevin dynamics , author=. Advances in Neural Information Processing Systems , volume=
[8]

Discrete & Computational Geometry , volume=

Sampling from a log-concave distribution with projected Langevin Monte Carlo , author=. Discrete & Computational Geometry , volume=. 2018 , publisher=

2018
[9]

International conference on machine learning , pages=

Thompson sampling for complex online problems , author=. International conference on machine learning , pages=. 2014 , organization=

2014
[10]

Journal of Machine Learning Research , volume=

An information-theoretic analysis of thompson sampling , author=. Journal of Machine Learning Research , volume=
[11]

Annals of Probability , pages=

Self-Normalized Processes: Exponential Inequalities, Moment Bounds and Iterated Logarithm Laws , author=. Annals of Probability , pages=. 2004 , publisher=

2004
[12]

Working paper , year=

TBD , author=. Working paper , year=
[13]

Foundations and Trends

Graphical models, exponential families, and variational inference , author=. Foundations and Trends. 2008 , publisher=

2008
[14]

Asymptotically efficient adaptive choice of control laws incontrolled

Graves, Todd L and Lai, Tze Leung , journal=. Asymptotically efficient adaptive choice of control laws incontrolled. 1997 , publisher=

1997
[15]

Proceedings of the 24th annual conference on learning theory , pages=

The KL-UCB algorithm for bounded stochastic bandits and beyond , author=. Proceedings of the 24th annual conference on learning theory , pages=. 2011 , organization=

2011
[16]

International Conference on Machine Learning , pages=

Structure adaptive algorithms for stochastic bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

2020
[17]

Advances in Neural Information Processing Systems , volume=

Minimal exploration in structured stochastic bandits , author=. Advances in Neural Information Processing Systems , volume=
[18]

Probability in the Engineering and Informational Sciences , volume=

Exploration--exploitation policies with almost sure, arbitrarily slow growing asymptotic regret , author=. Probability in the Engineering and Informational Sciences , volume=. 2020 , publisher=

2020
[19]

arXiv preprint arXiv:2210.05660 , year=

The typical behavior of bandit algorithms , author=. arXiv preprint arXiv:2210.05660 , year=

work page arXiv
[20]

IEEE Transactions on Neural Networks and Learning Systems , volume=

A Thompson sampling algorithm with logarithmic regret for unimodal Gaussian bandit , author=. IEEE Transactions on Neural Networks and Learning Systems , volume=. 2023 , publisher=

2023
[21]

Proceedings of the 28th International Conference on International Conference on Machine Learning , pages=

Unimodal bandits , author=. Proceedings of the 28th International Conference on International Conference on Machine Learning , pages=
[22]

International Conference on Machine Learning , pages=

Unimodal bandits: Regret lower bounds and optimal algorithms , author=. International Conference on Machine Learning , pages=. 2014 , organization=

2014
[23]

2025 , url=

Kaizheng Wang , booktitle=. 2025 , url=

2025
[24]

2025 , url=

Anonymous , booktitle=. 2025 , url=

2025
[25]

American Journal of Mathematics , volume=

On the law of the iterated logarithm , author=. American Journal of Mathematics , volume=. 1941 , publisher=

1941
[26]

Tohoku Mathematical Journal, Second Series , volume=

Weighted sums of certain dependent random variables , author=. Tohoku Mathematical Journal, Second Series , volume=. 1967 , publisher=

1967
[27]

2019 , publisher=

Probability: theory and examples , author=. 2019 , publisher=

2019
[28]

2014 , publisher=

Martingale limit theory and its application , author=. 2014 , publisher=

2014
[29]

Advances in neural information processing systems , volume=

Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=
[30]

Lecture notes-monograph series , pages=

Global versus local search in constrained optimization of computer models , author=. Lecture notes-monograph series , pages=. 1998 , publisher=

1998
[31]

, author=

Bayesian optimization with inequality constraints. , author=. ICML , volume=
[32]

Advances in neural information processing systems , volume=

Algorithms for hyper-parameter optimization , author=. Advances in neural information processing systems , volume=
[33]

Advances in neural information processing systems , volume=

Predictive entropy search for efficient global optimization of black-box functions , author=. Advances in neural information processing systems , volume=
[34]

Journal of Global Optimization , volume=

An informational approach to the global optimization of expensive-to-evaluate functions , author=. Journal of Global Optimization , volume=. 2009 , publisher=

2009
[35]

The Journal of Machine Learning Research , volume=

Entropy search for information-efficient global optimization , author=. The Journal of Machine Learning Research , volume=. 2012 , publisher=

2012
[36]

Statistical learning theory and stochastic optimization: Ecole d'Et

Catoni, Olivier , year=. Statistical learning theory and stochastic optimization: Ecole d'Et
[37]

How to use expert advice , year =

Cesa-Bianchi, Nicol\'. How to use expert advice , year =. doi:10.1145/258128.258179 , journal=

work page doi:10.1145/258128.258179
[38]

, title =

Vovk, Volodimir G. , title =. Proceedings of the Third Annual Workshop on Computational Learning Theory , pages =. 1990 , isbn =

1990
[39]

30th Annual Symposium on Foundations of Computer Science , pages=

The weighted majority algorithm , author=. 30th Annual Symposium on Foundations of Computer Science , pages=. 1989 , organization=

1989
[40]

Management Science , volume=

Optimal learning for structured bandits , author=. Management Science , volume=. 2024 , publisher=

2024
[41]

Surveys in operations research and management science , volume=

Dynamic pricing and learning: Historical origins, current research, and new directions , author=. Surveys in operations research and management science , volume=. 2015 , publisher=

2015
[42]

Operations Research , volume=

Dynamic pricing with a prior on market response , author=. Operations Research , volume=. 2010 , publisher=

2010
[43]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

A general framework for updating belief distributions , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2016 , publisher=

2016
[44]

Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=

Bayesian optimization with a prior for the optimum , author=. Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=. 2021 , organization=

2021
[45]

Advances in Applied Mathematics , volume=

Asymptotically efficient adaptive allocation rules , author=. Advances in Applied Mathematics , volume=. 1985 , publisher=

1985
[46]

Machine Learning , volume=

Finite-time analysis of the multiarmed bandit problem , author=. Machine Learning , volume=. 2002 , publisher=

2002
[47]

Mathematics of Operations Research , volume=

Explore first, exploit next: The true shape of regret in bandit problems , author=. Mathematics of Operations Research , volume=. 2019 , publisher=

2019
[48]

Foundations and Trends

Regret analysis of stochastic and nonstochastic multi-armed bandit problems , author=. Foundations and Trends. 2012 , publisher=

2012
[49]

2010 , publisher=

Auer, Peter and Ortner, Ronald , journal=. 2010 , publisher=

2010
[50]

2020 , publisher=

Bandit algorithms , author=. 2020 , publisher=

2020
[51]

Kirszbraun, Mojzesz , journal=
[52]

, title =

Minka, Thomas P. , title =. Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence , pages =. 2001 , isbn =

2001
[53]

Journal of the ACM (JACM) , volume=

Solving convex programs by random walks , author=. Journal of the ACM (JACM) , volume=. 2004 , publisher=

2004
[54]

2018 , publisher=

Lectures on convex optimization , author=. 2018 , publisher=

2018
[55]

A modern

Scott, Steven L , journal=. A modern. 2010 , publisher=

2010
[56]

ISBN 9781605580470

Kleinberg, Robert and Slivkins, Aleksandrs and Upfal, Eli , title =. 2008 , isbn =. doi:10.1145/1374376.1374475 , booktitle =

work page doi:10.1145/1374376.1374475 2008
[57]

2012 , publisher=

Theory of statistics , author=. 2012 , publisher=

2012
[58]

Barndorff-Nielsen, O. E. and Cox, D. R. Inference and asymptotics. 1994

1994
[59]

A survey of constrained

Swiler, Laura P and Gulian, Mamikon and Frankel, Ari L and Safta, Cosmin and Jakeman, John D , journal=. A survey of constrained. 2020 , publisher=

2020
[60]

Naval Research Logistics (NRL) , volume=

Bayesian strategies for dynamic pricing in e-commerce , author=. Naval Research Logistics (NRL) , volume=. 2007 , publisher=

2007
[61]

Operations Research , volume=

Dynamic pricing and demand learning with limited price experimentation , author=. Operations Research , volume=. 2017 , publisher=

2017
[62]

Journal of Economic Dynamics and Control , volume=

Price dispersion and incomplete learning in the long run , author=. Journal of Economic Dynamics and Control , volume=. 1984 , publisher=

1984
[63]

Management Science , volume=

Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution , author=. Management Science , volume=. 2012 , publisher=

2012
[64]

Unimodal

Paladino, Stefano and Trovo, Francesco and Restelli, Marcello and Gatti, Nicola , booktitle=. Unimodal
[65]

SIAM Journal on Control and Optimization , volume=

Bisection search with noisy responses , author=. SIAM Journal on Control and Optimization , volume=. 2013 , publisher=

2013
[66]

Operations Research , volume=

Close the gaps: A learning-while-doing algorithm for single-product revenue management problems , author=. Operations Research , volume=. 2014 , publisher=

2014
[67]

Operations Research , volume=

Blind network revenue management , author=. Operations Research , volume=. 2012 , publisher=

2012
[68]

Management Science , volume=

Multimodal dynamic pricing , author=. Management Science , volume=. 2021 , publisher=

2021
[69]

Operations research , volume=

Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies , author=. Operations research , volume=. 2014 , publisher=

2014
[70]

Operations research , volume=

Online network revenue management using Thompson sampling , author=. Operations research , volume=. 2018 , publisher=

2018
[71]

Operations Research , volume=

Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms , author=. Operations Research , volume=. 2009 , publisher=

2009
[72]

Operations Research , volume=

Dynamic pricing under a general parametric choice model , author=. Operations Research , volume=. 2012 , publisher=

2012
[73]

Operations Research , volume=

Relationships among three assumptions in revenue management , author=. Operations Research , volume=. 2004 , publisher=

2004
[74]

Recent advances in optimization and modeling of contemporary problems , pages=

Bayesian optimization , author=. Recent advances in optimization and modeling of contemporary problems , pages=. 2018 , publisher=

2018
[75]

Journal of Global Optimization , volume=

Efficient global optimization of expensive black-box functions , author=. Journal of Global Optimization , volume=. 1998 , publisher=

1998
[76]

Mo. On. IFIP Technical Conference on Optimization Techniques , pages=. 1974 , organization=

1974
[77]

INFORMS Journal on Computing , volume=

The knowledge-gradient policy for correlated normal beliefs , author=. INFORMS Journal on Computing , volume=. 2009 , publisher=

2009
[78]

Kaufmann, Emilie and Capp. On. Artificial Intelligence and Statistics , pages=. 2012 , organization=

2012
[79]

Operations Research , volume=

Learning to optimize via information-directed sampling , author=. Operations Research , volume=. 2018 , publisher=

2018
[80]

Biometrika , volume=

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , author=. Biometrika , volume=. 1933 , publisher=

1933

Showing first 80 references.

[1] [1]

arXiv preprint arXiv:2601.21131 , year=

Thompson sampling: Precise arm-pull dynamics and adaptive inference , author=. arXiv preprint arXiv:2601.21131 , year=

work page arXiv

[2] [2]

Operations Research , volume=

The fragility of optimized bandit algorithms , author=. Operations Research , volume=. 2025 , publisher=

2025

[3] [3]

Ucb algorithms for multi-armed bandits: Precise regret and adaptive inference.arXiv preprint arXiv:2412.06126, 2024

UCB algorithms for multi-armed bandits: precise regret and adaptive inference , author=. arXiv preprint arXiv:2412.06126 , year=

work page arXiv

[4] [4]

Advances in Neural Information Processing Systems , volume=

A closer look at the worst-case behavior of multi-armed bandit algorithms , author=. Advances in Neural Information Processing Systems , volume=

[5] [5]

Journal of Computer and System Sciences , volume=

Combinatorial bandits , author=. Journal of Computer and System Sciences , volume=. 2012 , publisher=

2012

[6] [6]

Conference on learning theory , pages=

Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo , author=. Conference on learning theory , pages=. 2017 , organization=

2017

[7] [7]

Advances in Neural Information Processing Systems , volume=

Mirrored langevin dynamics , author=. Advances in Neural Information Processing Systems , volume=

[8] [8]

Discrete & Computational Geometry , volume=

Sampling from a log-concave distribution with projected Langevin Monte Carlo , author=. Discrete & Computational Geometry , volume=. 2018 , publisher=

2018

[9] [9]

International conference on machine learning , pages=

Thompson sampling for complex online problems , author=. International conference on machine learning , pages=. 2014 , organization=

2014

[10] [10]

Journal of Machine Learning Research , volume=

An information-theoretic analysis of thompson sampling , author=. Journal of Machine Learning Research , volume=

[11] [11]

Annals of Probability , pages=

Self-Normalized Processes: Exponential Inequalities, Moment Bounds and Iterated Logarithm Laws , author=. Annals of Probability , pages=. 2004 , publisher=

2004

[12] [12]

Working paper , year=

TBD , author=. Working paper , year=

[13] [13]

Foundations and Trends

Graphical models, exponential families, and variational inference , author=. Foundations and Trends. 2008 , publisher=

2008

[14] [14]

Asymptotically efficient adaptive choice of control laws incontrolled

Graves, Todd L and Lai, Tze Leung , journal=. Asymptotically efficient adaptive choice of control laws incontrolled. 1997 , publisher=

1997

[15] [15]

Proceedings of the 24th annual conference on learning theory , pages=

The KL-UCB algorithm for bounded stochastic bandits and beyond , author=. Proceedings of the 24th annual conference on learning theory , pages=. 2011 , organization=

2011

[16] [16]

International Conference on Machine Learning , pages=

Structure adaptive algorithms for stochastic bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

2020

[17] [17]

Advances in Neural Information Processing Systems , volume=

Minimal exploration in structured stochastic bandits , author=. Advances in Neural Information Processing Systems , volume=

[18] [18]

Probability in the Engineering and Informational Sciences , volume=

Exploration--exploitation policies with almost sure, arbitrarily slow growing asymptotic regret , author=. Probability in the Engineering and Informational Sciences , volume=. 2020 , publisher=

2020

[19] [19]

arXiv preprint arXiv:2210.05660 , year=

The typical behavior of bandit algorithms , author=. arXiv preprint arXiv:2210.05660 , year=

work page arXiv

[20] [20]

IEEE Transactions on Neural Networks and Learning Systems , volume=

A Thompson sampling algorithm with logarithmic regret for unimodal Gaussian bandit , author=. IEEE Transactions on Neural Networks and Learning Systems , volume=. 2023 , publisher=

2023

[21] [21]

Proceedings of the 28th International Conference on International Conference on Machine Learning , pages=

Unimodal bandits , author=. Proceedings of the 28th International Conference on International Conference on Machine Learning , pages=

[22] [22]

International Conference on Machine Learning , pages=

Unimodal bandits: Regret lower bounds and optimal algorithms , author=. International Conference on Machine Learning , pages=. 2014 , organization=

2014

[23] [23]

2025 , url=

Kaizheng Wang , booktitle=. 2025 , url=

2025

[24] [24]

2025 , url=

Anonymous , booktitle=. 2025 , url=

2025

[25] [25]

American Journal of Mathematics , volume=

On the law of the iterated logarithm , author=. American Journal of Mathematics , volume=. 1941 , publisher=

1941

[26] [26]

Tohoku Mathematical Journal, Second Series , volume=

Weighted sums of certain dependent random variables , author=. Tohoku Mathematical Journal, Second Series , volume=. 1967 , publisher=

1967

[27] [27]

2019 , publisher=

Probability: theory and examples , author=. 2019 , publisher=

2019

[28] [28]

2014 , publisher=

Martingale limit theory and its application , author=. 2014 , publisher=

2014

[29] [29]

Advances in neural information processing systems , volume=

Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=

[30] [30]

Lecture notes-monograph series , pages=

Global versus local search in constrained optimization of computer models , author=. Lecture notes-monograph series , pages=. 1998 , publisher=

1998

[31] [31]

, author=

Bayesian optimization with inequality constraints. , author=. ICML , volume=

[32] [32]

Advances in neural information processing systems , volume=

Algorithms for hyper-parameter optimization , author=. Advances in neural information processing systems , volume=

[33] [33]

Advances in neural information processing systems , volume=

Predictive entropy search for efficient global optimization of black-box functions , author=. Advances in neural information processing systems , volume=

[34] [34]

Journal of Global Optimization , volume=

An informational approach to the global optimization of expensive-to-evaluate functions , author=. Journal of Global Optimization , volume=. 2009 , publisher=

2009

[35] [35]

The Journal of Machine Learning Research , volume=

Entropy search for information-efficient global optimization , author=. The Journal of Machine Learning Research , volume=. 2012 , publisher=

2012

[36] [36]

Statistical learning theory and stochastic optimization: Ecole d'Et

Catoni, Olivier , year=. Statistical learning theory and stochastic optimization: Ecole d'Et

[37] [37]

How to use expert advice , year =

Cesa-Bianchi, Nicol\'. How to use expert advice , year =. doi:10.1145/258128.258179 , journal=

work page doi:10.1145/258128.258179

[38] [38]

, title =

Vovk, Volodimir G. , title =. Proceedings of the Third Annual Workshop on Computational Learning Theory , pages =. 1990 , isbn =

1990

[39] [39]

30th Annual Symposium on Foundations of Computer Science , pages=

The weighted majority algorithm , author=. 30th Annual Symposium on Foundations of Computer Science , pages=. 1989 , organization=

1989

[40] [40]

Management Science , volume=

Optimal learning for structured bandits , author=. Management Science , volume=. 2024 , publisher=

2024

[41] [41]

Surveys in operations research and management science , volume=

Dynamic pricing and learning: Historical origins, current research, and new directions , author=. Surveys in operations research and management science , volume=. 2015 , publisher=

2015

[42] [42]

Operations Research , volume=

Dynamic pricing with a prior on market response , author=. Operations Research , volume=. 2010 , publisher=

2010

[43] [43]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

A general framework for updating belief distributions , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2016 , publisher=

2016

[44] [44]

Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=

Bayesian optimization with a prior for the optimum , author=. Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=. 2021 , organization=

2021

[45] [45]

Advances in Applied Mathematics , volume=

Asymptotically efficient adaptive allocation rules , author=. Advances in Applied Mathematics , volume=. 1985 , publisher=

1985

[46] [46]

Machine Learning , volume=

Finite-time analysis of the multiarmed bandit problem , author=. Machine Learning , volume=. 2002 , publisher=

2002

[47] [47]

Mathematics of Operations Research , volume=

Explore first, exploit next: The true shape of regret in bandit problems , author=. Mathematics of Operations Research , volume=. 2019 , publisher=

2019

[48] [48]

Foundations and Trends

Regret analysis of stochastic and nonstochastic multi-armed bandit problems , author=. Foundations and Trends. 2012 , publisher=

2012

[49] [49]

2010 , publisher=

Auer, Peter and Ortner, Ronald , journal=. 2010 , publisher=

2010

[50] [50]

2020 , publisher=

Bandit algorithms , author=. 2020 , publisher=

2020

[51] [51]

Kirszbraun, Mojzesz , journal=

[52] [52]

, title =

Minka, Thomas P. , title =. Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence , pages =. 2001 , isbn =

2001

[53] [53]

Journal of the ACM (JACM) , volume=

Solving convex programs by random walks , author=. Journal of the ACM (JACM) , volume=. 2004 , publisher=

2004

[54] [54]

2018 , publisher=

Lectures on convex optimization , author=. 2018 , publisher=

2018

[55] [55]

A modern

Scott, Steven L , journal=. A modern. 2010 , publisher=

2010

[56] [56]

ISBN 9781605580470

Kleinberg, Robert and Slivkins, Aleksandrs and Upfal, Eli , title =. 2008 , isbn =. doi:10.1145/1374376.1374475 , booktitle =

work page doi:10.1145/1374376.1374475 2008

[57] [57]

2012 , publisher=

Theory of statistics , author=. 2012 , publisher=

2012

[58] [58]

Barndorff-Nielsen, O. E. and Cox, D. R. Inference and asymptotics. 1994

1994

[59] [59]

A survey of constrained

Swiler, Laura P and Gulian, Mamikon and Frankel, Ari L and Safta, Cosmin and Jakeman, John D , journal=. A survey of constrained. 2020 , publisher=

2020

[60] [60]

Naval Research Logistics (NRL) , volume=

Bayesian strategies for dynamic pricing in e-commerce , author=. Naval Research Logistics (NRL) , volume=. 2007 , publisher=

2007

[61] [61]

Operations Research , volume=

Dynamic pricing and demand learning with limited price experimentation , author=. Operations Research , volume=. 2017 , publisher=

2017

[62] [62]

Journal of Economic Dynamics and Control , volume=

Price dispersion and incomplete learning in the long run , author=. Journal of Economic Dynamics and Control , volume=. 1984 , publisher=

1984

[63] [63]

Management Science , volume=

Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution , author=. Management Science , volume=. 2012 , publisher=

2012

[64] [64]

Unimodal

Paladino, Stefano and Trovo, Francesco and Restelli, Marcello and Gatti, Nicola , booktitle=. Unimodal

[65] [65]

SIAM Journal on Control and Optimization , volume=

Bisection search with noisy responses , author=. SIAM Journal on Control and Optimization , volume=. 2013 , publisher=

2013

[66] [66]

Operations Research , volume=

Close the gaps: A learning-while-doing algorithm for single-product revenue management problems , author=. Operations Research , volume=. 2014 , publisher=

2014

[67] [67]

Operations Research , volume=

Blind network revenue management , author=. Operations Research , volume=. 2012 , publisher=

2012

[68] [68]

Management Science , volume=

Multimodal dynamic pricing , author=. Management Science , volume=. 2021 , publisher=

2021

[69] [69]

Operations research , volume=

Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies , author=. Operations research , volume=. 2014 , publisher=

2014

[70] [70]

Operations research , volume=

Online network revenue management using Thompson sampling , author=. Operations research , volume=. 2018 , publisher=

2018

[71] [71]

Operations Research , volume=

Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms , author=. Operations Research , volume=. 2009 , publisher=

2009

[72] [72]

Operations Research , volume=

Dynamic pricing under a general parametric choice model , author=. Operations Research , volume=. 2012 , publisher=

2012

[73] [73]

Operations Research , volume=

Relationships among three assumptions in revenue management , author=. Operations Research , volume=. 2004 , publisher=

2004

[74] [74]

Recent advances in optimization and modeling of contemporary problems , pages=

Bayesian optimization , author=. Recent advances in optimization and modeling of contemporary problems , pages=. 2018 , publisher=

2018

[75] [75]

Journal of Global Optimization , volume=

Efficient global optimization of expensive black-box functions , author=. Journal of Global Optimization , volume=. 1998 , publisher=

1998

[76] [76]

Mo. On. IFIP Technical Conference on Optimization Techniques , pages=. 1974 , organization=

1974

[77] [77]

INFORMS Journal on Computing , volume=

The knowledge-gradient policy for correlated normal beliefs , author=. INFORMS Journal on Computing , volume=. 2009 , publisher=

2009

[78] [78]

Kaufmann, Emilie and Capp. On. Artificial Intelligence and Statistics , pages=. 2012 , organization=

2012

[79] [79]

Operations Research , volume=

Learning to optimize via information-directed sampling , author=. Operations Research , volume=. 2018 , publisher=

2018

[80] [80]

Biometrika , volume=

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , author=. Biometrika , volume=. 1933 , publisher=

1933