Offline Local Search for Online Stochastic Bandits
Pith reviewed 2026-05-10 18:05 UTC · model grok-4.3
The pith
A generic method converts offline local search into online stochastic combinatorial bandits with O(log^3 T) approximate regret.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give a generic method for converting an offline local search algorithm that terminates in an approximately optimal solution into an online stochastic combinatorial bandit algorithm with O(log^3 T) approximate regret. The framework is applied to three concrete problems: scheduling to minimize total completion time, finding a minimum-cost base in a matroid, and uncertain clustering.
What carries the argument
The generic offline-to-online conversion wrapper that invokes the local search procedure on estimated costs while controlling the number of exploration rounds to accumulate only logarithmic regret.
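The wrapper idea can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the comparison schedule, confidence radius, and all function names are assumptions.

```python
import math

def online_local_search(init, neighbors, play, T, delta=0.01):
    """Minimal sketch of an offline-to-online local-search wrapper.

    `play(sol)` plays solution `sol` for one round and returns its cost
    in [0, 1]; `neighbors(sol)` enumerates candidate local moves.
    Each cost comparison is resolved by sampling both solutions until a
    Hoeffding-style confidence radius separates their empirical means.
    """
    current, t = init, 0
    while t < T:
        improved = False
        for cand in neighbors(current):
            cur_sum = cand_sum = 0.0
            n = 0
            while t + 2 <= T:
                cur_sum += play(current)   # one exploration round each
                cand_sum += play(cand)
                n += 1
                t += 2
                # Confidence radius for an empirical mean of n samples.
                rad = math.sqrt(math.log(4 * T / delta) / (2 * n))
                if abs(cur_sum - cand_sum) / n > 2 * rad:
                    break                  # means separated w.h.p.
            if n and cand_sum < cur_sum:   # neighbor looks strictly cheaper
                current, improved = cand, True
                break
        if not improved:
            # Local optimum w.r.t. the estimates: commit to `current`
            # for the remaining rounds.
            while t < T:
                play(current)
                t += 1
    return current
```

Each accepted or rejected move costs O(log T) comparison rounds under this schedule, loosely mirroring one source of the logarithmic factors in the claimed bound.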
If this is right
- Scheduling to minimize total completion time can be solved online with only O(log^3 T) approximate regret by any offline local-search routine that terminates at an approximately optimal solution.
- Minimum-cost matroid base selection inherits the same O(log^3 T) bound in the stochastic bandit model.
- Uncertain clustering instances can be handled online via the same conversion while retaining logarithmic regret.
- Any combinatorial optimization problem whose offline version admits an approximately optimal local-search algorithm immediately yields an online stochastic bandit algorithm with O(log^3 T) approximate regret.
Where Pith is reading between the lines
- The same wrapper idea could be tested on other common offline heuristics such as greedy or swap-based methods to see whether logarithmic regret remains attainable.
- In applications where costs are stationary but the decision maker only observes the chosen action's cost, the framework offers a practical route to near-optimal performance after only polylogarithmic time.
- Because the approximation ratio enters the bound only as a multiplicative constant while the dependence on T stays logarithmic, tightening the offline local-search guarantee directly improves the online bound without extra algorithmic work.
Load-bearing premise
The offline local search procedure must reach a solution whose cost is within a fixed factor of the true optimum for the underlying combinatorial problem, and the per-round costs must be drawn independently from the same fixed but unknown distributions.
What would settle it
Execute the converted algorithm on the matroid base problem for T = 10,000 rounds with known cost distributions and check whether cumulative approximate regret remains O((log T)^3) or instead grows polynomially in T (linearly, or as T^c for some c > 0).
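A toy version of this check, under illustrative assumptions: a rank-2 uniform matroid, Bernoulli element costs, and a plain explore-then-commit baseline rather than the paper's algorithm.

```python
import itertools
import math
import random

def matroid_bandit_regret(n=5, k=2, T=10_000, seed=0):
    """Toy explore-then-commit check (not the paper's algorithm): a rank-k
    uniform matroid on n elements, i.i.d. Bernoulli element costs, and
    bandit feedback (only the chosen base's total cost is observed)."""
    rng = random.Random(seed)
    means = [rng.uniform(0.1, 0.9) for _ in range(n)]
    bases = list(itertools.combinations(range(n), k))

    def play(base):
        # Bandit feedback: one total-cost sample for the chosen base.
        return sum(1.0 if rng.random() < means[e] else 0.0 for e in base)

    best = min(sum(means[e] for e in b) for b in bases)
    m = 2 * int(math.log(T) ** 2)        # O(log^2 T) pulls per base
    regret, t, est = 0.0, 0, {}
    for b in bases:                      # exploration phase
        est[b] = sum(play(b) for _ in range(m)) / m
        t += m
        regret += m * (sum(means[e] for e in b) - best)
    commit = min(bases, key=est.get)     # commit to the empirical best base
    regret += (T - t) * (sum(means[e] for e in commit) - best)
    return regret
```

For T = 10,000 the cumulative regret should sit far below linear growth; a real test would sweep several horizons and fit the growth exponent against (log T)^3.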
Original abstract
Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) which depend sub-linearly, but polynomially on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum cost base of a matroid and uncertain clustering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a generic framework for converting offline local search algorithms—which terminate at approximately optimal solutions for combinatorial optimization problems—into online algorithms for stochastic combinatorial multi-armed bandits. Under i.i.d. stochastic costs, the method achieves O(log³ T) approximate regret. The framework is instantiated on three problems: scheduling to minimize total completion time, minimum-cost matroid bases, and uncertain clustering. The central claim contrasts this logarithmic dependence with prior offline-to-online conversions that incur polynomial dependence on T.
Significance. If the O(log³ T) bound holds under the stated assumptions, the result strengthens the link between offline approximation algorithms and online regret minimization for combinatorial settings. It improves the dependence on T relative to existing frameworks and demonstrates applicability across scheduling, matroids, and clustering. The generic conversion and explicit applications are strengths; the use of standard i.i.d. stochastic assumptions makes the bound relevant to practical online decision-making.
major comments (2)
- [§3] §3 (Generic Conversion): The derivation of the O(log³ T) bound from the offline local-search guarantee and the online exploration schedule is load-bearing for the main claim. The analysis should explicitly isolate the contribution of each log factor (e.g., from restarts, concentration, and local-search calls) so that readers can verify whether the cubic dependence is necessary or improvable.
- [§5] §5 (Scheduling Application): The reduction assumes the offline local search returns an α-approximate solution, yet the regret statement is stated as O(log³ T) approximate regret without quantifying how the approximation factor α propagates into the final bound. This detail is needed to confirm the bound remains logarithmic when α > 1.
minor comments (3)
- [Introduction] The introduction would benefit from a short table comparing the new O(log³ T) bound with the polynomial bounds of the cited offline-to-online frameworks.
- [Abstract] Notation for 'approximate regret' is introduced in the abstract but defined only later; a forward reference or early definition would improve readability.
- [§6] The matroid application (§6) could briefly note that exact polynomial-time algorithms exist, clarifying why the approximate-regret setting is still of interest.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive feedback. We appreciate the recognition of the framework's contributions to linking offline local search with online combinatorial bandits. We address the major comments point by point below and will incorporate the requested clarifications in the revised manuscript.
Point-by-point responses
Referee: [§3] §3 (Generic Conversion): The derivation of the O(log³ T) bound from the offline local-search guarantee and the online exploration schedule is load-bearing for the main claim. The analysis should explicitly isolate the contribution of each log factor (e.g., from restarts, concentration, and local-search calls) so that readers can verify whether the cubic dependence is necessary or improvable.
Authors: We agree that an explicit breakdown of the logarithmic factors will strengthen the presentation. In the revised §3, we will insert a dedicated remark immediately following the main theorem that isolates the three sources: (i) a log T factor arising from the concentration bounds on the empirical costs during the exploration phases, (ii) a second log T factor from the doubling/restart schedule that controls the number of phases, and (iii) a third log T factor stemming from the per-phase invocation of the offline local-search procedure (whose running time is treated as a constant relative to T). This decomposition will make clear that the cubic dependence is an artifact of composing standard concentration, doubling, and local-search primitives; while we do not claim it is information-theoretically tight, the structure suggests that any improvement would require a qualitatively different exploration schedule. revision: yes
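The promised decomposition can be written schematically (labels ours, mirroring the three sources named in this response; not the paper's notation):

```latex
R_\alpha(T) \;=\; O\Big(\underbrace{\log T}_{\text{concentration}}
\cdot \underbrace{\log T}_{\text{restart/doubling phases}}
\cdot \underbrace{\log T}_{\text{per-phase local-search calls}}\Big)
\;=\; O(\log^3 T).
```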
Referee: [§5] §5 (Scheduling Application): The reduction assumes the offline local search returns an α-approximate solution, yet the regret statement is stated as O(log³ T) approximate regret without quantifying how the approximation factor α propagates into the final bound. This detail is needed to confirm the bound remains logarithmic when α > 1.
Authors: The referee is correct that the dependence on α should be stated explicitly. In our framework the notion of “approximate regret” is defined relative to the α-approximate offline optimum returned by local search; consequently the generic bound is O(α log³ T) approximate regret. Because α is a fixed constant independent of T, the dependence on the time horizon remains logarithmic for any constant α > 1. In the revision we will add this explicit propagation both in the statement of the generic theorem (Theorem 3.1) and in the scheduling application (Corollary 5.1), together with a short sentence confirming that the bound stays O(log³ T) whenever α is constant. revision: yes
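One standard way to formalize the quantity discussed here (notation ours; the paper's definition may differ in details):

```latex
R_\alpha(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} c_t(A_t)\right]
\;-\; \alpha \cdot T \cdot \min_{a \in \mathcal{A}} \mathbb{E}\,[c_1(a)],
\qquad R_\alpha(T) \;=\; O(\alpha \log^3 T).
```

For any constant \(\alpha\), the right-hand side remains logarithmic in the horizon, consistent with the response.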
Circularity Check
No significant circularity identified
Full rationale
The paper presents a generic algorithmic conversion from any offline local search procedure that terminates at an approximate optimum into an online stochastic combinatorial bandit algorithm achieving O(log^3 T) approximate regret under i.i.d. costs. The derivation relies on standard stochastic regret analysis and does not reduce any central claim to a fitted parameter, self-referential definition, or load-bearing self-citation chain. The three example applications are consistent demonstrations rather than the source of the bound. The framework is self-contained and externally falsifiable via standard bandit techniques.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Costs are drawn i.i.d. from fixed but unknown distributions; regret is measured against the single best fixed action in hindsight under full information.