pith. machine review for the scientific record.

arxiv: 2604.09423 · v1 · submitted 2026-04-10 · 💻 cs.LG

Recognition: no theorem link

Offline Local Search for Online Stochastic Bandits

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:05 UTC · model grok-4.3

classification 💻 cs.LG
keywords combinatorial bandits · local search · stochastic bandits · offline-to-online conversion · regret minimization · scheduling · matroids · clustering

The pith

A generic method converts offline local search into online stochastic combinatorial bandits with O(log^3 T) approximate regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to take local search algorithms designed for offline combinatorial problems and deploy them in an online stochastic bandit setting. This yields regret that grows only as the cube of the log of the time horizon T, rather than any positive power of T itself. A reader would care because local search is already a practical workhorse for hard offline tasks, and the new conversion lets it handle sequential decisions with far less cumulative loss than prior offline-to-online reductions. The approach requires only that the offline local search reach an approximate optimum and that costs arrive i.i.d. from fixed distributions.

Core claim

We give a generic method for converting an offline local search algorithm that terminates in an approximately optimal solution into an online stochastic combinatorial bandit algorithm with O(log^3 T) approximate regret. The framework is applied to three concrete problems: scheduling to minimize total completion time, finding a minimum-cost base in a matroid, and uncertain clustering.

What carries the argument

The generic offline-to-online conversion wrapper that invokes the local search procedure on estimated costs while controlling the number of exploration rounds to accumulate only logarithmic regret.
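The wrapper can be pictured as a phased explore-then-exploit loop. The sketch below is a minimal illustration under our own assumptions, not the paper's exact construction: the exploration actions, phase schedule, and `local_search` oracle interface are all invented for concreteness.

```python
def offline_to_online(elements, local_search, draw_costs, T):
    """Phased offline-to-online conversion (illustrative sketch only).

    elements     -- ground set whose mean costs must be learned
    local_search -- oracle: {element: estimated cost} -> feasible solution
    draw_costs   -- () -> {element: realized i.i.d. cost} for one round
    """
    est = {e: 0.0 for e in elements}
    n = {e: 0 for e in elements}
    played = []
    t, phase = 0, 1
    while t < T:
        # Exploration: refresh each element's running-mean estimate.
        # Concentration bounds on these samples supply one log T factor.
        for e in elements:
            if t >= T:
                break
            costs = draw_costs()
            n[e] += 1
            est[e] += (costs[e] - est[e]) / n[e]
            played.append(frozenset([e]))   # crude exploration action
            t += 1
        # Exploitation: commit to the local-search solution on current
        # estimates for a doubling number of rounds, so only O(log T)
        # phases occur -- a second log T factor.
        sol = local_search(est)
        for _ in range(min(2 ** phase, T - t)):
            draw_costs()    # costs realized; only sol's cost is observed
            played.append(sol)
            t += 1
        phase += 1
    return played
```

With deterministic costs and a trivial oracle that picks the cheapest singleton, the loop quickly locks onto the best action and spends all but O(log T) phases exploiting it.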

If this is right

  • Scheduling to minimize total completion time can be solved online with only O(log^3 T) regret using any local-search routine that works offline.
  • Minimum-cost matroid base selection inherits the same O(log^3 T) bound in the stochastic bandit model.
  • Uncertain clustering instances can be handled online via the same conversion while retaining logarithmic regret.
  • Any combinatorial optimization problem whose offline version admits an approximately optimal local-search algorithm immediately yields an online stochastic bandit algorithm with O(log^3 T) approximate regret.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same wrapper idea could be tested on other common offline heuristics such as greedy or swap-based methods to see whether logarithmic regret remains attainable.
  • In applications where costs are stationary but the decision maker only observes the chosen action's cost, the framework offers a practical route to near-optimal performance after only polylogarithmic time.
  • Because the approximation ratio α enters the bound only as a constant factor (the T-dependence stays O(log^3 T)), tightening the offline local-search guarantee directly improves the online guarantee without extra algorithmic work.

Load-bearing premise

The offline local search procedure must reach a solution whose cost is within a fixed factor of the true optimum for the underlying combinatorial problem, and the per-round costs must be drawn independently from the same fixed but unknown distributions.
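In symbols (notation ours, not the paper's), the premise combines an offline guarantee with a stochastic model:

```latex
c\bigl(\mathrm{LS}(c)\bigr) \;\le\; \alpha \cdot \min_{S \in \mathcal{F}} c(S)
\quad \text{for a fixed } \alpha \ge 1,
\qquad
c_1, c_2, \ldots, c_T \overset{\text{i.i.d.}}{\sim} \mathcal{D}
\ \text{(fixed, unknown)}.
```

Here LS denotes the offline local-search routine, F the feasible solutions, and D the per-round cost distribution.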

What would settle it

Execute the converted algorithm on the matroid base problem for T = 10,000 rounds with known cost distributions and check whether cumulative approximate regret remains O((log T)^3) or instead grows linearly or as T to a positive power.
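A toy version of this check can be run on a rank-2 uniform matroid (pick 2 of 6 elements to minimize summed cost), with an explore-then-commit stand-in for the converted algorithm. All parameters, the exploration schedule, and the noise model are invented for illustration; greedy selection plays the role of local search, which is exact on uniform matroids.

```python
import random

def simulate(T=10_000, seed=0):
    """Toy regret check on a rank-2 uniform matroid (invented parameters)."""
    rng = random.Random(seed)
    means = [0.1, 0.2, 0.5, 0.6, 0.7, 0.8]
    best = sum(sorted(means)[:2])             # optimal per-round cost: 0.3

    def draw(i):                              # i.i.d. cost for element i
        return means[i] + rng.uniform(-0.05, 0.05)

    est = [0.0] * 6
    n = [0] * 6
    regret = 0.0
    t = 0
    m = 200                                   # exploration samples per element
    for _ in range(m):                        # explore in fixed pairs
        for pair in [(0, 1), (2, 3), (4, 5)]:
            cost = 0.0
            for i in pair:
                c = draw(i)
                n[i] += 1
                est[i] += (c - est[i]) / n[i]
                cost += c
            regret += cost - best
            t += 1
    # greedy on estimates = local search, exact for uniform matroids
    commit = sorted(range(6), key=lambda i: est[i])[:2]
    while t < T:
        regret += sum(draw(i) for i in commit) - best
        t += 1
    return regret, commit
```

For T = 10,000 the cumulative regret stays a small constant fraction of T (it is dominated by the 600 exploration rounds), which is the qualitative signature the proposed test looks for; distinguishing log³ T from other polylogarithmic rates would require sweeping T over several orders of magnitude.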

Original abstract

Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) which depend sub-linearly, but polynomially on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum cost base of a matroid and uncertain clustering.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity check, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a generic framework for converting offline local search algorithms—which terminate at approximately optimal solutions for combinatorial optimization problems—into online algorithms for stochastic combinatorial multi-armed bandits. Under i.i.d. stochastic costs, the method achieves O(log³ T) approximate regret. The framework is instantiated on three problems: scheduling to minimize total completion time, minimum-cost matroid bases, and uncertain clustering. The central claim contrasts this logarithmic dependence with prior offline-to-online conversions that incur polynomial dependence on T.

Significance. If the O(log³ T) bound holds under the stated assumptions, the result strengthens the link between offline approximation algorithms and online regret minimization for combinatorial settings. It improves the dependence on T relative to existing frameworks and demonstrates applicability across scheduling, matroids, and clustering. The generic conversion and explicit applications are strengths; the use of standard i.i.d. stochastic assumptions makes the bound relevant to practical online decision-making.

major comments (2)
  1. [§3] §3 (Generic Conversion): The derivation of the O(log³ T) bound from the offline local-search guarantee and the online exploration schedule is load-bearing for the main claim. The analysis should explicitly isolate the contribution of each log factor (e.g., from restarts, concentration, and local-search calls) so that readers can verify whether the cubic dependence is necessary or improvable.
  2. [§5] §5 (Scheduling Application): The reduction assumes the offline local search returns an α-approximate solution, yet the regret statement is stated as O(log³ T) approximate regret without quantifying how the approximation factor α propagates into the final bound. This detail is needed to confirm the bound remains logarithmic when α > 1.
minor comments (3)
  1. [Introduction] The introduction would benefit from a short table comparing the new O(log³ T) bound with the polynomial bounds of the cited offline-to-online frameworks.
  2. [Abstract] Notation for 'approximate regret' is introduced in the abstract but defined only later; a forward reference or early definition would improve readability.
  3. [§6] The matroid application (§6) could briefly note that exact polynomial-time algorithms exist, clarifying why the approximate-regret setting is still of interest.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive feedback. We appreciate the recognition of the framework's contributions to linking offline local search with online combinatorial bandits. We address the major comments point by point below and will incorporate the requested clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Generic Conversion): The derivation of the O(log³ T) bound from the offline local-search guarantee and the online exploration schedule is load-bearing for the main claim. The analysis should explicitly isolate the contribution of each log factor (e.g., from restarts, concentration, and local-search calls) so that readers can verify whether the cubic dependence is necessary or improvable.

    Authors: We agree that an explicit breakdown of the logarithmic factors will strengthen the presentation. In the revised §3, we will insert a dedicated remark immediately following the main theorem that isolates the three sources: (i) a log T factor arising from the concentration bounds on the empirical costs during the exploration phases, (ii) a second log T factor from the doubling/restart schedule that controls the number of phases, and (iii) a third log T factor stemming from the per-phase invocation of the offline local-search procedure (whose running time is treated as a constant relative to T). This decomposition will make clear that the cubic dependence is an artifact of composing standard concentration, doubling, and local-search primitives; while we do not claim it is information-theoretically tight, the structure suggests that any improvement would require a qualitatively different exploration schedule. revision: yes
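The three sources listed in the response compose multiplicatively (our rendering of the decomposition, not a formula from the manuscript):

```latex
\underbrace{O(\log T)}_{\text{(i) concentration}}
\;\times\;
\underbrace{O(\log T)}_{\text{(ii) doubling/restart phases}}
\;\times\;
\underbrace{O(\log T)}_{\text{(iii) per-phase local-search calls}}
\;=\; O(\log^3 T).
```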

  2. Referee: [§5] §5 (Scheduling Application): The reduction assumes the offline local search returns an α-approximate solution, yet the regret statement is stated as O(log³ T) approximate regret without quantifying how the approximation factor α propagates into the final bound. This detail is needed to confirm the bound remains logarithmic when α > 1.

    Authors: The referee is correct that the dependence on α should be stated explicitly. In our framework the notion of “approximate regret” is defined relative to the α-approximate offline optimum returned by local search; consequently the generic bound is O(α log³ T) approximate regret. Because α is a fixed constant independent of T, the dependence on the time horizon remains logarithmic for any constant α > 1. In the revision we will add this explicit propagation both in the statement of the generic theorem (Theorem 3.1) and in the scheduling application (Corollary 5.1), together with a short sentence confirming that the bound stays O(log³ T) whenever α is constant. revision: yes
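The propagation described here can be written out (symbols ours): approximate regret is measured against the α-scaled stochastic optimum, so

```latex
\mathrm{Reg}_\alpha(T)
\;=\;
\mathbb{E}\!\left[\sum_{t=1}^{T} c_t(S_t)\right]
\;-\;
\alpha \, T \cdot \min_{S \in \mathcal{F}} \mu(S)
\;=\;
O\!\left(\alpha \log^3 T\right),
```

which remains O(log³ T) whenever α is a constant independent of T.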

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper presents a generic algorithmic conversion from any offline local search procedure that terminates at an approximate optimum into an online stochastic combinatorial bandit algorithm achieving O(log^3 T) approximate regret under i.i.d. costs. The derivation relies on standard stochastic regret analysis and does not reduce any central claim to a fitted parameter, self-referential definition, or load-bearing self-citation chain. The three example applications are consistent demonstrations rather than the source of the bound. The framework is self-contained and externally falsifiable via standard bandit techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard domain assumptions for stochastic bandits and combinatorial optimization; no free parameters, new entities, or ad-hoc axioms are mentioned in the abstract.

axioms (1)
  • domain assumption: Costs are drawn i.i.d. from fixed but unknown distributions; regret is measured against the single best fixed action in hindsight under full information.
    This is the standard stochastic bandit model invoked by the problem statement.

pith-pipeline@v0.9.0 · 5523 in / 1176 out tokens · 46013 ms · 2026-05-10T18:05:26.594271+00:00 · methodology

