Learning to Sparsify Stochastic Linear Bandits
Recognition: 2 theorem links
Pith reviewed 2026-05-12 02:51 UTC · model grok-4.3
The pith
An adaptively phased algorithm for stochastic linear bandits with sparsity constraints achieves Õ(d √T) regret when sparse actions are efficiently selectable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that by adaptively alternating between OLS-based parameter learning and specialized subroutines for selecting sparse actions, one can achieve sublinear regret in stochastic linear bandits subject to sparsity constraints on the action vectors. When the action set permits efficient computation of the optimal sparse action, such as a Euclidean ball, the regret is Õ(d √T). For general strongly convex compact action sets, a greedy subroutine yields Õ(d √T) α-regret, while for general compact sets the bound is Õ(d T^{2/3}) α-regret, where α is the approximation ratio of the greedy algorithm.
What carries the argument
The adaptively phased exploration-exploitation framework that integrates ordinary least squares estimation with exact or greedy-based sparse action selection subroutines.
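The framework above can be sketched in a few lines. This is a minimal illustration assuming a Euclidean-ball action set, a doubling phase schedule, and random unit-norm exploration actions; the function names, the schedule, and the exploration distribution are our assumptions, not the paper's exact algorithm.

```python
import numpy as np

def phased_sparse_bandit(env, d, s, T, rng):
    """Illustrative phased OLS + sparse-selection loop (a sketch, not the
    paper's algorithm). env(x) returns a noisy reward <theta, x> + noise."""
    X, y = [], []
    theta_hat = np.zeros(d)
    t, phase = 0, 0
    while t < T:
        # Exploration: play random unit-norm actions, collect observations.
        for _ in range(min(2 ** phase, T - t)):
            x = rng.standard_normal(d)
            x /= np.linalg.norm(x)
            X.append(x)
            y.append(env(x))
            t += 1
        # OLS refit on all exploration data so far.
        theta_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        # Sparse action selection (Euclidean-ball case): keep the s
        # largest-magnitude coordinates of theta_hat, then normalize.
        a = np.zeros(d)
        top = np.argsort(np.abs(theta_hat))[-s:]
        a[top] = theta_hat[top]
        if np.linalg.norm(a) > 0:
            a /= np.linalg.norm(a)
        # Exploitation: commit to the estimated best sparse action.
        for _ in range(min(2 ** phase, T - t)):
            env(a)
            t += 1
        phase += 1
    return theta_hat
```

The geometrically growing phases mean only O(log T) OLS refits are needed, which is what keeps the computational cost per round modest.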
If this is right
- For Euclidean ball action sets, the regret bound is optimal up to logarithmic factors.
- The α-regret bounds extend the framework to intractable sparse selection problems via approximation.
- Extensive experiments, including an application to recommendation systems, validate the approach.
- The bounds assume standard stochastic linear bandit conditions like sub-Gaussian noise.
Where Pith is reading between the lines
- This method could be extended to other combinatorial constraints if suitable approximation algorithms are available for action selection.
- The gap between the √T and T^{2/3} bounds suggests potential for better approximations in non-strongly convex settings.
- The approach connects sparse bandits to sparse regression techniques through the use of OLS and greedy selection.
Load-bearing premise
The key assumption is that sparse action selection is either exactly solvable in polynomial time or admits a constant-factor approximation via the greedy algorithm for the given action set.
What would settle it
Observing that the empirical regret on a Euclidean ball instance exceeds order d √T for large T, or that the greedy subroutine fails to achieve the claimed approximation ratio α on a strongly convex set, would falsify the regret guarantees.
Original abstract
This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a $\tilde{\mathcal{O}}(d\sqrt{T})$ regret, where $d$ is the dimension of the action vector and $T$ is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy subroutine. For general strongly convex action sets, we derive a $\tilde{\mathcal{O}}(d \sqrt{T})$ $\alpha$-regret; for general compact sets lacking strong convexity, we establish a $\tilde{\mathcal{O}}(d T^{2/3})$ $\alpha$-regret, where $\alpha$ pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments including an application to recommendation system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an adaptively phased exploration-exploitation framework for stochastic linear bandits in which each action vector must satisfy a sparsity constraint (at most a fixed number of non-zero coordinates). Parameter estimation is performed via ordinary least squares, while sparse action selection is handled by an exact solver when the action set is a Euclidean ball and by a greedy α-approximation subroutine otherwise. The paper derives a Õ(d√T) regret bound for the Euclidean-ball case, a Õ(d√T) α-regret bound for strongly convex compact sets, and a Õ(d T^{2/3}) α-regret bound for general compact sets; the claims are supported by experiments on synthetic instances and a recommendation-system application.
Significance. If the stated regret bounds hold under the usual sub-Gaussian noise and bounded-reward assumptions, the work provides a clean algorithmic template that extends classical linear-bandit analysis to the practically relevant setting of sparse actions. The distinction between efficiently solvable (Euclidean ball) and approximable (greedy) cases, together with the resulting √T versus T^{2/3} regimes, is technically coherent and could serve as a reference point for subsequent sparse-bandit research. The experimental validation on a recommendation task adds modest empirical support.
major comments (2)
- [§4.2] §4.2 (Euclidean-ball case): the claim that the optimal sparse action can be recovered in polynomial time is central to obtaining the Õ(d√T) regret; the manuscript should explicitly state the time complexity of the subroutine (in terms of d and the sparsity level s) and verify that it does not introduce an extra logarithmic factor that would affect the final bound.
- [Theorem 5.3] Theorem 5.3 (general compact sets): the T^{2/3} α-regret bound relies on the phased elimination schedule and the α-approximation guarantee of the greedy subroutine; the proof sketch should clarify how the approximation error accumulates across phases and whether the constant hidden in Õ(·) depends on α in a way that remains sub-linear in T.
minor comments (3)
- [Abstract / §1] The abstract and introduction should explicitly define the sparsity level s (or k) and state whether it is known to the algorithm or must be learned; this parameter appears in all regret statements but is not introduced in the provided summary.
- [Definition 2.4 / Theorems 5.1–5.3] Notation for the α-regret (Definition 2.4) should be cross-referenced in the statement of Theorems 5.1–5.3 so that readers can immediately see how the approximation ratio enters the final bound.
- [Figure 3] Figure 3 (recommendation-system experiment) would benefit from an additional baseline that ignores the sparsity constraint entirely, allowing direct visual comparison of the cost of enforcing sparsity.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive comments on our work. We address each major comment below and will revise the manuscript accordingly to improve clarity.
Point-by-point responses
Referee: [§4.2] §4.2 (Euclidean-ball case): the claim that the optimal sparse action can be recovered in polynomial time is central to obtaining the Õ(d√T) regret; the manuscript should explicitly state the time complexity of the subroutine (in terms of d and the sparsity level s) and verify that it does not introduce an extra logarithmic factor that would affect the final bound.
Authors: We agree that an explicit statement of the time complexity strengthens the presentation. For the Euclidean-ball case with sparsity level s, the optimal sparse action is recovered by sorting the absolute values of the estimated parameter vector's coordinates and selecting the s largest (followed by normalization to satisfy the ball constraint), which requires O(d log d) time via standard sorting. This complexity is polynomial in d and independent of T. Since the number of exploration phases is O(log T) and the regret analysis accounts only for statistical estimation error (not computational overhead), no additional logarithmic factor enters the Õ(d√T) regret bound. We will add this explicit statement and verification to §4.2 in the revised manuscript. revision: yes
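The subroutine described in the response above (sort the coordinates by magnitude, keep the s largest, normalize) is simple enough to sketch directly. The function name is ours; the reduction of the ball-constrained sparse problem to top-s coordinate selection is as stated by the authors.

```python
import numpy as np
from itertools import combinations

def best_sparse_action_ball(theta_hat, s):
    """argmax of <theta_hat, x> over s-sparse x in the unit Euclidean ball:
    restrict theta_hat to its s largest-magnitude coordinates and normalize.
    O(d log d) time via sorting, matching the authors' stated complexity."""
    support = np.argsort(np.abs(theta_hat))[-s:]
    x = np.zeros_like(theta_hat, dtype=float)
    x[support] = theta_hat[support]
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

# Sanity check against brute force over all size-s supports: for a fixed
# support S the optimum is theta_S / ||theta_S|| with value ||theta_S||.
theta_hat = np.array([0.1, -2.0, 0.5, 1.5, -0.2])
x_star = best_sparse_action_ball(theta_hat, s=2)
brute = max(np.linalg.norm(theta_hat[list(S)]) for S in combinations(range(5), 2))
assert abs(float(theta_hat @ x_star) - brute) < 1e-12
```

Because the optimum over any fixed support is the normalized restriction of theta_hat, picking the largest-magnitude coordinates is exactly optimal, not merely a heuristic.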
Referee: [Theorem 5.3] Theorem 5.3 (general compact sets): the T^{2/3} α-regret bound relies on the phased elimination schedule and the α-approximation guarantee of the greedy subroutine; the proof sketch should clarify how the approximation error accumulates across phases and whether the constant hidden in Õ(·) depends on α in a way that remains sub-linear in T.
Authors: We acknowledge that the current proof sketch is concise and would benefit from additional detail on error accumulation. In the revised version we will expand the proof of Theorem 5.3 to show that the per-phase α-approximation error (from the greedy subroutine) contributes an additive term that, under the geometrically increasing phase lengths of the elimination schedule, sums to a total of O(α d T^{2/3}). The hidden constants in the Õ notation depend on α only multiplicatively and are independent of T, so the bound remains sub-linear in T. We will include a short derivation of the error propagation to make this transparent. revision: yes
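For readers unfamiliar with where a T^{2/3} rate comes from, a generic explore-then-commit accounting (our illustration, not necessarily the paper's exact derivation) makes the tradeoff visible: with n exploration rounds the OLS estimation error scales roughly as d/sqrt(n), so

```latex
\mathrm{Regret}(T) \;\lesssim\;
\underbrace{n}_{\text{exploration}}
\;+\;
\underbrace{T \cdot \frac{d}{\sqrt{n}}}_{\text{exploitation at error } \approx d/\sqrt{n}},
\qquad
n \asymp T^{2/3}
\;\Longrightarrow\;
\mathrm{Regret}(T) \;=\; \tilde{\mathcal{O}}\!\bigl(d\,T^{2/3}\bigr),
```

with the approximation ratio entering the exploitation term only multiplicatively, consistent with the authors' claim that the hidden constants depend on α multiplicatively and not on T.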
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper derives regret bounds for its phased OLS-based algorithm by applying standard linear bandit concentration and optimization properties of the sparse selection subroutines. The Euclidean-ball case uses exact efficient sparse optimization to obtain Õ(d√T) regret; the general cases use the stated greedy α-approximation ratio together with strong-convexity or compactness assumptions to obtain the respective √T or T^{2/3} α-regret bounds. None of these steps reduce by definition to the inputs, rename a fitted quantity as a prediction, or rest on a load-bearing self-citation chain; the analysis is self-contained against external benchmarks once the sub-Gaussian noise and bounded-reward assumptions are granted.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard stochastic linear bandit model with sub-Gaussian noise and bounded actions.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (match: unclear): "We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection... Õ(d√T) regret... α-regret"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (match: unclear): "Proposition 1 (Lipschitz Continuity)... strongly convex... submodularity ratio γ"
Reference graph
Works this paper leans on
- [1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
- [2] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.
- [3] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Proc. of Conference on Neural Information Processing Systems, pages 2312–2320, 2011.
- [4] Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In Proc. of International Conference on World Wide Web, pages 661–670, 2010.
- [5] Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. Contextual bandits with linear payoff functions. In Proc. of International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
- [6] Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In Proc. of International Conference on Machine Learning, pages 127–135, 2013.
- [7] Hamsa Bastani, Mohsen Bayati, and Khashayar Khosravi. Mostly exploration-free algorithms for contextual bandits. Management Science, 67(3):1329–1349, 2021.
- [8] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Proc. of Artificial Intelligence and Statistics, pages 1–9, 2012.
- [9] Daniel J Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen, et al. A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1):1–96, 2018.
- [10] Min-hwan Oh, Garud Iyengar, and Assaf Zeevi. Sparsity-agnostic lasso bandit. In Proc. of International Conference on Machine Learning, pages 8271–8280, 2021.
- [11] Botao Hao, Tor Lattimore, and Mengdi Wang. High-dimensional sparse linear bandits. In Proc. of Conference on Neural Information Processing Systems, pages 10753–10763, 2020.
- [12] Yan Dai, Ruosong Wang, and Simon Shaolei Du. Variance-aware sparse linear bandits. In Proc. of International Conference on Learning Representations, pages 13657–13692, 2023.
- [13] Aldo Pacchiano, Christoph Dann, and Claudio Gentile. Best of both worlds model selection. In Proc. of Conference on Neural Information Processing Systems, pages 1883–1895, 2022.
- [14] Tianyuan Jin, Kyoungseok Jang, and Nicolò Cesa-Bianchi. Sparsity-agnostic linear bandits with adaptive adversaries. In Proc. of Conference on Neural Information Processing Systems, pages 42015–42047, 2024.
- [15] Koji Ichikawa, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, and Ken-ichi Kawarabayashi. New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem. In Proc. of the AAAI Conference on Artificial Intelligence, pages 12708–12716, 2024.
- [16] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. In Proc. of International Conference on Machine Learning, pages 151–159, 2013.
- [17] Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, and Pinyan Lu. Combinatorial multi-armed bandit with general reward functions. In Proc. of Conference on Neural Information Processing Systems, pages 1659–1667, 2016.
- [18] Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 20(5):1466–1478, 2012.
- [19] Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and Vaneet Aggarwal. Combinatorial stochastic-greedy bandit. In Proc. of the AAAI Conference on Artificial Intelligence, pages 12052–12060, 2024.
- [20] Guanyu Nie, Yididiya Y Nadew, Yanhui Zhu, Vaneet Aggarwal, and Christopher John Quinn. A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback. In Proc. of International Conference on Machine Learning, pages 26166–26198, 2023.
- [21] Filippo Vannella, Alexandre Proutiere, and Jaeseong Jeong. Statistical and computational trade-off in multi-agent multi-armed bandits. In Proc. of Conference on Neural Information Processing Systems, pages 69021–69031, 2023.
- [22] Filippo Vannella, Alexandre Proutiere, and Jaeseong Jeong. Best arm identification in multi-agent multi-armed bandits. In Proc. of International Conference on Machine Learning, pages 34875–34907, 2023.
- [23] Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In Proc. of Conference on Neural Information Processing Systems, pages 1577–1584, 2008.
- [24] Zongqi Wan, Jialin Zhang, Wei Chen, Xiaoming Sun, and Zhijie Zhang. Bandit multi-linear DR-submodular maximization and its applications on adversarial submodular bandits. In Proc. of International Conference on Machine Learning, pages 35491–35524, 2023.
- [25] Daniel Golovin, Andreas Krause, and Matthew Streeter. Online submodular maximization under a matroid constraint with application to learning assignments. arXiv preprint arXiv:1407.1082, 2014.
- [26] Abhimanyu Das and David Kempe. Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection. Journal of Machine Learning Research, 19(3):1–34, 2018.
- [27] Varsha Dani, Thomas P Hayes, and Sham M Kakade. The price of bandit information for online optimization. In Proc. of Conference on Neural Information Processing Systems, pages 345–352, 2007.
- [28] Lintao Ye, Ming Chi, Zhi-Wei Liu, Xiaoling Wang, and Vijay Gupta. Online mixed discrete and continuous optimization: Algorithms, regret analysis and applications. Automatica, 175:112189, 2025.
- [29] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
- [30] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
- [31] Sébastien Bubeck, Nicolò Cesa-Bianchi, and Sham M Kakade. Towards minimax policies for online linear optimization with bandit feedback. In Proc. of Conference on Learning Theory, pages 41–1, 2012.
- [32] Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
- [33] Evgenii Sergeevich Polovinkin. Strongly convex analysis. Sbornik: Mathematics, 187(2):259, 1996.
- [34] Jean-Philippe Vial. Strong convexity of sets and functions. Journal of Mathematical Economics, 9(1-2):187–205, 1982.
- [35] Balas Kausik Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
- [36] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
- [37] Dimitri Bertsekas. Convex Optimization Algorithms. Athena Scientific, 2015.
- [38] Arman Adibi, Aryan Mokhtari, and Hamed Hassani. Minimax optimization: The case of convex-submodular. In Proc. of International Conference on Artificial Intelligence and Statistics, pages 3556–3580, 2022.
- [39] Jonathan Bunton and Paulo Tabuada. Joint continuous and discrete model selection via submodularity. Journal of Machine Learning Research, 23(329):1–42, 2022.
- [40] Alexandra Carpentier and Rémi Munos. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In Proc. of Artificial Intelligence and Statistics, pages 190–198, 2012.
- [41] Jingyu Liu and Yanglei Song. Minimax rate-optimal algorithms for high-dimensional stochastic linear bandits. arXiv preprint arXiv:2505.17400, 2025.
- [42] Gi-Soo Kim and Myunghee Cho Paik. Doubly-robust lasso bandit. In Proc. of Conference on Neural Information Processing Systems, pages 5877–5887, 2019.
- [43] Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proc. of International Conference on Machine Learning, pages 1057–1064, 2011.
- [44] John M Danskin. The Theory of Max-Min and Its Application to Weapons Allocation Problems, volume 5. Springer Science & Business Media, 2012.
- [45] Roger A Horn and Charles R Johnson. Matrix Analysis. Cambridge University Press, 2012.
- [46] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proc. of International Conference on Machine Learning, pages 498–507, 2017.
- [47] Martin J Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press, 2019.