pith. machine review for the scientific record. sign in

arxiv: 2605.10151 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.SY· eess.SY· math.OC

Recognition: 2 theorem links

· Lean Theorem

Learning to Sparsify Stochastic Linear Bandits

Carla Fabiana Chiasserini, Lintao Ye, Ming Chi, Zhengmiao Wang, Zhi-Wei Liu

Pith reviewed 2026-05-12 02:51 UTC · model grok-4.3

classification 💻 cs.LG cs.SYeess.SYmath.OC
keywords stochastic linear banditssparse actionsregret minimizationphased explorationordinary least squaresgreedy approximationconvex action sets
0
0 comments X

The pith

An adaptively phased algorithm for stochastic linear bandits with sparsity constraints achieves Õ(d √T) regret when sparse actions are efficiently selectable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that phases between exploring with ordinary least squares estimation of the linear parameter and exploiting with sparse action selections. For action sets where the optimal sparse action is computable, such as the Euclidean ball, it proves a regret bound of order d times the square root of the time horizon. In harder cases, it uses a greedy method to approximate the sparse action, yielding an alpha-approximate regret of similar order for strongly convex sets and a worse T to the two-thirds power for others. This addresses the challenge of combinatorial optimization in bandit action selection while keeping cumulative regret sublinear in T. Readers interested in high-dimensional decision making under uncertainty would care because sparsity constraints arise naturally in applications like feature selection or resource allocation.

Core claim

The central discovery is that by adaptively alternating between OLS-based parameter learning and specialized subroutines for selecting sparse actions, one can achieve sublinear regret in stochastic linear bandits subject to sparsity constraints on the action vectors. When the action set permits efficient computation of the optimal sparse action, such as a Euclidean ball, the regret is Õ(d √T). For general strongly convex compact action sets, a greedy subroutine yields Õ(d √T) α-regret, while for general compact sets the bound is Õ(d T^{2/3}) α-regret, where α is the approximation ratio of the greedy algorithm.

What carries the argument

The adaptively phased exploration-exploitation framework that integrates ordinary least squares estimation with exact or greedy-based sparse action selection subroutines.

If this is right

  • For Euclidean ball action sets, the regret bound is optimal up to logarithmic factors.
  • The α-regret bounds extend the framework to intractable sparse selection problems via approximation.
  • Extensive experiments validate the approach including in recommendation systems.
  • The bounds assume standard stochastic linear bandit conditions like sub-Gaussian noise.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be extended to other combinatorial constraints if suitable approximation algorithms are available for action selection.
  • The gap between the √T and T^{2/3} bounds suggests potential for better approximations in non-strongly convex settings.
  • The approach connects sparse bandits to sparse regression techniques through the use of OLS and greedy selection.

Load-bearing premise

The key assumption is that sparse action selection is either exactly solvable in polynomial time or admits a constant-factor approximation via the greedy algorithm for the given action set.

What would settle it

Observing that the empirical regret on a Euclidean ball instance exceeds order d √T for large T, or that the greedy subroutine fails to achieve the claimed approximation ratio α on a strongly convex set, would falsify the regret guarantees.

Figures

Figures reproduced from arXiv: 2605.10151 by Carla Fabiana Chiasserini, Lintao Ye, Ming Chi, Zhengmiao Wang, Zhi-Wei Liu.

Figure 1
Figure 1. Figure 1: Numerical results for the APSEE-G algorithm on the ellipsoid action set. [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Cumulative α-regret of the APSEE-G algorithm under different choices of basis actions in the exploration phase. Empirical Verification of Support Recovery. While the recovery of the true support S ⋆ is theoretically justified in Lemmas 2 and 3, we show that this guarantee holds in practical evaluations. To this end, we conduct additional experiments to track the real-time support recovery process of the AP… view at source ↗
Figure 3
Figure 3. Figure 3: Support recovery performance of the APSEE algorithm over time under standard and adversarial [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: α-RT /T vs. T under the ellipsoid case. Case 1: Euclidean Action Sets (ℓ2-Ball) Consider the action set X is the unit Euclidean ball {x ∈ R d : ∥x∥2 ≤ 1}. In this setting, we implement the APSEE algorithm (Algorithm 1). The exact oracle efficiently selects the top-H indices of the estimated parameter ˆθc based on absolute magnitude [PITH_FULL_IMAGE:figures/full_fig_p036_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Additional numerical simulation results. [PITH_FULL_IMAGE:figures/full_fig_p037_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Results of the Video Recommendation scenario. [PITH_FULL_IMAGE:figures/full_fig_p038_6.png] view at source ↗
read the original abstract

This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a $\tilde{\mathcal{O}}(d\sqrt{T})$ regret, where $d$ is the dimension of the action vector and $T$ is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy subroutine. For general strongly convex action sets, we derive a $\tilde{\mathcal{O}}(d \sqrt{T})$ $\alpha$-regret; for general compact sets lacking strong convexity, we establish a $\tilde{\mathcal{O}}(d T^{2/3})$ $\alpha$-regret, where $\alpha$ pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments including an application to recommendation system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes an adaptively phased exploration-exploitation framework for stochastic linear bandits in which each action vector must satisfy a sparsity constraint (at most a fixed number of non-zero coordinates). Parameter estimation is performed via ordinary least squares, while sparse action selection is handled by an exact solver when the action set is a Euclidean ball and by a greedy α-approximation subroutine otherwise. The paper derives a Õ(d√T) regret bound for the Euclidean-ball case, a Õ(d√T) α-regret bound for strongly convex compact sets, and a Õ(d T^{2/3}) α-regret bound for general compact sets; the claims are supported by experiments on synthetic instances and a recommendation-system application.

Significance. If the stated regret bounds hold under the usual sub-Gaussian noise and bounded-reward assumptions, the work provides a clean algorithmic template that extends classical linear-bandit analysis to the practically relevant setting of sparse actions. The distinction between efficiently solvable (Euclidean ball) and approximable (greedy) cases, together with the resulting √T versus T^{2/3} regimes, is technically coherent and could serve as a reference point for subsequent sparse-bandit research. The experimental validation on a recommendation task adds modest empirical support.

major comments (2)
  1. [§4.2] §4.2 (Euclidean-ball case): the claim that the optimal sparse action can be recovered in polynomial time is central to obtaining the Õ(d√T) regret; the manuscript should explicitly state the time complexity of the subroutine (in terms of d and the sparsity level s) and verify that it does not introduce an extra logarithmic factor that would affect the final bound.
  2. [Theorem 5.3] Theorem 5.3 (general compact sets): the T^{2/3} α-regret bound relies on the phased elimination schedule and the α-approximation guarantee of the greedy subroutine; the proof sketch should clarify how the approximation error accumulates across phases and whether the constant hidden in Õ(·) depends on α in a way that remains sub-linear in T.
minor comments (3)
  1. [Abstract / §1] The abstract and introduction should explicitly define the sparsity level s (or k) and state whether it is known to the algorithm or must be learned; this parameter appears in all regret statements but is not introduced in the provided summary.
  2. [Definition 2.4 / Theorems 5.1–5.3] Notation for the α-regret (Definition 2.4) should be cross-referenced in the statement of Theorems 5.1–5.3 so that readers can immediately see how the approximation ratio enters the final bound.
  3. [Figure 3] Figure 3 (recommendation-system experiment) would benefit from an additional baseline that ignores the sparsity constraint entirely, allowing direct visual comparison of the cost of enforcing sparsity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments on our work. We address each major comment below and will revise the manuscript accordingly to improve clarity.

read point-by-point responses
  1. Referee: [§4.2] §4.2 (Euclidean-ball case): the claim that the optimal sparse action can be recovered in polynomial time is central to obtaining the Õ(d√T) regret; the manuscript should explicitly state the time complexity of the subroutine (in terms of d and the sparsity level s) and verify that it does not introduce an extra logarithmic factor that would affect the final bound.

    Authors: We agree that an explicit statement of the time complexity strengthens the presentation. For the Euclidean-ball case with sparsity level s, the optimal sparse action is recovered by sorting the absolute values of the estimated parameter vector's coordinates and selecting the s largest (followed by normalization to satisfy the ball constraint), which requires O(d log d) time via standard sorting. This complexity is polynomial in d and independent of T. Since the number of exploration phases is O(log T) and the regret analysis accounts only for statistical estimation error (not computational overhead), no additional logarithmic factor enters the Õ(d√T) regret bound. We will add this explicit statement and verification to §4.2 in the revised manuscript. revision: yes

  2. Referee: [Theorem 5.3] Theorem 5.3 (general compact sets): the T^{2/3} α-regret bound relies on the phased elimination schedule and the α-approximation guarantee of the greedy subroutine; the proof sketch should clarify how the approximation error accumulates across phases and whether the constant hidden in Õ(·) depends on α in a way that remains sub-linear in T.

    Authors: We acknowledge that the current proof sketch is concise and would benefit from additional detail on error accumulation. In the revised version we will expand the proof of Theorem 5.3 to show that the per-phase α-approximation error (from the greedy subroutine) contributes an additive term that, under the geometrically increasing phase lengths of the elimination schedule, sums to a total of O(α d T^{2/3}). The hidden constants in the Õ notation depend on α only multiplicatively and are independent of T, so the bound remains sub-linear in T. We will include a short derivation of the error propagation to make this transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives regret bounds for its phased OLS-based algorithm by applying standard linear bandit concentration and optimization properties of the sparse selection subroutines. The Euclidean-ball case uses exact efficient sparse optimization to obtain Õ(d√T) regret; the general cases use the stated greedy α-approximation ratio together with strong-convexity or compactness assumptions to obtain the respective √T or T^{2/3} α-regret bounds. None of these steps reduce by definition to the inputs, rename a fitted quantity as a prediction, or rest on a load-bearing self-citation chain; the analysis is self-contained against external benchmarks once the sub-Gaussian noise and bounded-reward assumptions are granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claims rest on standard domain assumptions for stochastic linear bandits (sub-Gaussian rewards, compact action sets) and the computational properties of the chosen subroutines. No free parameters are introduced or fitted in the abstract description, and no new entities are postulated.

axioms (1)
  • domain assumption Standard stochastic linear bandit model with sub-Gaussian noise and bounded actions
    Required for regret analysis but only implied by the problem statement in the abstract.

pith-pipeline@v0.9.0 · 5557 in / 1400 out tokens · 49965 ms · 2026-05-12T02:51:03.467641+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

  1. [1]

    Finite-time analysis of the multiarmed bandit problem.Machine Learning, 47(2-3):235–256, 2002

    Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem.Machine Learning, 47(2-3):235–256, 2002

  2. [2]

    Cambridge University Press, 2020

    Tor Lattimore and Csaba Szepesvári.Bandit algorithms. Cambridge University Press, 2020

  3. [3]

    Improved algorithms for linear stochastic bandits

    Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. InProc. of Conference on Neural Information Processing Systems, pages 2312–2320, 2011. 13

  4. [4]

    A contextual-bandit approach to personal- ized news article recommendation

    Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personal- ized news article recommendation. InProc. of International Conference on World Wide Web, pages 661–670, 2010

  5. [5]

    Contextual bandits with linear payoff functions

    Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. Contextual bandits with linear payoff functions. InProc. of International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011

  6. [6]

    Thompson sampling for contextual bandits with linear payoffs

    Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In Proc. of International conference on machine learning, pages 127–135, 2013

  7. [7]

    Mostly exploration-free algorithms for contextual bandits.Management Science, 67(3):1329–1349, 2021

    Hamsa Bastani, Mohsen Bayati, and Khashayar Khosravi. Mostly exploration-free algorithms for contextual bandits.Management Science, 67(3):1329–1349, 2021

  8. [8]

    Online-to-confidence-set conversions and application to sparse stochastic bandits

    Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Online-to-confidence-set conversions and application to sparse stochastic bandits. InProc. of Artificial Intelligence and Statistics, pages 1–9, 2012

  9. [9]

    A tutorial on thompson sampling.Foundations and Trends® in Machine Learning, 11(1):1–96, 2018

    Daniel J Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen, et al. A tutorial on thompson sampling.Foundations and Trends® in Machine Learning, 11(1):1–96, 2018

  10. [10]

    Sparsity-agnostic lasso bandit

    Min-hwan Oh, Garud Iyengar, and Assaf Zeevi. Sparsity-agnostic lasso bandit. InProc. of International Conference on Machine Learning, pages 8271–8280, 2021

  11. [11]

    High-dimensional sparse linear bandits

    Botao Hao, Tor Lattimore, and Mengdi Wang. High-dimensional sparse linear bandits. InProc. of Conference on Neural Information Processing Systems, pages 10753–10763, 2020

  12. [12]

    Variance-aware sparse linear bandits

    Yan Dai, Ruosong Wang, and Simon Shaolei Du. Variance-aware sparse linear bandits. InProc. of International Conference on Learning Representations, pages 13657–13692, 2023

  13. [13]

    Best of both worlds model selection

    Aldo Pacchiano, Christoph Dann, and Claudio Gentile. Best of both worlds model selection. InProc. of Conference on Neural Information Processing Systems, pages 1883–1895, 2022

  14. [14]

    Sparsity-agnostic linear bandits with adaptive adversaries

    Tianyuan Jin, Kyoungseok Jang, and Nicolò Cesa-Bianchi. Sparsity-agnostic linear bandits with adaptive adversaries. InProc. of Conference on Neural Information Processing Systems, pages 42015–42047, 2024

  15. [15]

    New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem

    Koji Ichikawa, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, and Ken-ichi Kawarabayashi. New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem. InProc. of the AAAI Conference on Artificial Intelligence, pages 12708–12716, 2024

  16. [16]

    Combinatorial multi-armed bandit: General framework and applications

    Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. InProc. of International Conference on Machine learning, pages 151–159, 2013

  17. [17]

    Combinatorial multi-armed bandit with general reward functions

    Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, and Pinyan Lu. Combinatorial multi-armed bandit with general reward functions. InProc. of Conference on Neural Information Processing Systems, pages 1659–1667, 2016

  18. [18]

    Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations.IEEE/ACM Transactions on Networking, 20(5):1466–1478, 2012. 14

  19. [19]

    Combinatorial stochastic-greedy bandit

    Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and Vaneet Aggarwal. Combinatorial stochastic-greedy bandit. InProc. of the AAAI Conference on Artificial Intelligence, pages 12052–12060, 2024

  20. [20]

    A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback

    Guanyu Nie, Yididiya Y Nadew, Yanhui Zhu, Vaneet Aggarwal, and Christopher John Quinn. A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback. InProc. of International Conference on Machine Learning, pages 26166–26198, 2023

  21. [21]

    Statistical and computational trade-off in multi-agent multi-armed bandits

    Filippo Vannella, Alexandre Protiuere, and Jaeseong Jeong. Statistical and computational trade-off in multi-agent multi-armed bandits. InProc. of Conference on Neural Information Processing Systems, pages 69021–69031, 2023

  22. [22]

    Best arm identification in multi-agent multi-armed bandits

    Filippo Vannella, Alexandre Proutiere, and Jaeseong Jeong. Best arm identification in multi-agent multi-armed bandits. InProc. of International Conference on Machine Learning, pages 34875–34907, 2023

  23. [23]

    An online algorithm for maximizing submodular functions

    Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In Proc. of Conference on Neural Information Processing Systems, pages 1577–1584, 2008

  24. [24]

    Bandit multi-linear dr- submodular maximization and its applications on adversarial submodular bandits

    Zongqi Wan, Jialin Zhang, Wei Chen, Xiaoming Sun, and Zhijie Zhang. Bandit multi-linear dr- submodular maximization and its applications on adversarial submodular bandits. InProc. of Interna- tional Conference on Machine Learning, pages 35491–35524, 2023

  25. [25]

    Online submodular maximization under a matroid constraint with application to learning assignments.arXiv preprint arXiv:1407.1082, 2014

    Daniel Golovin, Andreas Krause, and Matthew Streeter. Online submodular maximization under a matroid constraint with application to learning assignments.arXiv preprint arXiv:1407.1082, 2014

  26. [26]

    Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection.Journal of Machine Learning Research, 19(3):1–34, 2018

    Abhimanyu Das and David Kempe. Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection.Journal of Machine Learning Research, 19(3):1–34, 2018

  27. [27]

    The price of bandit information for online optimization

    Varsha Dani, Thomas P Hayes, and Sham M Kakade. The price of bandit information for online optimization. InProc. of Conference on Neural Information Processing Systems, pages 345–352, 2007

  28. [28]

    Online mixed discrete and continuous optimization: Algorithms, regret analysis and applications.Automatica, 175:112189, 2025

    Lintao Ye, Ming Chi, Zhi-Wei Liu, Xiaoling Wang, and Vijay Gupta. Online mixed discrete and continuous optimization: Algorithms, regret analysis and applications.Automatica, 175:112189, 2025

  29. [29]

    Efficient algorithms for online decision problems.Journal of Computer and System Sciences, 71(3):291–307, 2005

    Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems.Journal of Computer and System Sciences, 71(3):291–307, 2005

  30. [30]

    Cambridge University Press, 2006

    Nicolo Cesa-Bianchi and Gábor Lugosi.Prediction, learning, and games. Cambridge University Press, 2006

  31. [31]

    Towards minimax policies for online linear optimization with bandit feedback

    Sébastien Bubeck, Nicolo Cesa-Bianchi, and Sham M Kakade. Towards minimax policies for online linear optimization with bandit feedback. InProc. of Conference on Learning Theory, pages 41–1, 2012

  32. [32]

    Linearly parameterized bandits.Mathematics of Operations Research, 35(2):395–411, 2010

    Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits.Mathematics of Operations Research, 35(2):395–411, 2010

  33. [33]

    Strongly convex analysis.Sbornik: Mathematics, 187(2):259, 1996

    Evgenii Sergeevich Polovinkin. Strongly convex analysis.Sbornik: Mathematics, 187(2):259, 1996. 15

  34. [34]

    Strong convexity of sets and functions.Journal of Mathematical Economics, 9(1-2):187–205, 1982

    Jean-Philippe Vial. Strong convexity of sets and functions.Journal of Mathematical Economics, 9(1-2):187–205, 1982

  35. [35]

    Sparse approximate solutions to linear systems.SIAM Journal on Computing, 24(2):227–234, 1995

    Balas Kausik Natarajan. Sparse approximate solutions to linear systems.SIAM Journal on Computing, 24(2):227–234, 1995

  36. [36]

    An analysis of approximations for maximizing submodular set functions—i.Mathematical Programming, 14(1):265–294, 1978

    George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—i.Mathematical Programming, 14(1):265–294, 1978

  37. [37]

    Athena Scientific, 2015

    Dimitri Bertsekas.Convex optimization algorithms. Athena Scientific, 2015

  38. [38]

    Minimax optimization: The case of convex- submodular

    Arman Adibi, Aryan Mokhtari, and Hamed Hassani. Minimax optimization: The case of convex- submodular. InProc. of International Conference on Artificial Intelligence and Statistics, pages 3556–3580, 2022

  39. [39]

    Joint continuous and discrete model selection via submodularity

    Jonathan Bunton and Paulo Tabuada. Joint continuous and discrete model selection via submodularity. Journal of Machine Learning Research, 23(329):1–42, 2022

  40. [40]

    Bandit theory meets compressed sensing for high dimensional stochastic linear bandit

    Alexandra Carpentier and Rémi Munos. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. InProc. of Artificial Intelligence and Statistics, pages 190–198, 2012

  41. [41]

    Minimax rate-optimal algorithms for high-dimensional stochastic linear bandits.arXiv preprint arXiv:2505.17400, 2025

    Jingyu Liu and Yanglei Song. Minimax rate-optimal algorithms for high-dimensional stochastic linear bandits.arXiv preprint arXiv:2505.17400, 2025

  42. [42]

    Doubly-robust lasso bandit

    Gi-Soo Kim and Myunghee Cho Paik. Doubly-robust lasso bandit. InProc. of Conference on Neural Information Processing Systems, pages 5877–5887, 2019

  43. [43]

    Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection

    Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. InProc. of International Conference on Machine Learning, pages 1057–1064, 2011

  44. [44]

    Springer Science & Business Media, 2012

    John M Danskin.The theory of max-min and its application to weapons allocation problems, volume 5. Springer Science & Business Media, 2012

  45. [45]

    Cambridge University Press, 2012

    Roger A Horn and Charles R Johnson.Matrix analysis. Cambridge University Press, 2012

  46. [46]

    Guarantees for greedy maximization of non-submodular functions with applications

    Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. InProc. of International Conference on Machine Learning, pages 498–507, 2017

  47. [47]

    Cambridge University Press, 2019

    Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019. 16 Organization of the Appendix and Notations Organization.More related work can be found in Appendix A. Appendices B, C and D supplement the omitted proofs of Sections 2, 3, 4 and 5, respectively. Appendix E provides extensions to tim...

  48. [48]

    More recently, [19] and [20] have extended these results to more general stochastic submodular rewards

    introduced the concept of utilizing an offline (α, β)-approximation oracle to achieve α-regret. More recently, [19] and [20] have extended these results to more general stochastic submodular rewards. While CMAB deals with discrete subset selection, our problem involves a hybrid decision: selecting a support set S (discrete)andassigning continuous values t...

  49. [49]

    noise” indexj∈S ⋆ c \S ⋆ and one “signal

    Consider the differenceh(S;θ 1)−h(S;θ 2): h(S;θ 1)−h(S;θ 2) =θ ⊤ 1 x⋆ 1 −sup x∈X(S) θ⊤ 2 x (a) ≤θ ⊤ 1 x⋆ 1 −θ ⊤ 2 x⋆ 1 = (θ1 −θ 2)⊤x⋆ 1 (b) ≤ ∥θ 1 −θ 2∥2 ∥x⋆ 1∥2 , where (a) holds because x⋆ 1 ∈ X(S) is a feasible candidate for θ2, and (b) follows from the Cauchy- Schwarz inequality. By the definition of LX , for any x∈ X (and thus any x∈ X(S) ), we have ...