
arxiv: 2605.15411 · v1 · pith:4UBO4OFWnew · submitted 2026-05-14 · stat.ML · cs.LG · math.OC

Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

Pith reviewed 2026-05-19 15:15 UTC · model grok-4.3

classification stat.ML · cs.LG · math.OC
keywords contextual pricing · semiparametric model · oracle price map · dynamic pricing · bandit convex optimization · regret bounds · unimodality · scalar index

The pith

In semiparametric contextual pricing, a scalar-index pilot reduces the problem to learning a one-dimensional smooth oracle price map whose nonparametric cost is minimax sharp.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers contextual dynamic pricing where customer values follow a scalar-index model with an unknown utility map and additive noise. The optimal price as a function of the scalar index forms an oracle price map; under a revenue-geometry condition ensuring unique interior revenue maximizers, this map inherits the smoothness of the noise tail, losing one order (a β-Hölder tail gives a (β-1)-smooth map). The ORBIT policy exploits this by first obtaining a scalar pilot index, then learning the map through local polynomial approximations inside trust regions, fitted via bandit convex optimization. The resulting regret bound separates a nonparametric term governed by the smoothness parameter from the usual parametric term in the context dimension. A matching lower bound for fixed dimension shows the nonparametric term cannot be improved without stronger assumptions on the noise or geometry.
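
To make the oracle price map concrete, here is a minimal sketch that computes p^*(u) by grid search. It assumes expected revenue takes the standard posted-price form r_u(p) = p·S(p − u), with S the noise tail function. The logistic noise, the price cap `p_max`, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logistic_tail(z):
    """Tail function S(z) = P(xi > z) for standard logistic noise (an assumption)."""
    return 1.0 / (1.0 + np.exp(z))

def oracle_price(u, tail=logistic_tail, p_max=10.0, n_grid=20001):
    """Grid-search maximizer of the assumed revenue curve r_u(p) = p * S(p - u) on [0, p_max]."""
    prices = np.linspace(0.0, p_max, n_grid)
    revenue = prices * tail(prices - u)
    return prices[np.argmax(revenue)]

# Evaluate the map on a handful of scalar-index values.
indices = np.linspace(-1.0, 3.0, 9)
price_map = np.array([oracle_price(u) for u in indices])
for u, p in zip(indices, price_map):
    print(f"u = {u:+.2f}  ->  p*(u) ~ {p:.3f}")
```

In this toy case the map is a smooth increasing curve in one variable, which is the object the policy learns in place of a full value function.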

Core claim

Under β-Hölder smoothness of the tail function for β ≥ 2 and the revenue-geometry condition, the oracle price map u ↦ p^*(u) is itself (β-1)-smooth. The ORBIT policy takes a scalar pilot index, localizes benchmark prices per bin, and learns local polynomial approximations of the map inside trust regions via bandit convex optimization. For the linear utility model, an adaptive elliptical exploration scheme constructs the pilot online. This yields regret Õ(T^{(2β-1)/(4β-3)} + √(dT)), with a matching lower bound in the horizon dependence for fixed d that establishes minimax sharpness of the nonparametric term. The same scalar-pilot interface extends to sparse high-dimensional linear and fully H

What carries the argument

The one-dimensional oracle price map u ↦ p^*(u) induced by the scalar index and noise tail, which carries the reduction from high-dimensional contextual pricing to univariate nonparametric learning while preserving unimodality of the revenue function.

If this is right

  • The policy achieves the stated regret bound for linear utility models via adaptive elliptical exploration, without distributional assumptions on the contexts.
  • A matching lower bound confirms the nonparametric oracle-map term is minimax sharp in the horizon for fixed dimension.
  • The scalar-pilot interface carries over directly to sparse high-dimensional linear utility and nonparametric Hölder utility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same index-reduction approach could simplify other semiparametric decision problems that possess revenue-like unimodal structure.
  • Trust-region local polynomial learning inside a coarse pilot might transfer to unimodal nonparametric bandits outside pricing.
  • Estimating the smoothness parameter β from observed revenue curves could allow the policy to adapt its local approximation order in practice.

Load-bearing premise

The revenue-geometry condition that produces a unique, stable, interior maximizer of expected revenue for each scalar index value u.

What would settle it

An experiment that measures the empirical regret exponent for large T with fixed d and varying known β, checking whether it tracks (2β-1)/(4β-3) or deviates from it.
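
One way to run such a check, sketched under stated assumptions: fit a least-squares line to log regret against log T and compare the slope with the predicted exponent. The helper names are hypothetical, and the regret values below are synthetic placeholders generated from an exact power law purely to exercise the fit; they are not experimental results.

```python
import numpy as np

def predicted_exponent(beta):
    """Horizon exponent (2*beta - 1) / (4*beta - 3) from the stated regret bound."""
    return (2.0 * beta - 1.0) / (4.0 * beta - 3.0)

def fitted_exponent(horizons, regrets):
    """Least-squares slope of log(regret) against log(T)."""
    slope, _intercept = np.polyfit(np.log(horizons), np.log(regrets), deg=1)
    return slope

beta = 2.0
horizons = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
# Synthetic placeholder: an exact power law, NOT measured regret.
regrets = 5.0 * horizons ** predicted_exponent(beta)

print(f"predicted: {predicted_exponent(beta):.3f}")
print(f"fitted:    {fitted_exponent(horizons, regrets):.3f}")
```

Because the bound hides logarithmic factors, a slope fitted at finite T can differ from the asymptotic exponent, so any real comparison would need a range of large horizons.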

original abstract

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under the $\beta$-H\"older smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit such structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, unveiling that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric H\"older utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper studies contextual dynamic pricing under a semiparametric scalar-index model v_t = μ_*(c_t) + ξ_t. It defines the one-dimensional oracle price map p^*(u) induced by the index u = μ_*(c) and the noise tail. Under β-Hölder smoothness (β ≥ 2) of the tail and a revenue-geometry condition guaranteeing a unique stable interior maximizer for each expected-revenue curve r_u(p), the map p^*(u) is (β-1)-smooth. The ORBIT policy uses a scalar pilot index, coarse-to-fine binning, and local polynomial approximation of p^* inside trust regions via bandit convex optimization (BCO). For linear μ_*(c) = c^⊤ θ_*, an adaptive elliptical exploration scheme constructs the pilot online. The policy attains regret Õ(T^{(2β-1)/(4β-3)} + √(dT)); a matching lower bound in the T-exponent is shown for fixed d, establishing minimax sharpness of the nonparametric term. Extensions to sparse high-dimensional linear and nonparametric Hölder utility are outlined.

Significance. If the revenue-geometry condition is shown to deliver uniform (β-1)-Hölder smoothness of p^*(u) with explicit curvature margins, the result supplies a sharp rate that cleanly separates the nonparametric oracle-map learning cost from the parametric pilot cost. The matching lower bound and the modular scalar-pilot interface (enabling extensions) are notable strengths. The work advances semiparametric contextual pricing by exploiting unimodality of revenue curves without requiring full nonparametric estimation of the value function.
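
A short worked calculation, using only the stated bound, shows how the two regret terms compare. The shorthand a(β) for the horizon exponent is introduced here for convenience and is not the paper's notation.

```latex
\[
a(\beta) = \frac{2\beta-1}{4\beta-3}, \qquad
a(2) = \frac{3}{5}, \quad a(3) = \frac{5}{9}, \quad
\lim_{\beta\to\infty} a(\beta) = \frac{1}{2}.
\]
\[
a(\beta) - \frac{1}{2}
= \frac{(4\beta-2) - (4\beta-3)}{2(4\beta-3)}
= \frac{1}{2(4\beta-3)} > 0,
\qquad
T^{a(\beta)} \ge \sqrt{dT}
\iff
d \le T^{2a(\beta)-1} = T^{\frac{1}{4\beta-3}}.
\]
```

So, ignoring constants and logarithmic factors, for β = 2 the nonparametric term dominates whenever d ≤ T^{1/5}, and the √(dT) term takes over for larger d.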

major comments (3)
  1. [Assumption 2] Assumption 2 (Revenue-Geometry Condition): The statement guarantees a unique stable interior maximizer but does not supply an explicit uniform lower bound on the second derivative of r_u(p) away from the boundary or a margin condition on the maximizer location. Without this, the derivation that p^*(u) is exactly (β-1)-Hölder (invoked for bias control of the local polynomial inside the trust region) may only hold pointwise rather than uniformly over the relevant range of u; this directly affects the localization error in the coarse-to-fine binning step and the claimed regret exponent.
  2. [§4.2] §4.2 (Local Polynomial Approximation via BCO): The trust-region analysis assumes that the pilot index localizes the active bin tightly enough for the BCO subroutine to achieve the nonparametric rate. If the curvature margin in Assumption 2 is not uniform, the bias term in the local polynomial fit can degrade from O(h^β) to a slower rate, undermining the overall Õ(T^{(2β-1)/(4β-3)}) bound; the proof sketch does not quantify the propagation of this bias through the coarse-to-fine schedule.
  3. [Theorem 4.1] Theorem 4.1 (Regret Upper Bound): The upper bound is stated under the (β-1)-smoothness of p^*; however, the localization argument for the pilot index (elliptical exploration) and the subsequent trust-region radius selection appear to rely on the same smoothness without an intermediate lemma establishing uniform Hölder continuity from the geometry condition. A gap here would make the rate non-sharp even if the lower bound holds.
minor comments (3)
  1. [§3] Notation: the pilot index is sometimes denoted û_t and sometimes ũ_t; consistent use would improve readability.
  2. [Theorem 5.1] The abstract claims the lower bound is 'matching in the horizon dependence'; the precise statement in Theorem 5.1 should clarify whether the constant factors and the d-dependence are also matched or only the T-exponent.
  3. [Figure 1] Figure 1 (schematic of ORBIT): the trust-region visualization would benefit from an explicit annotation of the bin width h and the local polynomial degree.
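
The role of the curvature margin raised in major comment 1 can be seen from a standard implicit-function calculation. This is a sketch under two assumptions: that expected revenue has the posted-price form r_u(p) = p S(p − u), with S the noise tail function, and that S is twice differentiable. The paper's exact definitions may differ.

```latex
\[
r_u(p) = p\,S(p-u), \qquad
\partial_p r_u(p) = S(p-u) + p\,S'(p-u), \qquad
\partial_p^2 r_u(p) = 2S'(p-u) + p\,S''(p-u).
\]
% First-order condition at the interior maximizer:
\[
S\bigl(p^*(u)-u\bigr) + p^*(u)\,S'\bigl(p^*(u)-u\bigr) = 0 .
\]
% Differentiate in u (implicit function theorem); S' and S'' are evaluated at p^*(u)-u:
\[
\frac{dp^*}{du}(u)
= -\frac{\partial_u\partial_p r_u(p)}{\partial_p^2 r_u(p)}\bigg|_{p=p^*(u)}
= \frac{S' + p^*(u)\,S''}{2S' + p^*(u)\,S''} .
\]
```

Two things are visible under these assumptions. The denominator is the curvature of the revenue curve at its maximizer, so the slope of p^* is controlled uniformly in u only if that curvature is bounded away from zero uniformly in u. And the slope involves S'', which is consistent with p^* being one order less smooth than the tail.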

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The comments correctly identify the need for greater explicitness regarding uniformity in the revenue-geometry condition and for additional intermediate steps in the analysis to support the claimed rates. We address each major comment below and will incorporate the suggested clarifications and lemmas in the revision.

point-by-point responses
  1. Referee: [Assumption 2] Assumption 2 (Revenue-Geometry Condition): The statement guarantees a unique stable interior maximizer but does not supply an explicit uniform lower bound on the second derivative of r_u(p) away from the boundary or a margin condition on the maximizer location. Without this, the derivation that p^*(u) is exactly (β-1)-Hölder (invoked for bias control of the local polynomial inside the trust region) may only hold pointwise rather than uniformly over the relevant range of u; this directly affects the localization error in the coarse-to-fine binning step and the claimed regret exponent.

    Authors: We agree that the current statement of Assumption 2 would benefit from an explicit uniform curvature margin to guarantee uniformity. In the revision we will augment Assumption 2 with a uniform lower bound on |r_u''(p^*(u))| (derived from the existing revenue-geometry condition and the interior-maximizer requirement) that holds over the compact range of u relevant to the problem. A new supporting lemma will then establish that this margin, together with the β-Hölder smoothness of the tail, implies uniform (β-1)-Hölder continuity of p^*(u). This directly controls the localization error in the coarse-to-fine schedule and removes any ambiguity between pointwise and uniform smoothness. revision: yes

  2. Referee: [§4.2] §4.2 (Local Polynomial Approximation via BCO): The trust-region analysis assumes that the pilot index localizes the active bin tightly enough for the BCO subroutine to achieve the nonparametric rate. If the curvature margin in Assumption 2 is not uniform, the bias term in the local polynomial fit can degrade from O(h^β) to a slower rate, undermining the overall Õ(T^{(2β-1)/(4β-3)}) bound; the proof sketch does not quantify the propagation of this bias through the coarse-to-fine schedule.

    Authors: We thank the referee for noting the need to quantify bias propagation. With the uniform curvature margin added to Assumption 2, the bias of the local polynomial estimator remains O(h^β) uniformly inside each trust region. In the revised §4.2 we will expand the error decomposition to track how the pilot localization error determines the trust-region radius and how this radius interacts with the binning schedule. The resulting bounds will show that the nonparametric term stays Õ(T^{(2β-1)/(4β-3)}) without degradation, provided the pilot index satisfies the localization rate already established for the linear case. revision: yes

  3. Referee: [Theorem 4.1] Theorem 4.1 (Regret Upper Bound): The upper bound is stated under the (β-1)-smoothness of p^*; however, the localization argument for the pilot index (elliptical exploration) and the subsequent trust-region radius selection appear to rely on the same smoothness without an intermediate lemma establishing uniform Hölder continuity from the geometry condition. A gap here would make the rate non-sharp even if the lower bound holds.

    Authors: We acknowledge that an explicit intermediate step is desirable for clarity. We will insert a new lemma (placed after the definition of p^* and before the policy description) that derives uniform (β-1)-Hölder continuity of p^*(u) directly from the (revised) revenue-geometry condition and the β-Hölder tail smoothness. The proof of Theorem 4.1 will then invoke this lemma to justify both the elliptical-exploration localization and the trust-region radius choice. With this addition the upper-bound argument is self-contained and the matching lower bound continues to establish minimax sharpness of the nonparametric term. revision: yes
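
The uniform curvature margin promised in the first response can be probed numerically. The sketch below assumes the posted-price revenue form r_u(p) = p·S(p − u) and standard logistic noise; both are illustrative assumptions, as are all function names. It estimates the second derivative of the revenue curve at its maximizer over a range of u and reports the smallest magnitude.

```python
import numpy as np

def tail(z):
    """Tail function S(z) = P(xi > z) for standard logistic noise (an assumption)."""
    return 1.0 / (1.0 + np.exp(z))

def revenue(u, p):
    """Assumed posted-price revenue curve r_u(p) = p * S(p - u)."""
    return p * tail(p - u)

def oracle_price(u, iters=100):
    """Bisection on the logistic first-order condition p - 1 - exp(u - p) = 0."""
    lo, hi = 1.0, 2.0 + max(u, 0.0)  # the condition is negative at lo, positive at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - 1.0 - np.exp(u - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def curvature_at_optimum(u, eps=1e-4):
    """Central finite-difference estimate of the second p-derivative of r_u at p*(u)."""
    p = oracle_price(u)
    return (revenue(u, p + eps) - 2.0 * revenue(u, p) + revenue(u, p - eps)) / eps**2

def curvature_margin(u_low, u_high, n=61):
    """Smallest curvature magnitude at the maximizer over a grid of index values."""
    grid = np.linspace(u_low, u_high, n)
    return min(abs(curvature_at_optimum(u)) for u in grid)

print(f"margin on [-2, 4]: {curvature_margin(-2.0, 4.0):.4f}")
print(f"margin on [-8, 4]: {curvature_margin(-8.0, 4.0):.6f}")
```

In this toy case the margin is positive on a compact index range but shrinks as the range is widened to the left, which illustrates why the response restricts the bound to a compact range of u.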

Circularity Check

0 steps flagged

No circularity: theoretical derivation is self-contained

full rationale

The paper derives the oracle price map smoothness from the stated β-Hölder tail and revenue-geometry condition, then builds the ORBIT policy and regret bound from that smoothness via local polynomial approximation and bandit convex optimization. No step reduces the target regret expression to a fitted quantity or self-citation by construction; the pilot index, binning, and trust-region localization are constructed from external assumptions rather than from the final bound. The matching lower bound is established separately for fixed d. This is a standard non-circular theoretical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions (Hölder smoothness of the noise tail and the revenue-geometry condition) plus the standard technical machinery of local polynomial regression and bandit convex optimization; no free parameters are fitted inside the regret bound itself and no new entities are postulated.

axioms (2)
  • domain assumption: The tail function of the additive noise is β-Hölder smooth for β ≥ 2.
    Stated in the abstract as the condition that makes the oracle price map itself (β-1)-smooth.
  • domain assumption: Revenue-geometry condition ensuring a unique, stable, interior maximizer for each scalar index u.
    Invoked to guarantee that the oracle map is well-defined and inherits the stated smoothness.
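
For reference, one common textbook definition of the Hölder class is sketched below. The paper's own definition may differ in details such as the treatment of integer β, so the symbols Σ, L, ℓ, and I should be read as assumptions.

```latex
\[
f \in \Sigma(\beta, L) \text{ on an interval } I
\iff
f \text{ is } \ell \text{ times differentiable and }
\bigl| f^{(\ell)}(x) - f^{(\ell)}(y) \bigr| \le L\,|x-y|^{\beta-\ell}
\quad \text{for all } x, y \in I,
\]
% where \ell is the largest integer strictly less than \beta.
```

Under this convention, β = 2 means the tail has a Lipschitz first derivative, and (β-1)-smoothness of the oracle map at β = 2 means the map is Lipschitz.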



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Under the β-Hölder smoothness of the tail function for β≥2 and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself (β−1)-smooth.

  • IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    the global quadratic growth bounds hold that for all p ∈ [0, p_max], (σ_r/2) |p − p^*(u)|² ≤ r(u, p^*(u)) − r(u, p) ≤ (L_r/2) |p − p^*(u)|²

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
