arxiv: 2605.04207 · v1 · submitted 2026-05-05 · 📊 stat.ME · econ.GN · q-fin.EC

Optimal Semiparametric Dynamic Pricing with Feature Diversity

Jinhang Chai , Yaqi Duan , Jianqing Fan , Kaizheng Wang

Pith reviewed 2026-05-08 17:30 UTC · model grok-4.3

keywords: dynamic pricing · semiparametric model · regret bounds · local polynomial regression · feature diversity · contextual pricing · greedy algorithm · nonparametric estimation

The pith

The stagewise greedy pricing algorithm achieves optimal regret rates in semiparametric contextual dynamic pricing by iteratively estimating the unknown noise distribution using local polynomials on reused samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies dynamic pricing in which a buyer purchases with probability 1 − F(p − m(x)), where p is the posted price, m(x) is the mean utility as a function of product features and buyer covariates, and F is an unknown noise distribution. The authors develop a stagewise greedy algorithm that prices based on current estimates while refining the estimate of F through local polynomial regression. Feature diversity allows the algorithm to use data collected while exploiting current prices for this estimation, sidestepping the separate random exploration phases that prior approaches required. When m is linear, the resulting regret is of order T^{max{1/2, 3/(2β+1)}}, with β the smoothness of F, and a matching lower bound proves this rate optimal. When F is smooth enough that β ≥ 5/2, the regret reaches the parametric rate √T.
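The rate for linear m can be tabulated directly. The snippet below is a minimal sketch that only evaluates the exponent max{1/2, 3/(2β+1)} stated above; the function name `regret_exponent` is a hypothetical helper introduced here, not something from the paper.

```python
from fractions import Fraction


def regret_exponent(beta):
    """Return max{1/2, 3/(2*beta + 1)}, the exponent of T in the linear-m rate."""
    beta = Fraction(beta)
    if beta <= 0:
        raise ValueError("beta must be positive")
    return max(Fraction(1, 2), Fraction(3) / (2 * beta + 1))


# Exponents at a few smoothness levels, in exact rational arithmetic.
for b in ("2", "9/4", "5/2", "3"):
    print(b, regret_exponent(b))
```

At β = 2 the exponent is 3/5, at β = 9/4 it is 6/11 (about 0.545), and from β = 5/2 onward it stays at 1/2, which agrees with the "Theory" values reported alongside the paper's regret-slope figures.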

Core claim

In the semiparametric model where the purchase probability is 1 − F(p − m(x)), with F an unknown distribution and m belonging to a known class of functions, the stagewise greedy pricing algorithm iteratively refines the estimate of F via local polynomial regression while pricing greedily with the current estimates of m and F. By exploiting feature diversity, the algorithm reuses endogenous samples collected during exploitation for the nonparametric estimation, avoiding costly global random exploration. This yields a general regret bound applicable to any estimator of m, with explicit rates for linear, nonparametric additive, and sparse linear classes; for linear m the regret is T to the power max{1/2, 3/(2β+1)}.

What carries the argument

The stagewise greedy pricing algorithm that alternates greedy pricing decisions with local polynomial regression updates to the market noise distribution F, made possible by the feature diversity condition that permits reuse of exploitation samples for estimation.
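The greedy pricing step can be illustrated with a small root-finding sketch. Maximizing expected revenue p(1 − F(p − m)) leads to the condition ϕ(p − m) = −m, where ϕ(u) = u − (1 − F(u))/F′(u) is the function named in the paper's Figure 1 caption. The code below is not the authors' implementation: it plugs in a logistic F purely for illustration (the paper treats F as unknown and estimates it), assumes ϕ is increasing, and uses hypothetical helper names throughout.

```python
import math


def logistic_cdf(u):
    """Illustrative noise CDF F (an assumption; the paper treats F as unknown)."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_pdf(u):
    """Density F' of the logistic CDF."""
    c = logistic_cdf(u)
    return c * (1.0 - c)


def phi(u, cdf, pdf):
    """phi(u) = u - (1 - F(u)) / F'(u)."""
    return u - (1.0 - cdf(u)) / pdf(u)


def greedy_price(m_hat, cdf, pdf, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve phi(u) = -m_hat by bisection and return the price m_hat + u.

    Assumes phi is increasing on [lo, hi] and that the root lies inside.
    """
    target = -m_hat
    if not phi(lo, cdf, pdf) <= target <= phi(hi, cdf, pdf):
        raise ValueError("root of phi(u) = -m_hat is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, cdf, pdf) < target:
            lo = mid
        else:
            hi = mid
    return m_hat + 0.5 * (lo + hi)


def revenue(p, m, cdf):
    """Expected revenue p * (1 - F(p - m))."""
    return p * (1.0 - cdf(p - m))
```

In a stagewise scheme, `cdf` and `pdf` would be replaced by the current estimates of F and its derivative, and `m_hat` by the current estimate of m(x); how those plug-ins are built is exactly what the paper's analysis addresses.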

Load-bearing premise

The covariates and features must satisfy a diversity condition that ensures endogenous samples collected during greedy pricing are sufficient to estimate the noise distribution F nonparametrically without bias or extra exploration.
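To make the estimation step concrete, here is a minimal sketch of kernel-weighted local linear regression recovering F at a point from binary purchase outcomes. It is a simplification under stated assumptions: degree 1 rather than a general local polynomial, an Epanechnikov kernel, a logistic ground truth, and values of u = p − m(x) drawn uniformly as a stand-in for the spread that feature diversity is meant to provide. The names `local_linear_cdf` and `simulate` are hypothetical.

```python
import math
import random


def local_linear_cdf(u_samples, y_samples, u0, h):
    """Estimate F(u0) from purchase indicators y with E[y | u] = 1 - F(u).

    Fits a kernel-weighted line in (u - u0) and returns 1 - intercept.
    Uses the Epanechnikov kernel with bandwidth h.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for u, y in zip(u_samples, y_samples):
        d = u - u0
        z = d / h
        if abs(z) >= 1.0:
            continue  # outside the kernel window
        w = 0.75 * (1.0 - z * z)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few samples near u0 for a local linear fit")
    intercept = (s2 * t0 - s1 * t1) / det  # fitted purchase probability at u0
    return 1.0 - intercept


def simulate(n, seed=0):
    """Draw u = p - m(x) uniformly on [-2, 2] and logistic purchase outcomes."""
    rng = random.Random(seed)
    us, ys = [], []
    for _ in range(n):
        u = rng.uniform(-2.0, 2.0)
        buy_prob = 1.0 - 1.0 / (1.0 + math.exp(-u))  # 1 - F(u), logistic F
        us.append(u)
        ys.append(1 if rng.random() < buy_prob else 0)
    return us, ys
```

Calling `local_linear_cdf(*simulate(20000), u0=0.0, h=0.3)` should return a value near F(0) = 0.5 under this setup. If the samples of u were concentrated away from `u0`, the window would be nearly empty and the fit would fail, which is the situation the diversity condition is meant to rule out.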

What would settle it

A controlled experiment or simulation in which the observed cumulative regret grows faster than T to the power 3/(2β+1) for a linear utility function and smoothness β = 2 (exponent 3/5) would indicate that the regret upper bound does not hold.
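Such a check comes down to estimating the growth exponent of cumulative regret in T. The paper's experiments describe doing this by linear regression on a log-log scale; the sketch below implements only that ordinary least-squares slope, with a hypothetical function name and no claim to match the authors' code.

```python
import math


def loglog_slope(horizons, regrets):
    """Least-squares slope of log(regret) against log(T)."""
    if len(horizons) != len(regrets) or len(horizons) < 2:
        raise ValueError("need at least two (T, regret) pairs of equal length")
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in regrets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

An estimated slope that stays clearly above the theoretical exponent as the range of horizons grows would count as evidence against the bound; a slope at or below it would be consistent with the bound.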

Figures

Figures reproduced from arXiv: 2605.04207 by Jianqing Fan, Jinhang Chai, Kaizheng Wang, Yaqi Duan.

**Figure 1.** Algorithmic diagram, where ϕ(u) = u − (1 − F(u))/F′(u) is a crucial component in optimal pricing.

**Figure 2.** Regret exponent as a function of smoothness β ∈ [2, 5]. Lower is better. Our rate saturates at the parametric regime T^{1/2} for β ≥ 2.5. Wang & Chen (2025) gives the same rate T^{3/5} only at β = 2.

**Figure 3.** Ground truth functions of the noise CDF, with smoothness β = 2 and 10 knots.

Evaluation. For each horizon T and smoothness parameter β, we report the average cumulative regret over N_trials = 200 independent runs. We plot the curves on a log–log scale and estimate the slope using linear regression. In addition, we apply the cluster bootstrap method (Cameron et al., 2008), performing 2,000 bootstrap refits t…

**Figure 4.** Known utility. The x-axis stands for total time horizon, and the y-axis stands for total regret. The solid lines represent average regrets, and the shaded areas reflect the standard deviations across trials. Here, blue, orange, green, red, purple, and brown components correspond to the cases with β = 2, 2.25, 2.5, 2.75, 3, 3.25 respectively.

| β | Slope | Theory | CI |
|---|---|---|---|
| 2.00 | 0.595 | 0.600 | [0.540, 0.649] |
| 2.25 | 0.552 | 0.54… | … |

**Figure 5.** Unknown utility. The solid lines represent average regrets, and the shaded areas reflect the standard deviations across trials.

| β | Slope | Theory | CI |
|---|---|---|---|
| 2.00 | 0.582 | 0.600 | [0.555, 0.609] |
| 2.25 | 0.533 | 0.545 | [0.499, 0.568] |
| 2.50 | 0.512 | 0.500 | [0.488, 0.536] |
| 2.75 | 0.518 | 0.500 | [0.494, 0.542] |
| 3.00 | 0.516 | 0.500 | [0.492, 0.540] |
| 3.25 | 0.509 | 0.500 | [0.486, 0.532] |

**Figure 6.** Unknown utility. Regret comparison of ILPR and kernel-based policy (Fan et al., 2024) in simulation. Our ILPR achieves tremendous benefit.

4.2 Semi-real data. We next construct a semi-real online pricing environment using the real data available at the INFORMS 2023 BSS Data Challenge Competition. We define a binary demand indicator that is equal to one if the daily units ordered are non-zero. To ensure relia…

**Figure 7.** Unknown utility. Regret comparison of ILPR and kernel-based policy (Fan et al., 2024) in a semi-real environment with real data. Our ILPR outperforms the kernel-based policy and DIP policy.

**Figure 8.** Histogram of regret improvement at T = 700. The improvement is huge across products.

… would be of interest to relax the feature diversity condition or develop robust variants under weaker support assumptions, especially since effective learning primarily relies on regions of the covariate space that are frequently visited. Finally, extending the framework to richer settings, such as more general demand model…

Original abstract

We study contextual dynamic pricing under a semiparametric demand model in which the purchase probability is $1-F(p-m(\mathbf{x}))$, where $m(\mathbf{x})$ captures mean utility as a function of product features and buyer covariates, and $F$ is an unknown market-noise distribution. Existing methods either incur suboptimal regret or rely on restrictive structural assumptions. We propose a stagewise greedy pricing algorithm that iteratively refines the estimate of $F$ via local polynomial regression while pricing greedily with current estimates. By exploiting feature diversity, the algorithm reuses endogenous samples collected during exploitation for nonparametric estimation, avoiding costly global random exploration used in prior work. We establish a general regret bound that applies to any estimator $\hat m$ of the utility function, and derive explicit rates for linear, nonparametric additive, and sparse linear classes of $m$. For the linear class, our regret scales as $T^{\max\{1/2,\,3/(2\beta+1)\}}$, where $\beta$ is the smoothness of $F$ and $T$ is the time horizon. This improves the best known rates for semiparametric contextual pricing and achieves the parametric $\sqrt{T}$ rate when $\beta \ge 5/2$. We further prove a matching lower bound, showing the optimality of our rate, and present numerical experiments that corroborate the theory and demonstrate the practical advantages of iterative refinement.
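As a worked step, the greedy (myopic) price under this demand model follows from a first-order condition on expected revenue. The derivation below is a sketch that assumes $F$ is differentiable with $F' > 0$, that the maximizer is interior, and that $\phi$ is strictly increasing so its inverse exists; $R$, $u$, and $p^{*}$ are notation introduced here, while $\phi$ matches the function in the paper's Figure 1 caption.

```latex
\begin{align*}
R(p) &= p\,\bigl(1 - F(p - m(\mathbf{x}))\bigr), \\
R'(p) &= 1 - F(p - m(\mathbf{x})) - p\,F'(p - m(\mathbf{x})) = 0, \\
\text{with } u = p - m(\mathbf{x}):\qquad
u - \frac{1 - F(u)}{F'(u)} &= -\,m(\mathbf{x}), \\
\phi(u) := u - \frac{1 - F(u)}{F'(u)}
\quad&\Longrightarrow\quad
p^{*} = m(\mathbf{x}) + \phi^{-1}\bigl(-m(\mathbf{x})\bigr).
\end{align*}
```

Pricing "greedily with current estimates" then amounts to evaluating this expression with the current estimates of $m$ and $F$ substituted for the true ones.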

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper develops a stagewise greedy algorithm for semiparametric contextual pricing that reuses endogenous data via feature diversity to achieve T^{max{1/2, 3/(2β+1)}} regret for linear m, together with a matching lower bound.

The core advance is the stagewise refinement: run greedy pricing with current estimates of m and F, then update the nonparametric estimate of F with local polynomials on the collected samples. Feature diversity lets them skip separate exploration rounds that earlier semiparametric work needed. They give a general regret bound that holds for any m-hat, then specialize to linear, additive, and sparse linear cases, and close with a lower bound that matches the upper bound when β is not too large. That combination is cleaner than most prior rates in this area. The numerical experiments line up with the theory and show the iterative version beats non-refined baselines in practice.

The math looks formally grounded on the surface, with explicit dependence on smoothness β and no obvious circularity in the rate derivations. The main soft spot is the feature diversity condition itself. It has to guarantee enough spread in the realized (p − m(x)) values so the local polynomial gets adequate local mass in each stage; if the x-support or variance of m is marginal, the effective sample size for F drops and the 3/(2β+1) term can degrade. The paper formalizes the condition, but its strength relative to typical e-commerce feature sets is not obvious from the abstract alone.

This work is for people already working on regret bounds for dynamic pricing or semiparametric online learning. Readers who care about explicit rates and lower bounds in revenue management will get direct value from the algorithm and the optimality result. It is worth sending to a serious referee because the upper-plus-lower-bound package is a clear step beyond existing semiparametric pricing papers, even if the diversity assumption needs close scrutiny in review.
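The local-mass concern in the note can be illustrated numerically. The sketch below assumes, purely for illustration, that the realized values u = p − m(x) are uniform on [−1, 1] (density 1/2, bounded below), and counts how many fall inside a bandwidth window; the helper names are hypothetical and nothing here reproduces the paper's diversity condition.

```python
import random


def local_count(n, u0, h, seed=0):
    """Count draws of u ~ Uniform[-1, 1] that fall within distance h of u0."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if abs(rng.uniform(-1.0, 1.0) - u0) <= h)


def expected_count(n, h, density=0.5):
    """Expected window count n * 2h * density for an interior point u0."""
    return n * 2.0 * h * density
```

With a density bounded below, the window count scales linearly in n for fixed h, so doubling the sample roughly doubles the local data available to the F estimate. If the density near `u0` were close to zero instead, the same window would hold far fewer points, which is the degradation the note warns about.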

Referee Report

2 major / 2 minor

Summary. The paper studies contextual dynamic pricing under the semiparametric demand model with purchase probability 1-F(p-m(x)), where m(x) is the mean utility and F is an unknown noise distribution. It proposes a stagewise greedy pricing algorithm that iteratively refines the estimate of F via local polynomial regression while pricing greedily, exploiting feature diversity to reuse endogenous samples and avoid global random exploration. The authors establish a general regret bound applicable to any estimator of m, derive explicit rates including T^{max{1/2, 3/(2β+1)}} for linear m (with β the smoothness of F), prove a matching lower bound establishing optimality, and present numerical experiments.

Significance. If the central claims hold, the work improves the best known regret rates for semiparametric contextual pricing and achieves the parametric √T rate when β ≥ 5/2. The matching lower bound provides an independent optimality check, and the feature-diversity mechanism for reusing endogenous data offers a practical advance over prior exploration-heavy approaches.

major comments (2)

[§2.3 (Feature Diversity Assumption)] The condition is load-bearing for the explicit rate in the linear case, as it must guarantee that the induced marginal on the argument of F has positive density and sufficient local mass at each stage so that local polynomial regression of F achieves the 3/(2β+1) rate without extra exploration cost. The current formulation does not explicitly quantify the minimal local sample size relative to the price grid or variance of m(x); if this mass is too small, the bias-variance tradeoff degrades and the general regret bound no longer yields the claimed T^{max{1/2, 3/(2β+1)}} rate.
[Theorem 4.1 and §4.2 (Stagewise Analysis)] The derivation of the explicit rate for linear m invokes the diversity condition to control the nonparametric estimation error of F across stages. A more precise accounting of how the greedy price updates affect the local density around each evaluation point of F, and how this interacts with the bandwidth choice in local polynomials, is needed to confirm that the max{1/2, 3/(2β+1)} term is attained without hidden logarithmic or constant factors that would alter the optimality claim.

minor comments (2)

[Abstract] The description of the stagewise algorithm could briefly note how the iterative refinement of F interacts with the greedy pricing step to make the contribution clearer to readers.
[§6 (Numerical Experiments)] Additional details on bandwidth selection for the local polynomial estimator and sensitivity to the feature diversity parameter would help reproducibility and illustrate robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the feature diversity assumption and the stagewise analysis. We address each point below and will incorporate the suggested clarifications into the revised manuscript.

Referee: [§2.3 (Feature Diversity Assumption)] The condition is load-bearing for the explicit rate in the linear case, as it must guarantee that the induced marginal on the argument of F has positive density and sufficient local mass at each stage so that local polynomial regression of F achieves the 3/(2β+1) rate without extra exploration cost. The current formulation does not explicitly quantify the minimal local sample size relative to the price grid or variance of m(x); if this mass is too small, the bias-variance tradeoff degrades and the general regret bound no longer yields the claimed T^{max{1/2, 3/(2β+1)}} rate.

Authors: We agree that the implications of Assumption 2.3 can be made more explicit. The assumption ensures that the features are diverse enough for the induced distribution on p - m(x) to have density bounded away from zero over the relevant range, which guarantees that the local sample size for estimating F grows linearly with stage length. In the proof of Theorem 4.1 we invoke this lower bound to obtain the standard local-polynomial rate 3/(2β+1) without additional exploration. To address the referee’s concern we will add a remark immediately after Assumption 2.3 that quantifies the minimal local mass in terms of the diversity parameter and the bounded variance of m(x), and we will verify that this mass suffices for the bias-variance tradeoff used in the regret analysis. revision: yes
Referee: [Theorem 4.1 and §4.2 (Stagewise Analysis)] The derivation of the explicit rate for linear m invokes the diversity condition to control the nonparametric estimation error of F across stages. A more precise accounting of how the greedy price updates affect the local density around each evaluation point of F, and how this interacts with the bandwidth choice in local polynomials, is needed to confirm that the max{1/2, 3/(2β+1)} term is attained without hidden logarithmic or constant factors that would alter the optimality claim.

Authors: We appreciate the request for a more granular accounting. In §4.2 the proof proceeds by induction over stages: the estimation errors of m̂ and F̂ control the deviation of the greedy price from the myopic optimum, which in turn keeps sampled points inside a neighborhood where the density of the argument to F remains uniformly bounded below by a positive constant derived from Assumption 2.3. The bandwidth is set to h = n^{-1/(2β+1)} with n the cumulative sample size up to the current stage, and standard local-polynomial bounds are applied under this density lower bound. We will expand the appendix with an auxiliary lemma that explicitly tracks the evolution of the minimal local density across stages and shows that the resulting regret terms produce exactly the claimed T^{max{1/2, 3/(2β+1)}} rate with no extra logarithmic factors. This will also confirm that the matching lower bound remains valid. revision: yes
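The bandwidth h = n^{-1/(2β+1)} quoted in the response is the usual bias-variance balance for local polynomial estimation of a β-smooth function. The block below sketches that textbook calculation for the pointwise error at a fixed u, under the assumptions of independent samples and a design density bounded below, with constants suppressed; it is not the paper's stagewise argument, which has to handle endogenously collected samples.

```latex
\begin{align*}
\mathbb{E}\bigl[(\hat F(u) - F(u))^{2}\bigr]
  &\lesssim \underbrace{h^{2\beta}}_{\text{squared bias}}
   + \underbrace{\frac{1}{n h}}_{\text{variance}}, \\
h^{2\beta} \asymp \frac{1}{n h}
  &\iff h \asymp n^{-1/(2\beta+1)}, \\
\text{so that}\qquad
\mathbb{E}\bigl[(\hat F(u) - F(u))^{2}\bigr]
  &\lesssim n^{-2\beta/(2\beta+1)}.
\end{align*}
```

The variance term is where the density lower bound enters: roughly n h samples land in the window only if the argument of F has enough mass near u, which is why both referee points tie the rate to the diversity condition.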

Circularity Check

0 steps flagged

No circularity: general bound + independent lower bound

The derivation begins with a general regret bound that holds for arbitrary estimators of m(x) and then specializes the rate under explicit classes of m by invoking standard local-polynomial convergence under the stated feature-diversity assumption. The matching lower bound is established separately and does not rely on the upper-bound construction. No equation reduces a claimed prediction to a fitted parameter by definition, no load-bearing premise collapses to a self-citation, and the diversity condition is an external modeling assumption rather than a quantity defined in terms of the regret itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the semiparametric demand model and a feature diversity condition that enables consistent estimation from endogenous data. Smoothness of F is parameterized by beta but treated as given.

axioms (2)

Domain assumption: Purchase probability follows 1 − F(p − m(x)) with F unknown but β-smooth and m belonging to linear, additive, or sparse linear classes.
Core modeling assumption stated in the abstract.
Domain assumption: Feature diversity is sufficient to allow nonparametric estimation of F from samples collected under greedy pricing without global random exploration.
Key technical assumption enabling the regret analysis and avoidance of prior exploration costs.

pith-pipeline@v0.9.0 · 5562 in / 1500 out tokens · 32872 ms · 2026-05-08T17:30:05.135352+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 15 canonical work pages

[1] Statistical foundations of data science. 2020.
[2] Convexity, classification, and risk bounds. Journal of the American Statistical Association, 2006.
[3] Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Operations Research, 2014.
[4] Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 2008.
[5] Dynamic Pricing in the Linear Valuation Model using Shape Constraints. arXiv preprint arXiv:2502.05776.
[6] Distribution-free contextual dynamic pricing. Mathematics of Operations Research, 2024.
[7] Bandit algorithms. 2020.
[8] Mostly exploration-free algorithms for contextual bandits. Management Science, 2021.
[9] Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics.
[10] A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. 2012.
[11] Alexandre B. Tsybakov. 2009.
[12] High-dimensional statistics: A non-asymptotic viewpoint. 2019.
[13] Minimax-optimal rates for sparse additive models over kernel classes via convex programming. The Journal of Machine Learning Research, 2012.
[14] An introduction to matrix concentration inequalities. Foundations and Trends, 2015.
[15] Nonparametric instrumental regression with errors in variables. Econometric Theory, 2018.
[16] Nonparametric regression with errors in variables. The Annals of Statistics, 1993.
[17] Tight Regret Bounds in Contextual Pricing with Semi-parametric Demand Learning. Available at SSRN 5133677.
[18] Fan, Jianqing and Gijbels, Irene.
[19] Clustering a mixture of gaussians with unknown covariance. arXiv preprint arXiv:2110.01602.
[20] Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.
[21] Pseudo-labeling for Kernel Ridge Regression under Covariate Shift. arXiv preprint arXiv:2302.10160.
[22] Fairness-aware online price discrimination with nonparametric demand models. arXiv preprint arXiv:2111.08221.
[23] Doubly Fair Dynamic Pricing. International Conference on Artificial Intelligence and Statistics, 2023.
[24] Network revenue management with demand learning and fair resource-consumption balancing. Production and Operations Management (Forthcoming).
[25] Regularized online allocation problems: Fairness and beyond. International Conference on Machine Learning, 2021.
[26] Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 2014.
[27] Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Science, 2021.
[28] Multimodal dynamic pricing. Management Science, 2021.
[29] Stochastic optimization forests. Management Science, 2023.
[30] Dynamic pricing with fairness constraints. Available at SSRN 3930622.
[31] Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. Conference on Learning Theory, 2021.
[32] Feng, Qing; Zhu, Ruihao; Jasin, Stefanus. Temporal Fairness in Learning and Earning: Price Protection Guarantee and Phase Transitions.
[33] Doubly robust covariate shift regression with semi-nonparametric nuisance models. arXiv preprint arXiv:2010.02521.
[34] Doubly robust augmented model accuracy transfer inference with high dimensional features. arXiv preprint arXiv:2208.05134.
[35] Semi-supervised Triply Robust Inductive Transfer Learning. arXiv preprint arXiv:2209.04977.
[36] Embracing the blessing of dimensionality in factor models. Journal of the American Statistical Association, 2018.
[37] Utility Fairness in Contextual Dynamic Pricing with Demand Learning. 2023.
[38] Online pricing with offline data: Phase transition and inverse square law. International Conference on Machine Learning, 2020.
[39] Transfer learning for contextual multi-armed bandits. The Annals of Statistics, 2024.
[40] Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems.
[41] Gill, Richard D and Levit, Boris Y. Applications of the van Trees inequality: a Bayesian Cram. 1995.
[42] Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 2015.
[43] A statistical learning approach to personalization in revenue management. Management Science, 2022.
[44] Context-based dynamic pricing with separable demand models. Available at SSRN 4140550.
[45] On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science, 2015.
[46] Towards agnostic feature-based dynamic pricing: Linear policies vs linear valuation with unknown noise. International Conference on Artificial Intelligence and Statistics, 2022.
[47] Nonparametric pricing analytics with customer covariates. Operations Research, 2021.
[48] Dynamic pricing with demand covariates. arXiv preprint arXiv:1604.07463.
[49] Policy optimization using semiparametric models for dynamic pricing. Journal of the American Statistical Association, 2024.
[50] Dynamic pricing under a general parametric choice model. Operations Research, 2012.
[51] A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Advances in Neural Information Processing Systems.
[52] Practical Evaluation and Optimization of Contextual Bandit Algorithms. CoRR.
[53] Stochastic linear contextual bandits with diverse contexts. International Conference on Artificial Intelligence and Statistics, 2020.
[54] Efficient memory-based learning for robot control. 1990.
[55] A survey of exploration methods in reinforcement learning. arXiv preprint arXiv:2109.00157.
[56] Dynamic pricing in high-dimensions. Journal of Machine Learning Research.
[57] Dynamic pricing with demand learning and reference effects. Management Science, 2022.
[58] Meta dynamic pricing: Transfer learning across experiments. Management Science, 2022.
[59] High-dimensional probability: An introduction with applications in data science. 2018.
[60] Randomized sketches for kernels: Fast and optimal nonparametric regression.
[61] Yaqi Duan, Mengdi Wang, and Martin J. Wainwright. Ann. Statist., 2024. doi:10.1214/24-AOS2399.
[62] Policy evaluation from a single path: Multi-step methods, mixing and mis-specification. arXiv preprint arXiv:2211.03899.
[63] Local Rademacher Complexities. Annals of Statistics, 2005.
[64] Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research.
[65] Bounded regret for finite-armed structured bandits. Advances in Neural Information Processing Systems.
[66] From ads to interventions: Contextual bandits in mobile health. Mobile Health: Sensors, Analytic Methods, and Applications, 2017.
[67] Feature-based dynamic pricing. Management Science, 2020.
[68] Nonstationary bandits with habituation and recovery dynamics. Operations Research, 2020.
[69] Learning personalized product recommendations with customer disengagement. Manufacturing & Service Operations Management, 2022.
[70] MNL-bandit: A dynamic learning approach to assortment selection. Operations Research, 2019.
[71] Online decision making with high-dimensional covariates. Operations Research, 2020.
[72] Dynamic learning and pricing with model misspecification. Management Science, 2019.
[73] A primal-dual learning algorithm for personalized dynamic pricing with an inventory constraint. Mathematics of Operations Research, 2022.
[74] High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection. arXiv preprint arXiv:2303.07570.
[75] Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
[76] What doubling tricks can and can't do for multi-armed bandits. arXiv preprint arXiv:1803.06971.
[77] Are latent factor regression and sparse regression adequate? Journal of the American Statistical Association, 2024.
[78] Factor-adjusted regularized model selection. Journal of Econometrics, 2020.
[79] Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2013.
[80] Localized exploration in contextual dynamic pricing achieves dimension-free regret. arXiv preprint arXiv:2412.19252.