Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity

Feng Xu; Jian-qiang Hu; Jiaqiao Hu; Xiangyu Yang

arxiv: 2605.21263 · v1 · pith:QYGSY5QW · submitted 2026-05-20 · 💻 cs.LG


Pith reviewed 2026-05-21 05:00 UTC · model grok-4.3

classification 💻 cs.LG

keywords: dynamic pricing, nonparametric learning, one-point feedback, nonstationary environments, regret bounds, restarting mechanism, meta-learning


The pith

A nonparametric pricing method learns demand from single revenue observations per period and adapts to market shifts via restarts, bounding revenue loss by time horizon and variation size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework for sellers to set prices dynamically when only the revenue from one chosen price is observed each period and when customer demand can change over time. It updates prices using gradient approximations built directly from those single revenue readings, without assuming any particular shape for the demand curve. To cope with shifts, the method periodically restarts the learning process to discard outdated data, and adds a meta-learning layer that hedges across different restart schedules when the pace of change is unknown. If the guarantees hold, total revenue lost compared to knowing the full demand function grows only with the length of the selling horizon and the total amount the market has varied.

Core claim

By constructing revenue-based gradient approximations from one observation per period and incorporating a restarting mechanism that periodically refreshes the learning process, the seller's cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation.

What carries the argument

Revenue-based gradient approximations from one observation per period, combined with a restarting mechanism that periodically refreshes the learning process to discount outdated information.
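Here is a minimal sketch of the one-point idea. It assumes the classical spherical-perturbation estimator of Flaxman, Kalai and McMahan, not the authors' exact construction: the paper's conclusion mentions online mirror ascent, and this plain Euclidean version does not reproduce that. The seller posts one perturbed price, observes one revenue, and rescales that single number into a gradient estimate. The function names and the box-shrinking projection are illustrative assumptions.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    """Draw a direction uniformly at random from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def one_point_gradient(revenue_fn, x, delta, rng):
    """One-point gradient estimate of revenue_fn at price vector x.

    Uses a single (possibly noisy) revenue observation at the perturbed
    price x + delta * u. Its expectation is the gradient of a
    delta-smoothed version of revenue_fn, not of revenue_fn itself.
    """
    d = x.size
    u = sample_unit_sphere(d, rng)
    observed = revenue_fn(x + delta * u)  # the only feedback this period
    return (d / delta) * observed * u

def projected_ascent_step(x, grad, eta, lower, upper, delta):
    """Ascent step, then projection onto the box shrunk by delta, so that
    the next perturbed price x + delta * u stays in [lower, upper]."""
    return np.clip(x + eta * grad, lower + delta, upper - delta)
```

A simulation loop would alternate `one_point_gradient` and `projected_ascent_step`, for example with η = 0.01, δ = 0.1 and a revenue oracle corrupted by N(0, 0.1²) noise as in the Figure 2 settings. The estimator's magnitude scales like (d/δ) times the observed revenue, which is why a small perturbation radius calls for a small step size.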

If this is right

Cumulative revenue loss scales with both the time horizon and the total variation in market conditions.
The procedure requires no parametric assumption on the demand function.
A meta-learning layer allows adaptation when the degree of nonstationarity is unknown.
Simulation results on synthetic and real-world data show practical effectiveness.
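One standard way to build the meta-learning layer is exponential weighting over K experts, each running a different restart schedule. Only one posted price, and so only one expert, produces a revenue observation per period. The sketch below therefore uses EXP3-style importance weighting. This is a swapped-in textbook choice, not necessarily the authors' update: the Figure 2 caption reports exponential weighting with rate ε = 0.5 but not the exact rule. Rewards are assumed scaled to [0, 1], and the class name is hypothetical.

```python
import numpy as np

class Exp3Meta:
    """Exponential weights over K experts (e.g. restart schedules) when
    only the chosen expert's reward is observed each round."""

    def __init__(self, n_experts, eta, gamma, seed=0):
        self.log_w = np.zeros(n_experts)  # log-weights, one per expert
        self.eta = eta                    # learning rate
        self.gamma = gamma                # uniform exploration mixture
        self.rng = np.random.default_rng(seed)
        self.p = None
        self.k = None

    def probabilities(self):
        w = np.exp(self.log_w - self.log_w.max())  # numerically stable softmax
        return (1 - self.gamma) * w / w.sum() + self.gamma / w.size

    def select(self):
        self.p = self.probabilities()
        self.k = int(self.rng.choice(self.p.size, p=self.p))
        return self.k

    def update(self, reward):
        # Importance-weighted estimate: reward / p_k for the played expert,
        # zero for the others, which is unbiased for every expert's reward.
        self.log_w[self.k] += self.eta * reward / self.p[self.k]
```

In each round, `select()` picks the schedule whose expert posts the price, and `update()` feeds back the single observed revenue. The γ mixture keeps every schedule explored, so the 1/p_k factors stay bounded.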

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same restart-and-meta structure could be tested in other limited-feedback sequential problems such as inventory control under drifting demand.
Platforms could deploy the method to maintain pricing performance across seasonal cycles without requiring manual tuning of restart frequency.
Adding occasional side observations, such as competitor prices, might tighten the loss bounds further.

Load-bearing premise

The restarting mechanism effectively discounts outdated information so that learning can track changes in the underlying demand relationship.
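A restarting mechanism can be sketched as a thin wrapper that discards a base learner's state every `period` rounds. The interface here is hypothetical: `make_base`, `propose` and `update` are names chosen for illustration, not taken from the paper. The sketch also assumes a fixed restart period, whereas the paper's schedules may differ.

```python
class RestartingLearner:
    """Re-initialise a base learner every `period` rounds (hypothetical API).

    `make_base` is any zero-argument factory returning an object with
    `propose()` and `update(reward)` methods, e.g. one-point gradient ascent.
    """

    def __init__(self, make_base, period):
        if period < 1:
            raise ValueError("period must be a positive integer")
        self.make_base = make_base
        self.period = period
        self.t = 0          # rounds started so far
        self.restarts = 0   # number of times the state was discarded
        self.base = make_base()

    def propose(self):
        # At the start of rounds period, 2*period, ... drop everything learned
        # so far, so data gathered under an outdated demand curve is forgotten.
        if self.t > 0 and self.t % self.period == 0:
            self.base = self.make_base()
            self.restarts += 1
        self.t += 1
        return self.base.propose()

    def update(self, reward):
        self.base.update(reward)
```

The period trades off two errors. Short periods forget quickly but waste rounds re-learning. Long periods average over stale data after a shift.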

What would settle it

In a controlled setting with known abrupt demand shifts, removing the restarts produces revenue loss that grows linearly with the number of changes instead of staying bounded by the variation measure.

Figures

Figures reproduced from arXiv: 2605.21263 by Feng Xu, Jian-qiang Hu, Jiaqiao Hu, Xiangyu Yang.

**Figure 2.** No Variation. Surrounding text (excerpt): … gradient ascent on the feasible box X with a fixed step size η = 0.01. At each round, the perturbation radius is δ = 0.1. The meta-learner updates expert weights via exponential weighting with rate ε = 0.5. Bandit feedback is corrupted by additive Gaussian noise N(0, 0.1²). As a naive baseline, we also include a random policy that selects actions uniformly from X. All results are averaged …

**Figure 3.** Low Variation.

**Figure 4.** High Variation.

**Figure 5.** Variation=10, b path. Surrounding text (excerpt): … samples adaptively to the moving high-value regions. In contrast, the benchmark methods become less responsive under stronger nonstationarity, which leads to less efficient exploration and inferior tracking performance. Section 6.3, Real-world Nonstationary Pricing Experiment Using Walmart Dataset: To further evaluate the proposed policy in a more realistic nonstationary demand environment, we c…

**Figure 6.** Variation=10, sample heatmaps.

**Figure 7.** Variation=10, action trajectories.

**Figure 8.** Variation=10, regret.

**Figure 9.** Variation=40, b path.

**Figure 10.** Variation=40, sample heatmaps.

**Figure 11.** Variation=40, action trajectories.

**Figure 12.** Variation=40, regret.

**Figure 13.** Real-world nonstationary environment. Section 7, Conclusion (excerpt): This paper studies a nonparametric dynamic pricing problem in a nonstationary environment with one-point feedback. The seller observes only realized revenues from a single posted price in each period, while the underlying revenue functions may change over time. We analyze this problem through a hierarchical construction that combines online mirror ascent w…

**Figure 14.** Variation=0, b path.

**Figure 15.** Variation=0, sample heatmaps.

**Figure 16.** Variation=0, action trajectories.

**Figure 17.** Variation=0, regret.

**Figure 18.** Variation=20, b path.

**Figure 19.** Variation=20, sample heatmaps.

**Figure 20.** Variation=20, action trajectories.

**Figure 21.** Variation=20, regret.

**Figure 22.** Variation=30, b path.

**Figure 23.** Variation=30, sample heatmaps.

**Figure 24.** Variation=30, action trajectories.

**Figure 25.** Variation=30, regret.

Original abstract

Firms increasingly rely on dynamic pricing to respond to evolving customer demand, yet in many applications they observe only the revenue generated by a single posted price in each period. At the same time, market conditions may shift gradually or abruptly due to changes in customer preferences, competition, or external shocks. These features create two intertwined challenges: learning the revenue–demand relationship from limited feedback and adapting pricing decisions to a changing environment. We study how a seller can learn and earn effectively under these constraints, without assuming a specific parametric form for demand. We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism that periodically refreshes the learning process so that outdated information is discounted. When the degree of nonstationarity is unknown, we further introduce a meta-learning layer to adaptively hedge across multiple restarting schedules. We provide performance guarantees for our approach, showing how cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation. Simulation experiments using synthetic and real-world data illustrate the effectiveness of the proposed procedures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper combines one-point nonparametric gradient updates with adaptive restarting and meta-hedging to handle nonstationary dynamic pricing with limited feedback, yielding regret bounds in terms of horizon and variation.


The main takeaway is that this paper provides a nonparametric method for dynamic pricing in changing markets when only one revenue observation is available per period. It uses one-point gradient estimates, periodic restarts, and meta-hedging over restart schedules to achieve regret bounds that depend on the time horizon and the magnitude of market variation. The authors do a good job of synthesizing one-point feedback techniques with restarting for nonstationarity and of adding an adaptive layer for when the variation level is unknown. The performance guarantees are derived with a suitable variation measure, leading to rates like O(T^{2/3} + V), and the simulations on synthetic and real data support the claims. The combination appears to be a new synthesis in the pricing context, building on prior work without obvious circularity in the bounds.

Soft spots are limited and not central. The abstract says little about the smoothness or variation assumptions, but the paper defines them properly and the proofs recover the desired dependence without contradictions. The meta-hedging over restart schedules is a practical extension rather than a deep new idea, but it handles the unknown degree of nonstationarity effectively. I see no major flaws in the citation pattern or the evidence presented.

This paper is for researchers and practitioners working on machine learning for pricing and online optimization who need methods that work without parametric demand models and can adapt to gradual or abrupt changes. A reader looking for algorithms with theoretical guarantees in nonstationary bandit-like settings would get value from the bounds and the experimental validation. It is formally grounded and evidentially sharp enough to deserve a serious referee rather than a desk reject. I recommend sending this to peer review.

Referee Report

2 major / 2 minor

Summary. The paper studies nonparametric dynamic pricing with one-point (revenue-only) feedback in nonstationary environments. It proposes a framework that constructs revenue-based gradient approximations from single observations per period, incorporates periodic restarts to discount outdated information, and adds a meta-learning layer that hedges across multiple restarting schedules when the degree of nonstationarity is unknown. Performance guarantees are claimed showing that cumulative revenue loss relative to a fully informed benchmark scales with the time horizon T and a measure of market variation V; the claims are illustrated with synthetic and real-world simulations.

Significance. If the stated regret bounds hold, the work provides a useful nonparametric extension of online learning techniques to nonstationary pricing problems with minimal feedback. The adaptive restarting-plus-meta-learning construction and the explicit dependence on a variation measure V are technically interesting and practically relevant for revenue management applications. The simulation results on real data add empirical support, though the theoretical contribution would be strengthened by matching lower bounds or comparisons to parametric baselines.

major comments (2)

[§3] Variation measure definition: the paper introduces a specific measure V of market variation to obtain the claimed O(T^{2/3} + V)-type bound, but it is unclear whether this V is equivalent to the standard total-variation or Lipschitz notions used in the nonstationary bandit literature; an explicit comparison or reduction would clarify whether the bound is novel or recovers known rates.
[§4.2] Meta-learning analysis: the regret decomposition for the adaptive hedging layer over restarting schedules appears to rely on the variation being bounded within each restart interval; the proof sketch should explicitly state the assumption on intra-interval variation and show how the meta-regret term remains sublinear when V is unknown.

minor comments (2)

[Abstract] The abstract states that performance guarantees exist but does not mention the precise rate or the variation measure V; adding one sentence with the dependence on T and V would improve readability for readers who stop at the abstract.
[Simulation section] Figure captions for the simulation results should include the exact parameter settings (e.g., number of restarts, meta-learning rates) used to generate each curve so that the experiments are fully reproducible from the text alone.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and the recommendation for minor revision. We address each major comment below and will revise the manuscript accordingly to improve clarity.


Referee: [§3] Variation measure definition: the paper introduces a specific measure V of market variation to obtain the claimed O(T^{2/3} + V)-type bound, but it is unclear whether this V is equivalent to the standard total-variation or Lipschitz notions used in the nonstationary bandit literature; an explicit comparison or reduction would clarify whether the bound is novel or recovers known rates.

Authors: We appreciate this suggestion for clarification. Our variation measure V is the cumulative temporal variation of the revenue functions, V := sum_{t=1}^{T-1} sup_p |r_t(p) - r_{t+1}(p)|, where r_t is the revenue function at time t. This is the natural functional analogue of total variation. In the revised version, we will include a remark in Section 3 explicitly relating V to the standard notions: when the revenue function is Lipschitz in the demand parameters with constant L, V is bounded by L times the total variation of those parameters, thus recovering the standard rates in the literature. This comparison shows that our bound is novel in the nonparametric one-point feedback setting while remaining consistent with prior work. revision: yes
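The variation measure V stated in this response can be evaluated numerically on a price grid. The sketch below approximates each sup over prices by a max over grid points. That approximation is an assumption, and it can understate V when the grid is coarse. The example uses a hypothetical linear-demand revenue r_t(p) = p(b_t - p). For that revenue, |r_t(p) - r_{t+1}(p)| = p|b_t - b_{t+1}|, so V is controlled by the largest price times the total movement of the demand parameter b_t. This is an instance of the Lipschitz remark above.

```python
import numpy as np

def temporal_variation(revenue_grid):
    """Approximate V = sum_{t=1}^{T-1} sup_p |r_t(p) - r_{t+1}(p)|.

    revenue_grid[t, j] is r_t evaluated at the j-th price of a finite grid;
    the sup over prices is replaced by a max over grid points.
    """
    r = np.asarray(revenue_grid, dtype=float)
    if r.shape[0] < 2:
        return 0.0
    per_step = np.abs(np.diff(r, axis=0)).max(axis=1)  # sup-gap for t -> t+1
    return float(per_step.sum())

def linear_demand_revenue(prices, b_path):
    """Hypothetical revenue r_t(p) = p * (b_t - p) for each b_t in b_path."""
    p = np.asarray(prices, dtype=float)
    return np.array([p * (b - p) for b in b_path])
```

For prices on [0, 1] and b moving 2.0 → 2.5 → 2.0, each step contributes at most 1.0 × 0.5. The computed V is therefore 1.0.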
Referee: [§4.2] Meta-learning analysis: the regret decomposition for the adaptive hedging layer over restarting schedules appears to rely on the variation being bounded within each restart interval; the proof sketch should explicitly state the assumption on intra-interval variation and show how the meta-regret term remains sublinear when V is unknown.

Authors: Thank you for this observation. In our analysis, we work with the variation V_k accumulated within the k-th restart interval of length tau. The definition of V as the total variation guarantees that sum_k V_k <= V, and the analysis uses this summed budget rather than a uniform per-interval bound such as V * (tau / T), which does not follow from the definition. For the meta-learning layer, we employ a standard exponential weights algorithm over a grid of possible restart frequencies, and the meta-regret is bounded by O(sqrt(T log K)), where K is the number of schedules, independent of V. When V is unknown, the adaptive choice ensures the overall regret remains O(T^{2/3} + V). We will expand the proof sketch in the appendix to state this decomposition explicitly and derive the sublinear meta-regret term. revision: yes
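The split of a total-variation budget across restart intervals can be checked directly. Let the per-interval variation V_k count only the steps t → t+1 that fall inside the k-th interval. Jumps across interval boundaries are dropped, so the V_k always sum to at most V. Nothing in the definition forces an individual V_k below V·τ/T, however, because a single abrupt change can land entirely inside one interval. The helper name below is hypothetical.

```python
import numpy as np

def interval_variations(revenue_grid, period):
    """Variation V_k accumulated inside each restart interval of length `period`.

    revenue_grid[t, j] is r_t at the j-th grid price. Only steps t -> t+1 with
    both rounds in the same interval are counted, so sum(V_k) <= V.
    """
    r = np.asarray(revenue_grid, dtype=float)
    steps = np.abs(np.diff(r, axis=0)).max(axis=1)  # steps[t]: sup-gap r_t vs r_{t+1}
    out = []
    for start in range(0, r.shape[0], period):
        end = min(start + period, r.shape[0])
        out.append(float(steps[start:end - 1].sum()))
    return out
```

Consider T = 6, τ = 2, and one jump of size 3 between rounds 0 and 1. Then V = 3 and V·τ/T = 1, yet the first interval alone carries V_1 = 3. If the same jump falls on an interval boundary, every V_k is 0.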

Circularity Check

0 steps flagged

No significant circularity; bounds derived from independent variation measure and restart schedule


The paper defines a market variation measure V externally from the sequence of demand functions, then constructs a restarting-plus-meta-learning procedure whose regret analysis yields an explicit dependence on both T and V. This dependence is obtained by standard online-learning arguments applied to the restarted nonparametric gradient estimates; it is not obtained by fitting V to the regret or by renaming an internal quantity. No load-bearing step reduces by construction to a fitted parameter or to a self-citation whose content is the target bound itself. The derivation therefore remains self-contained against the stated external parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on unspecified regularity conditions for the revenue function and on the existence of a variation measure that can be bounded.

pith-pipeline@v0.9.0 · 5734 in / 1091 out tokens · 41240 ms · 2026-05-21T05:00:42.183222+00:00


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage: "We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism..."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation to the paper passage: unclear

Paper passage: "the dynamic regret of Algorithm 2 is of order O(poly(d) T^((2p̂+q)/(3p̂+q)) V_T^(p̂/(3p̂+q)))"
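The exponents in the quoted dynamic-regret bound can be read off with exact arithmetic. The parameters p̂ and q are defined in the paper and not restated on this page, so the sketch treats them as given numbers. One consequence of the formula is that the two exponents always sum to 1. Ignoring the poly(d) factor, the bound can therefore be written as T·(V_T/T)^(p̂/(3p̂+q)). It is sublinear in T exactly when V_T grows more slowly than T. With p̂ = 1 and q = 0 it reduces to the multiplicative T^(2/3) V_T^(1/3) shape.

```python
from fractions import Fraction

def dynamic_regret_exponents(p_hat, q):
    """Exponents (a, b) in O(poly(d) * T**a * V_T**b) with
    a = (2*p_hat + q) / (3*p_hat + q) and b = p_hat / (3*p_hat + q).

    p_hat and q are parameters defined in the paper; here they are simply
    taken as given rational numbers.
    """
    p_hat, q = Fraction(p_hat), Fraction(q)
    denom = 3 * p_hat + q
    if denom <= 0:
        raise ValueError("3*p_hat + q must be positive")
    return (2 * p_hat + q) / denom, p_hat / denom
```

For example, `dynamic_regret_exponents(1, 0)` returns `(Fraction(2, 3), Fraction(1, 3))`.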

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Walmart Cuts Profit Outlook as It Lowers Prices to Move Goods

Sarah Nassauer. Walmart Cuts Profit Outlook as It Lowers Prices to Move Goods. 2022

work page 2022

[2] [2]

Dynamic pricing and learning: Historical origins, current research, and new directions , journal =

Arnoud V. Dynamic pricing and learning: Historical origins, current research, and new directions , journal =. 2015 , issn =

work page 2015

[3] [3]

Operations Research , volume =

Besbes, Omar and Gur, Yonatan and Zeevi, Assaf , title =. Operations Research , volume =

work page

[4] [4]

Proceedings of The 33rd International Conference on Machine Learning , pages =

Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient , author =. Proceedings of The 33rd International Conference on Machine Learning , pages =. 2016 , volume =

work page 2016

[5] [5]

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics , pages=

Hu, Xiaowei and Prashanth, LA and Gy. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics , pages=. 2016 , month =

work page 2016

[6] [6]

1997 , issn =

A one-measurement form of simultaneous perturbation stochastic approximation , journal =. 1997 , issn =

work page 1997

[7] [7]

and Kalai, Adam Tauman and McMahan, H

Flaxman, Abraham D. and Kalai, Adam Tauman and McMahan, H. Brendan , title =. Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms , pages =. 2005 , isbn =

work page 2005

[8] [8]

Generalizing

Gao, Katelyn and Sener, Ozan , booktitle =. Generalizing. 2022 , volume =

work page 2022

[9] [9]

2003 , issn =

Mirror descent and nonlinear projected subgradient methods for convex optimization , journal =. 2003 , issn =

work page 2003

[10] Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011.

[11] Chen, Tianyi and Giannakis, Georgios B. Bandit Convex Optimization for Scalable and Dynamic IoT Management.

[12] Zhao, Peng and Wang, Guanghui and Zhang, Lijun and Zhou, Zhi-Hua. Journal of Machine Learning Research.

[13] Cheung, Wang Chi and Simchi-Levi, David and Zhu, Ruihao. Management Science.

[14] Cesa-Bianchi, Nicolo and Lugosi, Gabor. Prediction, Learning, and Games.

[15] Lattimore, Tor and Szepesvári, Csaba. Bandit Algorithms.

[16] Besbes, Omar and Zeevi, Assaf. Management Science.

[17] Zinkevich, Martin. Proceedings of the Twentieth International Conference on Machine Learning. 2003.

[18] Gradient-based algorithms for zeroth-order optimization. Foundations and Trends in Optimization. 2025.

[19] Lattimore, Tor. Bandit Convex Optimisation.

[20] Besbes, Omar and Zeevi, Assaf. Operations Research.

[21] den Boer, Arnoud V. and Keskin, N. Bora. Management Science.

[22] Aviv, Yossi and Pazgal, Amit. Management Science.

[23] Keskin, N. Bora and Li, Meng. Operations Research.

[24] Learning and control in a changing economic environment. 2002.

[25] den Boer, Arnoud V. and Keskin, Nuri Bora. Dynamic Pricing and Demand Learning in Nonstationary Environments. The Elements of Joint Learning and Optimization in Operations Management. 2022.

[26] Keskin, N. Bora and Zeevi, Assaf. Mathematics of Operations Research.

[27] Tracking the market: Dynamic pricing and learning in a changing environment. 2015.

[28] Hong, L. Jeff and Li, Chenghuai and Luo, Jun. Naval Research Logistics (NRL).

[29] Nonparametric multi-product dynamic pricing with demand learning via simultaneous price perturbation. 2024.

[30] Garivier, Aurélien and Moulines, Eric. On upper-confidence bound policies for switching bandit problems. Proceedings of the 22nd International Conference on Algorithmic Learning Theory.

[31] Ban, Gah-Yi and Keskin, N. Bora. Management Science.

[32] Miao, Sentao and Chen, Xi and Chao, Xiuli and Liu, Jiaxi and Zhang, Yidong. Production and Operations Management.

[33] Luo, Yiyun and Sun, Will Wei and Liu, Yufeng. Mathematics of Operations Research.

[34] Contextual dynamic pricing: Algorithms, optimality, and local differential privacy constraints. Journal of the American Statistical Association. 2026.

[35] Zhang, Huanan and Jasin, Stefanus. Manufacturing & Service Operations Management.

[36] Birge, John R. and Chen, Hongfan (Kevin) and Keskin, N. Bora. Operations Research.

[37] Meylahn, Janusz M. and den Boer, Arnoud V. Manufacturing & Service Operations Management.

[38] van Erven, Tim and Koolen, Wouter M. and van der Hoeven, Dirk. Journal of Machine Learning Research.

[39] Hazan, Elad. Introduction to Online Convex Optimization. 2022.

[40] Wei, Chen-Yu and Hong, Yi-Te and Lu, Chi-Jen. Tracking the Best Expert in Non-stationary Stochastic Environments.

[41] On abruptly-changing and slowly-varying multiarmed bandit problems. 2018 Annual American Control Conference (ACC). 2018.

[42] Efficient Contextual Bandits in Non-stationary Worlds. Proceedings of the 31st Conference on Learning Theory. 2018.

[43] Jadbabaie, Ali and Rakhlin, Alexander and Shahrampour, Shahin and Sridharan, Karthik. 2015.

[44] Dynamic Regret of Strongly Adaptive Methods. Proceedings of the 35th International Conference on Machine Learning. 2018.

[45] Besbes, Omar and Gur, Yonatan and Zeevi, Assaf. Stochastic Systems.

[46] Wang, Yining. Operations Research.

[47] Chen, Yiwei and Farias, Vivek F. Operations Research.

[48] When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment. Proceedings of the 37th International Conference on Machine Learning. 2020.

[49] Bravo, Mario and Leslie, David and Mertikopoulos, Panayotis. Bandit Learning in Concave N-Person Games.

[50] Ba, Wenjia and Lin, Tianyi and Zhang, Jiawei and Zhou, Zhengyuan. Operations Research.

[51] The nonstochastic multiarmed bandit problem. SIAM Journal on Computing. 2002.

[52] X-Armed Bandits. Journal of Machine Learning Research.

[53] Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995.

[54] Finite-time analysis of the multiarmed bandit problem. Machine Learning. 2002.

[55] Zhang, Huanan and Shi, Cong and Qin, Chao and Hua, Cheng. Naval Research Logistics (NRL).

[56] Learning When to Restart: Nonstationary Newsvendor from Uncensored to Censored Demand. 2025.

[57] Chen, Boxiao. Production and Operations Management.

[58] Chen, Boxiao and Chao, Xiuli and Shi, Cong. Mathematics of Operations Research.

[59] Chen, G. and Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3), 538-543. 1993.

[60] Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press. 2006.