Online Price Competition under Generalized Linear Demands

Cong Shi; Daniele Bracale; Moulinath Banerjee; Yuekai Sun

arxiv: 2511.10718 · v6 · submitted 2025-11-13 · 💻 cs.GT · math.ST· stat.ME· stat.TH

Online Price Competition under Generalized Linear Demands

Daniele Bracale , Moulinath Banerjee , Cong Shi , Yuekai Sun This is my paper

Pith reviewed 2026-05-17 21:50 UTC · model grok-4.3

classification 💻 cs.GT math.STstat.MEstat.TH

keywords online price competitiongeneralized linear modelsregret boundssingle index modeldecentralized learningdynamic pricingupper confidence bounds

0 comments

The pith

In competitive pricing, each seller can achieve optimal sqrt(T) regret without coordinating exploration using a new policy for generalized linear demands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies sequential price setting by N sellers where each seller's demand follows a single-index model depending on the full price vector through a known increasing link function applied to an unknown linear combination. It introduces the PML-GLUCB policy that merges penalized maximum likelihood estimation with upper-confidence pricing to let sellers learn independently while observing only their own demands and rivals' prices. A sympathetic reader would care because this removes the requirement for joint exploration phases that earlier work needed, generalizes beyond linear demands, and handles both binary and continuous observations. If correct, sellers reach the same Õ(√T) regret rate known to be optimal for linear cases relative to a dynamic benchmark.

Core claim

Under the single-index demand model λ_i(p) = μ_i(<θ_{i,0}, p>) with known increasing μ_i, the PML-GLUCB policy enables each seller to achieve Õ(√T) regret against a dynamic benchmark without requiring coordinated front-loaded exploration phases across sellers, while generalizing prior linear-demand results and accommodating binary or real-valued demand observations.

What carries the argument

The PML-GLUCB policy that pairs penalized MLE for estimating the unknown parameter θ_{i,0} with an upper-confidence pricing rule to balance exploration and exploitation in the generalized linear demand setting.

If this is right

Each seller attains Õ(√T) regret independently relative to a dynamic benchmark.
The approach extends existing linear-demand methods to the broader single-index family.
The policy works for both binary and continuous demand realizations without modification.
Sellers no longer need synchronized exploration phases to reach the optimal regret rate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar decentralized estimation-plus-confidence techniques could apply to other multi-agent settings where each player sees only private outcomes.
Markets with price-dependent nonlinear responses might adopt this structure to reduce coordination overhead in practice.
Performance could be tested by relaxing the known-link assumption to see how estimation error propagates.

Load-bearing premise

Each seller's demand is exactly an increasing function of the inner product between an unknown parameter vector and the vector of all sellers' prices.

What would settle it

A simulation or market experiment in which sellers following PML-GLUCB incur regret that grows faster than sqrt(T) over long horizons T under single-index demands with known links.

read the original abstract

We study a sequential price competition among $N$ sellers, each influenced by the pricing decisions of their rivals. Specifically, the demand function for each seller $i$ follows the single index model $\lambda_i(\mathbf p) = \mu_i(\langle \boldsymbol \theta_{i,0}, \mathbf p \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol \theta_{i,0}$, where the vector $\mathbf{p}$ denotes the vector of prices offered by all the sellers simultaneously at a given instant. Each seller observes only their own realized demand - unobservable to competitors - and the prices set by rivals. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule. Our approach (i) \emph{removes the need for coordinated front-loaded exploration phases across sellers} - which is integral to previous models - making our method aligned with realistic market conditions; (ii) generalizes existing approaches that focus solely on linear demand models; (iii) accommodates both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $\widetilde{O}(\sqrt{T})$ regret, which matches the optimal rate known in the linear setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a decentralized PML-GLUCB policy for price competition under GLM demands that claims the same Õ(√T) regret as the linear case without coordinated exploration.

read the letter

The main point is a decentralized algorithm for sequential price competition where each seller's demand follows a single-index GLM. The PML-GLUCB policy combines penalized MLE with a UCB-style pricing rule and achieves Õ(√T) regret per seller against a dynamic benchmark, matching the linear-demand rate while dropping the front-loaded coordination that prior work required. It also handles both binary and real-valued observations, which widens the setting a bit beyond the usual linear models.

Referee Report

1 major / 2 minor

Summary. The manuscript studies sequential price competition among N sellers under generalized linear demand models, where each seller i has demand λ_i(p) = μ_i(⟨θ_{i,0}, p⟩) with known increasing link μ_i and unknown parameter θ_{i,0}. Sellers observe only their own realized demand and rivals' prices. The authors propose the decentralized PML-GLUCB policy, which combines penalized maximum likelihood estimation with an upper-confidence pricing rule. The policy eliminates coordinated front-loaded exploration and achieves Õ(√T) regret per seller relative to a dynamic benchmark, matching the optimal rate for linear demands while accommodating binary and real-valued observations.

Significance. If the regret analysis holds, the work is significant for generalizing competitive dynamic pricing results from linear to generalized linear demands while preserving the optimal Õ(√T) rate. The fully decentralized construction without coordinated exploration is a practical strength for realistic market settings. The approach also extends to both binary and continuous demand observations. Credit is due for the explicit removal of coordination requirements and the parameter-free flavor of the regret bound relative to the linear case.

major comments (1)

[Regret analysis / main theorem] The central Õ(√T) regret claim (abstract and main theorem) is load-bearing on the concentration of the penalized MLE estimator for θ_{i,0}. For GLMs the local Fisher information scales with μ'(⟨θ, p⟩) · p p^T. The model statement assumes only that μ_i is known and increasing; no uniform lower bound on μ' is stated. Without this, self-normalized concentration for the UCB bonus can fail when endogenous prices drive ⟨θ, p⟩ into regions where μ' ≈ 0, yielding linear regret on those trajectories. The analysis must either add the assumption inf μ' > 0 over the relevant compact price set or prove robustness to vanishing curvature.

minor comments (2)

[Abstract and Algorithm 1] The abstract refers to 'penalized MLE' without specifying the penalty form or tuning; this should be stated explicitly in the model or algorithm section for reproducibility.
[Model section] Notation for the joint price vector p and the inner product ⟨θ_i, p⟩ should clarify the dimension N and any compactness or boundedness assumptions on prices.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful and insightful comments on our manuscript. We address the major comment point by point below.

read point-by-point responses

Referee: The central Õ(√T) regret claim (abstract and main theorem) is load-bearing on the concentration of the penalized MLE estimator for θ_{i,0}. For GLMs the local Fisher information scales with μ'(⟨θ, p⟩) · p p^T. The model statement assumes only that μ_i is known and increasing; no uniform lower bound on μ' is stated. Without this, self-normalized concentration for the UCB bonus can fail when endogenous prices drive ⟨θ, p⟩ into regions where μ' ≈ 0, yielding linear regret on those trajectories. The analysis must either add the assumption inf μ' > 0 over the relevant compact price set or prove robustness to vanishing curvature.

Authors: We agree that the analysis relies on a uniform lower bound on μ' to ensure non-degenerate Fisher information and valid concentration bounds for the penalized MLE. The manuscript does not explicitly state this assumption, which is a valid point. In the revised manuscript, we will add the following assumption: there exists κ > 0 such that μ_i'(⟨θ_{i,0}, p⟩) ≥ κ for all p in the compact price set and all i. With this, the regret analysis goes through as before, yielding the claimed Õ(√T) bound. We note that this is standard for GLM bandits and pricing to avoid degenerate cases where the link is flat. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bound derived independently from policy construction

full rationale

The paper constructs the decentralized PML-GLUCB policy using penalized MLE combined with an upper-confidence pricing rule for the single-index GLM demand model. The Õ(√T) regret claim relative to a dynamic benchmark is presented as following from this construction and matching the known linear rate, without any quoted reduction of the bound to a fitted parameter, self-citation chain, or definitional equivalence. The abstract explicitly notes removal of coordinated exploration and generalization beyond linear demands, indicating the derivation chain remains self-contained under the stated assumptions on the known increasing link function.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the single-index demand model with known link function and on the existence of a regret analysis for the proposed penalized MLE plus UCB rule; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)

domain assumption Demand for seller i is given by the single-index model λ_i(p) = μ_i(<θ_{i,0}, p>) with known increasing link μ_i
Explicitly stated in the abstract as the demand model used throughout the work.

invented entities (1)

PML-GLUCB policy no independent evidence
purpose: Decentralized pricing algorithm combining penalized MLE and upper-confidence bound selection
New algorithm introduced to achieve the stated regret without coordinated exploration.

pith-pipeline@v0.9.0 · 5529 in / 1326 out tokens · 70414 ms · 2026-05-17T21:50:06.457085+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

demand function ... single index model λ_i(p) = μ_i(⟨θ_{i,0}, p⟩) with known increasing link μ_i

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Equilibrium and Pricing in Consumer Networks with Nonlinear Utilities: An Online Shape-Constrained Learning Approach
math.ST 2026-05 unverdicted novelty 7.0

The paper establishes equilibrium existence and uniqueness for nonlinear utility consumer networks under contraction conditions and proposes a shape-constrained isotonic regression approach with strict no-regret conve...

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · cited by 1 Pith paper

[1]

[ by Lemma 2.11] ≤ 2(LΓ + 1)∥p(t) − p∗∥2

work page
[2]

(25) STEP 1: Bound for ∥p(t) − p∗∥2

work page
[3]

TX t=1 R(t) i # ≤ E

Consider the event G = ∩t∈[T] ∥bΓ (t) (p(t−1)) − Γ(p(t−1))∥2 2 ≤ C2σ(t−1)(δ) , (26) that has probability at least 1 − 2N δ by Proposition 4.2. Continuing from (25), we get that, on G, ∥p(t) − p∗∥2 = ∥bΓ (t) (p(t−1)) − p∗∥2 (⋆) ≤ ∥bΓ (t) (p(t−1)) − Γ(p(t−1))∥2 + ∥Γ(p(t−1)) − Γ(p∗)∥2 ≤ q C2σ(t−1)(δ) + LΓ∥p(t−1) − p∗∥2 ≤ q C2σ(t−1)(δ) + LΓ q C2σ(t−2)(δ) + LΓ...

work page 2020

[1] [1]

[ by Lemma 2.11] ≤ 2(LΓ + 1)∥p(t) − p∗∥2

work page

[2] [2]

(25) STEP 1: Bound for ∥p(t) − p∗∥2

work page

[3] [3]

TX t=1 R(t) i # ≤ E

Consider the event G = ∩t∈[T] ∥bΓ (t) (p(t−1)) − Γ(p(t−1))∥2 2 ≤ C2σ(t−1)(δ) , (26) that has probability at least 1 − 2N δ by Proposition 4.2. Continuing from (25), we get that, on G, ∥p(t) − p∗∥2 = ∥bΓ (t) (p(t−1)) − p∗∥2 (⋆) ≤ ∥bΓ (t) (p(t−1)) − Γ(p(t−1))∥2 + ∥Γ(p(t−1)) − Γ(p∗)∥2 ≤ q C2σ(t−1)(δ) + LΓ∥p(t−1) − p∗∥2 ≤ q C2σ(t−1)(δ) + LΓ q C2σ(t−2)(δ) + LΓ...

work page 2020