Recognition: no theorem link
Regret Minimization in Bilateral Trade With Perturbed Markets
Pith reviewed 2026-05-12 04:23 UTC · model grok-4.3
The pith
An algorithm for repeated bilateral trade achieves regret that scales with the total adversarial corruption in an otherwise stochastic market.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In perturbed markets consisting of an underlying stochastic distribution of valuations subject to adversarial corruption of total magnitude C, there exists an algorithm that attains Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while preserving the Õ(T^{3/4}) worst-case regret bound relative to a per-round budget-balanced baseline.
What carries the argument
An adaptive algorithm that scales its regret contribution with the known corruption bound C.
If this is right
- When C is small the algorithm recovers performance close to the stochastic optimum that uses a distribution over prices.
- When C grows to the order of T the bound collapses to the known adversarial guarantee of Õ(T^{3/4}).
- The same procedure works without prior knowledge of the underlying stochastic distribution.
- Budget balance can be enforced in expectation while still adapting to the realized corruption level.
Where Pith is reading between the lines
- Markets whose corruption level can be estimated from past data could achieve higher long-run gains by tuning the algorithm to the observed C.
- The same scaling idea might extend to other online mechanism-design problems that mix stochastic and adversarial elements, such as dynamic pricing with occasional shocks.
- Simulations that vary C while holding the stochastic part fixed would show how much performance improves as corruption decreases.
Load-bearing premise
The total size of adversarial changes to the stochastic valuations is bounded by a known constant C.
What would settle it
Running the algorithm on a sequence whose total corruption exactly equals the assumed bound C and observing that the realized regret substantially exceeds Õ(T^{3/4}) + O(C log T) would disprove the stated guarantee.
read the original abstract
We address the problem of maximizing Gain from Trade (GFT) in repeated buyer-seller exchanges subject to global budget balance constraints. While this problem is well-understood in purely adversarial and stochastic settings, these environments exhibit a sharp dichotomy: adversarial environments allow for no-regret learning against the best fixed-price mechanism, whereas stochastic environments allow for no-regret learning against the best distribution over prices that is budget balanced in expectation. This gap is significant, as policies balanced in expectation can increase the GFT by a multiplicative factor of two. In this work, we bridge these extremes by studying perturbed markets, where an underlying stochastic distribution is subject to an adversarial corruption $C$. We design an algorithm that adaptively scales with the level of corruption, achieving an $\tilde{\mathcal{O}}(T^{3/4}) + \mathcal{O}(C\log(T))$ regret bound against the best budget-balanced distribution over prices. Simultaneously, our algorithm maintains the worst-case $\tilde{\mathcal{O}}(T^{3/4})$ regret bound relative to a per-round budget-balanced baseline, ensuring optimality even in fully adversarial environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies regret minimization for maximizing Gain from Trade (GFT) in repeated bilateral trade subject to global budget-balance constraints. It considers perturbed markets consisting of an underlying stochastic distribution of buyer/seller valuations that is corrupted by an adversary whose total corruption is bounded by a known parameter C. The central contribution is an algorithm that achieves Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while simultaneously guaranteeing Õ(T^{3/4}) regret against the best per-round budget-balanced mechanism, thereby interpolating between the stochastic and fully adversarial regimes.
Significance. If the stated regret bounds hold, the work meaningfully bridges the sharp dichotomy between stochastic and adversarial settings that currently exists for budget-balanced bilateral trade. The adaptive scaling with C and the retention of the worst-case adversarial bound are both valuable; the former permits substantially higher GFT when corruption is modest, while the latter ensures robustness. The dual-regret formulation against externally defined baselines (best distribution and per-round mechanism) is internally consistent and avoids circularity.
major comments (2)
- [§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.
- [§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.
minor comments (2)
- [Preliminaries] The notation Õ(·) is used throughout but never formally defined; a single sentence in the preliminaries clarifying that it hides polylog(T) factors would improve readability.
- [Experiments] Figure 1 (regret curves) lacks error bars or confidence intervals; adding them would make the empirical validation of the O(C log T) scaling clearer.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment of the contribution, and constructive suggestions. We address each major comment below.
read point-by-point responses
-
Referee: [§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.
Authors: We agree that the regret analysis in Section 4 is currently only sketched and would benefit from a fully explicit decomposition. In the revised manuscript we will expand the proof to separate the stochastic regret, the additive corruption term, and the interaction with the global budget-balance constraint. The expansion will explicitly bound how the adaptive scaling (tuned to the known C) controls the effect of corruption on the support of the valuation distributions, showing that the O(C log T) term is additive and does not inflate the Õ(T^{3/4}) term. This will make the load-bearing nature of the bound transparent. revision: yes
-
Referee: [§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.
Authors: The algorithm does rely on knowledge of C to set its exploration/exploitation parameters. Extending the result to unknown C would require additional machinery such as doubling tricks or online parameter tuning, which is outside the current scope and could degrade the bounds. In the revision we will therefore explicitly state the known-C assumption as a modeling limitation in Section 3.2, discuss its practical relevance, and identify the unknown-C case as an interesting direction for future work. revision: partial
Circularity Check
No significant circularity; derivation self-contained against external baselines
full rationale
The paper's central claims consist of an algorithm achieving stated regret bounds against two externally defined comparators: the best budget-balanced distribution over prices (a fixed but unknown stochastic optimum) and a per-round budget-balanced baseline (an adversarial worst-case mechanism). These targets are defined independently of the algorithm's outputs or fitted parameters, with the corruption bound C serving as an explicit input rather than a derived quantity. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled through prior work. The dual guarantees are internally consistent with the perturbed-market model but do not rely on re-labeling or self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption There exists a best budget-balanced distribution over prices for the underlying stochastic market.
- domain assumption The total adversarial corruption is bounded by a parameter C.
Reference graph
Works this paper leans on
-
[1]
Liad Blumrosen and Yehonatan Mizrahi
doi: 10.1145/2600057.2602843. Liad Blumrosen and Yehonatan Mizrahi. Approximating gains-from-trade in bilateral trading. In WINE, volume 10123 ofLecture Notes in Computer Science, pages 400–413. Springer,
-
[2]
doi: 10.1007/978-3-662-54110-4
-
[3]
Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao
ISBN 9798400704864. Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao. Approximating gains from trade in two-sided markets via simple mechanisms. InProceedings of the 2017 ACM Conference on Economics and Computation, pages 589–590,
work page 2017
-
[4]
Matteo Castiglioni, Andrea Celli, and Christian Kroer. Online learning under budget and roi constraints via weak adaptivity.arXiv preprint arXiv:2302.01203,
-
[5]
The Sample Complexity of Uniform Approximation for Multi-Dimensional CDFs and Fixed-Price Mechanisms
Matteo Castiglioni, Anna Lunghi, and Alberto Marchesi. The sample complexity of uniform approximation for multi-dimensional cdfs and fixed-price mechanisms.arXiv preprint arXiv:2602.10868,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,
Houshuang Chen, Yaonan Jin, Pinyan Lu, and Chihao Zhang. Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,
-
[7]
Nonparametric contextual online bilateral trade
Emanuele Coccia, Martino Bernasconi, and Andrea Celli. Nonparametric contextual online bilateral trade. InThe Fourteenth International Conference on Learning Representations. Romain Cosson, Federico Fusco, Anupam Gupta, Stefano Leonardi, Renato Paes Leme, and Matteo Russo. Contextual online bilateral trade.arXiv preprint arXiv:2602.12903,
-
[8]
Subquadratic dynamic path reporting in directed graphs against an adaptive adversary , booktitle =
doi: 10.1145/3519935.3520054. Simone Di Gregorio, Paul D¨ utting, Federico Fusco, and Chris Schwiegelshohn. Nearly tight regret bounds for profit maximization in bilateral trade. In2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS), pages 1570–1594,
-
[9]
Tight pair query lower bounds for matching and earth mover’s distance
doi: 10.1109/FOCS63196.2025.00083. Yumou Fei. Improved approximation to first-best gains-from-trade. InInternational Conference on Web and Internet Economics, pages 204–218. Springer,
-
[10]
doi: 10.1007/978-3-031-22832-2
-
[11]
Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos
doi: 10.1137/1.9781611977073.115. Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos. Efficient learning by implicit exploration in bandit problems with side observations.Advances in Neural Information Processing Systems, 27,
-
[12]
doi: 10.1137/1.9781611978971.233
Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi.Better Regret Rates in Bilateral Trade via Sublinear Budget Violation, pages 6494–6536. doi: 10.1137/1.9781611978971.233. URLhttps://epubs.siam.org/ doi/abs/10.1137/1.9781611978971.233. Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi. Online two-sided markets: Many buyers enhance learning.arXiv p...
-
[13]
Anna Lunghi, Mattia Piccinato, Matteo Castiglioni, and Alberto Marchesi. A stronger benchmark for online bilateral trade: From fixed prices to distributions.arXiv preprint arXiv:2602.05681,
-
[14]
13 A Additional Related Works The study of GFT maximization in online bilateral trade was initiated by [Cesa-Bianchi et al., 2024a], who show that under one-bit feedback the problem is in general unlearnable. However, they showed that Strong Budget Balanced (SBB) mechanisms are learnable when the seller and buyer valuation distributions are independent an...
work page 2024
-
[15]
We can useCalso to bound the following quantity X t∈[T] E(p,q)∼π E (s,b)∼Lt [Rev(p, q, s, b)] − X t∈[T] E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≤C, which means that E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≥ −C T . Letλ t conditioned to the filtration up tot−1, be independent from the revenue at timetconditioned to the filtration up tot−1, i.e.λ t is independent...
work page 2015
-
[16]
! ≤ X (p,q)∈A ˆπt(p, q) (1−I(s t ≤U t ≤p, b t ≥q))I(q= ˆq t, Ht = 1)· · 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! | {z } (⋆) and (⋆) can be further bounded as (⋆) = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ− α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′...
work page 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.