arxiv: 2605.10475 · v1 · submitted 2026-05-11 · 💻 cs.GT · cs.LG

Recognition: no theorem link

Regret Minimization in Bilateral Trade With Perturbed Markets

Alberto Marchesi, Anna Lunghi, Matteo Castiglioni

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:23 UTC · model grok-4.3

classification 💻 cs.GT cs.LG

keywords bilateral traderegret minimizationperturbed marketsbudget balancegain from tradeadversarial corruptiononline learning

0 comments

The pith

An algorithm for repeated bilateral trade achieves regret that scales with the total adversarial corruption in an otherwise stochastic market.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies repeated buyer-seller exchanges where valuations are drawn from a fixed stochastic distribution but can be altered by an adversary whose total changes are bounded by a known value C. It constructs an algorithm whose regret against the best budget-balanced distribution over prices is on the order of T to the three-fourths plus C times log T. The same algorithm simultaneously keeps the worst-case regret at T to the three-fourths against any per-round budget-balanced choice. This closes the gap between pure stochastic settings, where distribution-based mechanisms can double expected gain from trade, and pure adversarial settings, where only fixed prices are safe.

Core claim

In perturbed markets consisting of an underlying stochastic distribution of valuations subject to adversarial corruption of total magnitude C, there exists an algorithm that attains Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while preserving the Õ(T^{3/4}) worst-case regret bound relative to a per-round budget-balanced baseline.

What carries the argument

An adaptive algorithm that scales its regret contribution with the known corruption bound C.

If this is right

When C is small the algorithm recovers performance close to the stochastic optimum that uses a distribution over prices.
When C grows to the order of T the bound collapses to the known adversarial guarantee of Õ(T^{3/4}).
The same procedure works without prior knowledge of the underlying stochastic distribution.
Budget balance can be enforced in expectation while still adapting to the realized corruption level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Markets whose corruption level can be estimated from past data could achieve higher long-run gains by tuning the algorithm to the observed C.
The same scaling idea might extend to other online mechanism-design problems that mix stochastic and adversarial elements, such as dynamic pricing with occasional shocks.
Simulations that vary C while holding the stochastic part fixed would show how much performance improves as corruption decreases.

Load-bearing premise

The total size of adversarial changes to the stochastic valuations is bounded by a known constant C.

What would settle it

Running the algorithm on a sequence whose total corruption exactly equals the assumed bound C and observing that the realized regret substantially exceeds Õ(T^{3/4}) + O(C log T) would disprove the stated guarantee.

read the original abstract

We address the problem of maximizing Gain from Trade (GFT) in repeated buyer-seller exchanges subject to global budget balance constraints. While this problem is well-understood in purely adversarial and stochastic settings, these environments exhibit a sharp dichotomy: adversarial environments allow for no-regret learning against the best fixed-price mechanism, whereas stochastic environments allow for no-regret learning against the best distribution over prices that is budget balanced in expectation. This gap is significant, as policies balanced in expectation can increase the GFT by a multiplicative factor of two. In this work, we bridge these extremes by studying perturbed markets, where an underlying stochastic distribution is subject to an adversarial corruption $C$. We design an algorithm that adaptively scales with the level of corruption, achieving an $\tilde{\mathcal{O}}(T^{3/4}) + \mathcal{O}(C\log(T))$ regret bound against the best budget-balanced distribution over prices. Simultaneously, our algorithm maintains the worst-case $\tilde{\mathcal{O}}(T^{3/4})$ regret bound relative to a per-round budget-balanced baseline, ensuring optimality even in fully adversarial environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a perturbed-market model for repeated bilateral trade that interpolates between stochastic and adversarial regimes via bounded corruption C, with an adaptive algorithm delivering dual regret guarantees.

read the letter

The main thing to know is that this work fills the gap between the two standard regimes in repeated bilateral trade. In pure stochastic settings you can compete with the best distribution over prices that is budget-balanced in expectation, while adversarial settings only allow competition with the best fixed per-round mechanism. The authors model the market as an underlying stochastic distribution plus total adversarial corruption of size C, then give an algorithm whose regret scales with C but never exceeds the adversarial bound. Specifically, they claim Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution and simultaneously Õ(T^{3/4}) against the per-round baseline. That dual guarantee is the useful part, because it lets you capture the multiplicative gain-from-trade improvement from expectation-balanced policies without sacrificing robustness when the corruption turns out large. The model itself appears new in this literature, and the adaptive scaling with C is a clean way to interpolate without introducing extra parameters. The abstract is internally consistent on these points. On the downside, the abstract supplies no proof sketch, key lemmas, or explicit assumptions about the price space or how budget balance is enforced under perturbation, so the actual soundness is hard to judge from what is visible. The T^{3/4} rate is standard for some online learning problems with constraints, but it would be good to see whether the corruption term requires new analysis or just plugs into existing techniques. Knowing C upfront is also a modeling choice that may need justification or an estimation procedure in the full paper. This is aimed at researchers working on no-regret learning for mechanisms with hard constraints like budget balance. Anyone thinking about repeated trade or auctions with partial stochasticity would find the model and the dual-bound approach worth reading. It deserves peer review because the gap it targets is real and the stated results are specific enough to be checked.

Referee Report

2 major / 2 minor

Summary. The manuscript studies regret minimization for maximizing Gain from Trade (GFT) in repeated bilateral trade subject to global budget-balance constraints. It considers perturbed markets consisting of an underlying stochastic distribution of buyer/seller valuations that is corrupted by an adversary whose total corruption is bounded by a known parameter C. The central contribution is an algorithm that achieves Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while simultaneously guaranteeing Õ(T^{3/4}) regret against the best per-round budget-balanced mechanism, thereby interpolating between the stochastic and fully adversarial regimes.

Significance. If the stated regret bounds hold, the work meaningfully bridges the sharp dichotomy between stochastic and adversarial settings that currently exists for budget-balanced bilateral trade. The adaptive scaling with C and the retention of the worst-case adversarial bound are both valuable; the former permits substantially higher GFT when corruption is modest, while the latter ensures robustness. The dual-regret formulation against externally defined baselines (best distribution and per-round mechanism) is internally consistent and avoids circularity.

major comments (2)

[§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.
[§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.

minor comments (2)

[Preliminaries] The notation Õ(·) is used throughout but never formally defined; a single sentence in the preliminaries clarifying that it hides polylog(T) factors would improve readability.
[Experiments] Figure 1 (regret curves) lacks error bars or confidence intervals; adding them would make the empirical validation of the O(C log T) scaling clearer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment of the contribution, and constructive suggestions. We address each major comment below.

read point-by-point responses

Referee: [§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.

Authors: We agree that the regret analysis in Section 4 is currently only sketched and would benefit from a fully explicit decomposition. In the revised manuscript we will expand the proof to separate the stochastic regret, the additive corruption term, and the interaction with the global budget-balance constraint. The expansion will explicitly bound how the adaptive scaling (tuned to the known C) controls the effect of corruption on the support of the valuation distributions, showing that the O(C log T) term is additive and does not inflate the Õ(T^{3/4}) term. This will make the load-bearing nature of the bound transparent. revision: yes
Referee: [§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.

Authors: The algorithm does rely on knowledge of C to set its exploration/exploitation parameters. Extending the result to unknown C would require additional machinery such as doubling tricks or online parameter tuning, which is outside the current scope and could degrade the bounds. In the revision we will therefore explicitly state the known-C assumption as a modeling limitation in Section 3.2, discuss its practical relevance, and identify the unknown-C case as an interesting direction for future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained against external baselines

full rationale

The paper's central claims consist of an algorithm achieving stated regret bounds against two externally defined comparators: the best budget-balanced distribution over prices (a fixed but unknown stochastic optimum) and a per-round budget-balanced baseline (an adversarial worst-case mechanism). These targets are defined independently of the algorithm's outputs or fitted parameters, with the corruption bound C serving as an explicit input rather than a derived quantity. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled through prior work. The dual guarantees are internally consistent with the perturbed-market model but do not rely on re-labeling or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions of online learning and mechanism design for bilateral trade; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption There exists a best budget-balanced distribution over prices for the underlying stochastic market.
This baseline is used for the main regret guarantee.
domain assumption The total adversarial corruption is bounded by a parameter C.
The algorithm's adaptive scaling depends on this bound.

pith-pipeline@v0.9.0 · 5494 in / 1302 out tokens · 38190 ms · 2026-05-12T04:23:03.485344+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Liad Blumrosen and Yehonatan Mizrahi

doi: 10.1145/2600057.2602843. Liad Blumrosen and Yehonatan Mizrahi. Approximating gains-from-trade in bilateral trading. In WINE, volume 10123 ofLecture Notes in Computer Science, pages 400–413. Springer,

work page doi:10.1145/2600057.2602843
[2]

doi: 10.1007/978-3-662-54110-4

work page doi:10.1007/978-3-662-54110-4
[3]

Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao

ISBN 9798400704864. Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao. Approximating gains from trade in two-sided markets via simple mechanisms. InProceedings of the 2017 ACM Conference on Economics and Computation, pages 589–590,

work page 2017
[4]

Online learning under budget and roi constraints via weak adaptivity.arXiv preprint arXiv:2302.01203,

Matteo Castiglioni, Andrea Celli, and Christian Kroer. Online learning under budget and roi constraints via weak adaptivity.arXiv preprint arXiv:2302.01203,

work page arXiv
[5]

The Sample Complexity of Uniform Approximation for Multi-Dimensional CDFs and Fixed-Price Mechanisms

Matteo Castiglioni, Anna Lunghi, and Alberto Marchesi. The sample complexity of uniform approximation for multi-dimensional cdfs and fixed-price mechanisms.arXiv preprint arXiv:2602.10868,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,

Houshuang Chen, Yaonan Jin, Pinyan Lu, and Chihao Zhang. Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,

work page arXiv
[7]

Nonparametric contextual online bilateral trade

Emanuele Coccia, Martino Bernasconi, and Andrea Celli. Nonparametric contextual online bilateral trade. InThe Fourteenth International Conference on Learning Representations. Romain Cosson, Federico Fusco, Anupam Gupta, Stefano Leonardi, Renato Paes Leme, and Matteo Russo. Contextual online bilateral trade.arXiv preprint arXiv:2602.12903,

work page arXiv
[8]

Subquadratic dynamic path reporting in directed graphs against an adaptive adversary , booktitle =

doi: 10.1145/3519935.3520054. Simone Di Gregorio, Paul D¨ utting, Federico Fusco, and Chris Schwiegelshohn. Nearly tight regret bounds for profit maximization in bilateral trade. In2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS), pages 1570–1594,

work page doi:10.1145/3519935.3520054
[9]

Tight pair query lower bounds for matching and earth mover’s distance

doi: 10.1109/FOCS63196.2025.00083. Yumou Fei. Improved approximation to first-best gains-from-trade. InInternational Conference on Web and Internet Economics, pages 204–218. Springer,

work page doi:10.1109/focs63196.2025.00083 2025
[10]

doi: 10.1007/978-3-031-22832-2

work page doi:10.1007/978-3-031-22832-2
[11]

Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos

doi: 10.1137/1.9781611977073.115. Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos. Efficient learning by implicit exploration in bandit problems with side observations.Advances in Neural Information Processing Systems, 27,

work page doi:10.1137/1.9781611977073.115
[12]

doi: 10.1137/1.9781611978971.233

Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi.Better Regret Rates in Bilateral Trade via Sublinear Budget Violation, pages 6494–6536. doi: 10.1137/1.9781611978971.233. URLhttps://epubs.siam.org/ doi/abs/10.1137/1.9781611978971.233. Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi. Online two-sided markets: Many buyers enhance learning.arXiv p...

work page doi:10.1137/1.9781611978971.233
[13]

A stronger benchmark for online bilateral trade: From fixed prices to distributions.arXiv preprint arXiv:2602.05681,

Anna Lunghi, Mattia Piccinato, Matteo Castiglioni, and Alberto Marchesi. A stronger benchmark for online bilateral trade: From fixed prices to distributions.arXiv preprint arXiv:2602.05681,

work page arXiv
[14]

However, they showed that Strong Budget Balanced (SBB) mechanisms are learnable when the seller and buyer valuation distributions are independent and admit bounded density

13 A Additional Related Works The study of GFT maximization in online bilateral trade was initiated by [Cesa-Bianchi et al., 2024a], who show that under one-bit feedback the problem is in general unlearnable. However, they showed that Strong Budget Balanced (SBB) mechanisms are learnable when the seller and buyer valuation distributions are independent an...

work page 2024
[15]

We can useCalso to bound the following quantity X t∈[T] E(p,q)∼π E (s,b)∼Lt [Rev(p, q, s, b)] − X t∈[T] E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≤C, which means that E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≥ −C T . Letλ t conditioned to the filtration up tot−1, be independent from the revenue at timetconditioned to the filtration up tot−1, i.e.λ t is independent...

work page 2015
[16]

! ≤ X (p,q)∈A ˆπt(p, q) (1−I(s t ≤U t ≤p, b t ≥q))I(q= ˆq t, Ht = 1)· · 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! | {z } (⋆) and (⋆) can be further bounded as (⋆) = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ− α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′...

work page 2015