pith. machine review for the scientific record. sign in

arxiv: 2605.10475 · v1 · submitted 2026-05-11 · 💻 cs.GT · cs.LG

Recognition: no theorem link

Regret Minimization in Bilateral Trade With Perturbed Markets

Alberto Marchesi, Anna Lunghi, Matteo Castiglioni

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:23 UTC · model grok-4.3

classification 💻 cs.GT cs.LG
keywords bilateral traderegret minimizationperturbed marketsbudget balancegain from tradeadversarial corruptiononline learning
0
0 comments X

The pith

An algorithm for repeated bilateral trade achieves regret that scales with the total adversarial corruption in an otherwise stochastic market.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies repeated buyer-seller exchanges where valuations are drawn from a fixed stochastic distribution but can be altered by an adversary whose total changes are bounded by a known value C. It constructs an algorithm whose regret against the best budget-balanced distribution over prices is on the order of T to the three-fourths plus C times log T. The same algorithm simultaneously keeps the worst-case regret at T to the three-fourths against any per-round budget-balanced choice. This closes the gap between pure stochastic settings, where distribution-based mechanisms can double expected gain from trade, and pure adversarial settings, where only fixed prices are safe.

Core claim

In perturbed markets consisting of an underlying stochastic distribution of valuations subject to adversarial corruption of total magnitude C, there exists an algorithm that attains Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while preserving the Õ(T^{3/4}) worst-case regret bound relative to a per-round budget-balanced baseline.

What carries the argument

An adaptive algorithm that scales its regret contribution with the known corruption bound C.

If this is right

  • When C is small the algorithm recovers performance close to the stochastic optimum that uses a distribution over prices.
  • When C grows to the order of T the bound collapses to the known adversarial guarantee of Õ(T^{3/4}).
  • The same procedure works without prior knowledge of the underlying stochastic distribution.
  • Budget balance can be enforced in expectation while still adapting to the realized corruption level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Markets whose corruption level can be estimated from past data could achieve higher long-run gains by tuning the algorithm to the observed C.
  • The same scaling idea might extend to other online mechanism-design problems that mix stochastic and adversarial elements, such as dynamic pricing with occasional shocks.
  • Simulations that vary C while holding the stochastic part fixed would show how much performance improves as corruption decreases.

Load-bearing premise

The total size of adversarial changes to the stochastic valuations is bounded by a known constant C.

What would settle it

Running the algorithm on a sequence whose total corruption exactly equals the assumed bound C and observing that the realized regret substantially exceeds Õ(T^{3/4}) + O(C log T) would disprove the stated guarantee.

read the original abstract

We address the problem of maximizing Gain from Trade (GFT) in repeated buyer-seller exchanges subject to global budget balance constraints. While this problem is well-understood in purely adversarial and stochastic settings, these environments exhibit a sharp dichotomy: adversarial environments allow for no-regret learning against the best fixed-price mechanism, whereas stochastic environments allow for no-regret learning against the best distribution over prices that is budget balanced in expectation. This gap is significant, as policies balanced in expectation can increase the GFT by a multiplicative factor of two. In this work, we bridge these extremes by studying perturbed markets, where an underlying stochastic distribution is subject to an adversarial corruption $C$. We design an algorithm that adaptively scales with the level of corruption, achieving an $\tilde{\mathcal{O}}(T^{3/4}) + \mathcal{O}(C\log(T))$ regret bound against the best budget-balanced distribution over prices. Simultaneously, our algorithm maintains the worst-case $\tilde{\mathcal{O}}(T^{3/4})$ regret bound relative to a per-round budget-balanced baseline, ensuring optimality even in fully adversarial environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies regret minimization for maximizing Gain from Trade (GFT) in repeated bilateral trade subject to global budget-balance constraints. It considers perturbed markets consisting of an underlying stochastic distribution of buyer/seller valuations that is corrupted by an adversary whose total corruption is bounded by a known parameter C. The central contribution is an algorithm that achieves Õ(T^{3/4}) + O(C log T) regret against the best budget-balanced distribution over prices while simultaneously guaranteeing Õ(T^{3/4}) regret against the best per-round budget-balanced mechanism, thereby interpolating between the stochastic and fully adversarial regimes.

Significance. If the stated regret bounds hold, the work meaningfully bridges the sharp dichotomy between stochastic and adversarial settings that currently exists for budget-balanced bilateral trade. The adaptive scaling with C and the retention of the worst-case adversarial bound are both valuable; the former permits substantially higher GFT when corruption is modest, while the latter ensures robustness. The dual-regret formulation against externally defined baselines (best distribution and per-round mechanism) is internally consistent and avoids circularity.

major comments (2)
  1. [§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.
  2. [§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.
minor comments (2)
  1. [Preliminaries] The notation Õ(·) is used throughout but never formally defined; a single sentence in the preliminaries clarifying that it hides polylog(T) factors would improve readability.
  2. [Experiments] Figure 1 (regret curves) lacks error bars or confidence intervals; adding them would make the empirical validation of the O(C log T) scaling clearer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment of the contribution, and constructive suggestions. We address each major comment below.

read point-by-point responses
  1. Referee: [§4] §4 (Algorithm and Regret Analysis): The proof that the additive O(C log T) term arises from the corruption while preserving the Õ(T^{3/4}) stochastic term must be made fully explicit. In particular, it is unclear how the algorithm's adaptive scaling interacts with the global budget-balance constraint when the corruption affects the support of the valuation distributions; a concrete expansion of the decomposition used in the regret analysis (currently only sketched) is required to confirm the bound is load-bearing.

    Authors: We agree that the regret analysis in Section 4 is currently only sketched and would benefit from a fully explicit decomposition. In the revised manuscript we will expand the proof to separate the stochastic regret, the additive corruption term, and the interaction with the global budget-balance constraint. The expansion will explicitly bound how the adaptive scaling (tuned to the known C) controls the effect of corruption on the support of the valuation distributions, showing that the O(C log T) term is additive and does not inflate the Õ(T^{3/4}) term. This will make the load-bearing nature of the bound transparent. revision: yes

  2. Referee: [§3.2] §3.2 (Perturbed Market Model): The assumption that the total corruption magnitude C is known to the algorithm is used to tune the exploration/exploitation parameters. If this knowledge is relaxed to an unknown C, the claimed adaptivity no longer holds; the manuscript should either prove a version with unknown C or state the known-C assumption as a modeling limitation.

    Authors: The algorithm does rely on knowledge of C to set its exploration/exploitation parameters. Extending the result to unknown C would require additional machinery such as doubling tricks or online parameter tuning, which is outside the current scope and could degrade the bounds. In the revision we will therefore explicitly state the known-C assumption as a modeling limitation in Section 3.2, discuss its practical relevance, and identify the unknown-C case as an interesting direction for future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained against external baselines

full rationale

The paper's central claims consist of an algorithm achieving stated regret bounds against two externally defined comparators: the best budget-balanced distribution over prices (a fixed but unknown stochastic optimum) and a per-round budget-balanced baseline (an adversarial worst-case mechanism). These targets are defined independently of the algorithm's outputs or fitted parameters, with the corruption bound C serving as an explicit input rather than a derived quantity. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled through prior work. The dual guarantees are internally consistent with the perturbed-market model but do not rely on re-labeling or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions of online learning and mechanism design for bilateral trade; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption There exists a best budget-balanced distribution over prices for the underlying stochastic market.
    This baseline is used for the main regret guarantee.
  • domain assumption The total adversarial corruption is bounded by a parameter C.
    The algorithm's adaptive scaling depends on this bound.

pith-pipeline@v0.9.0 · 5494 in / 1302 out tokens · 38190 ms · 2026-05-12T04:23:03.485344+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    Liad Blumrosen and Yehonatan Mizrahi

    doi: 10.1145/2600057.2602843. Liad Blumrosen and Yehonatan Mizrahi. Approximating gains-from-trade in bilateral trading. In WINE, volume 10123 ofLecture Notes in Computer Science, pages 400–413. Springer,

  2. [2]

    doi: 10.1007/978-3-662-54110-4

  3. [3]

    Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao

    ISBN 9798400704864. Johannes Brustle, Yang Cai, Fa Wu, and Mingfei Zhao. Approximating gains from trade in two-sided markets via simple mechanisms. InProceedings of the 2017 ACM Conference on Economics and Computation, pages 589–590,

  4. [4]

    Online learning under budget and roi constraints via weak adaptivity.arXiv preprint arXiv:2302.01203,

    Matteo Castiglioni, Andrea Celli, and Christian Kroer. Online learning under budget and roi constraints via weak adaptivity.arXiv preprint arXiv:2302.01203,

  5. [5]

    The Sample Complexity of Uniform Approximation for Multi-Dimensional CDFs and Fixed-Price Mechanisms

    Matteo Castiglioni, Anna Lunghi, and Alberto Marchesi. The sample complexity of uniform approximation for multi-dimensional cdfs and fixed-price mechanisms.arXiv preprint arXiv:2602.10868,

  6. [6]

    Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,

    Houshuang Chen, Yaonan Jin, Pinyan Lu, and Chihao Zhang. Tight regret bounds for fixed-price bilateral trade.arXiv preprint arXiv:2504.04349,

  7. [7]

    Nonparametric contextual online bilateral trade

    Emanuele Coccia, Martino Bernasconi, and Andrea Celli. Nonparametric contextual online bilateral trade. InThe Fourteenth International Conference on Learning Representations. Romain Cosson, Federico Fusco, Anupam Gupta, Stefano Leonardi, Renato Paes Leme, and Matteo Russo. Contextual online bilateral trade.arXiv preprint arXiv:2602.12903,

  8. [8]

    Subquadratic dynamic path reporting in directed graphs against an adaptive adversary , booktitle =

    doi: 10.1145/3519935.3520054. Simone Di Gregorio, Paul D¨ utting, Federico Fusco, and Chris Schwiegelshohn. Nearly tight regret bounds for profit maximization in bilateral trade. In2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS), pages 1570–1594,

  9. [9]

    Tight pair query lower bounds for matching and earth mover’s distance

    doi: 10.1109/FOCS63196.2025.00083. Yumou Fei. Improved approximation to first-best gains-from-trade. InInternational Conference on Web and Internet Economics, pages 204–218. Springer,

  10. [10]

    doi: 10.1007/978-3-031-22832-2

  11. [11]

    Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos

    doi: 10.1137/1.9781611977073.115. Tom´ aˇ s Koc´ ak, Gergely Neu, Michal Valko, and R´ emi Munos. Efficient learning by implicit exploration in bandit problems with side observations.Advances in Neural Information Processing Systems, 27,

  12. [12]

    doi: 10.1137/1.9781611978971.233

    Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi.Better Regret Rates in Bilateral Trade via Sublinear Budget Violation, pages 6494–6536. doi: 10.1137/1.9781611978971.233. URLhttps://epubs.siam.org/ doi/abs/10.1137/1.9781611978971.233. Anna Lunghi, Matteo Castiglioni, and Alberto Marchesi. Online two-sided markets: Many buyers enhance learning.arXiv p...

  13. [13]

    A stronger benchmark for online bilateral trade: From fixed prices to distributions.arXiv preprint arXiv:2602.05681,

    Anna Lunghi, Mattia Piccinato, Matteo Castiglioni, and Alberto Marchesi. A stronger benchmark for online bilateral trade: From fixed prices to distributions.arXiv preprint arXiv:2602.05681,

  14. [14]

    However, they showed that Strong Budget Balanced (SBB) mechanisms are learnable when the seller and buyer valuation distributions are independent and admit bounded density

    13 A Additional Related Works The study of GFT maximization in online bilateral trade was initiated by [Cesa-Bianchi et al., 2024a], who show that under one-bit feedback the problem is in general unlearnable. However, they showed that Strong Budget Balanced (SBB) mechanisms are learnable when the seller and buyer valuation distributions are independent an...

  15. [15]

    We can useCalso to bound the following quantity X t∈[T] E(p,q)∼π E (s,b)∼Lt [Rev(p, q, s, b)] − X t∈[T] E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≤C, which means that E(p,q)∼π E (s,b)∼P [Rev(p, q, s, b)] ≥ −C T . Letλ t conditioned to the filtration up tot−1, be independent from the revenue at timetconditioned to the filtration up tot−1, i.e.λ t is independent...

  16. [16]

    ! ≤ X (p,q)∈A ˆπt(p, q) (1−I(s t ≤U t ≤p, b t ≥q))I(q= ˆq t, Ht = 1)· · 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! | {z } (⋆) and (⋆) can be further bounded as (⋆) = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) − 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ ! = 1 α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′:(p,q)∈A ˆπt(p′, q) +γ− α 2 P p′:(p,q)∈A ˆπt(p′, q) α 2 P p′...