Online Market Making and the Value of Observing the Order Book
Pith reviewed 2026-05-20 06:41 UTC · model grok-4.3
The pith
Action-dependent feedback from order-book no-trade events enables O(sqrt(T)) regret in stochastic online market making without smoothness assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the stochastic setting with i.i.d. market prices, an elimination-based algorithm achieves O(sqrt(T)) regret with high probability without requiring any smoothness assumptions on the distribution of trader valuations. The result rests on the action-dependent feedback model, in which no-trade events reveal informative supply-and-demand signals while trades leave valuations hidden. The same high-probability O(sqrt(T)) bounds are obtained for broad classes of mean-reverting price processes by means of a new concentration inequality. In the adversarial setting with oblivious prices, an explore-then-perturb algorithm guarantees O(T^{2/3}) regret in expectation.
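The core claim credits an elimination-based algorithm. The paper's own procedure, which exploits no-trade feedback, is not reproduced here. As a hedged illustration only, the generic successive-elimination template over a finite action set looks like the sketch below. It assumes rewards in [0, 1] and that the reward of the played action is observed. The function name and the Hoeffding-style confidence radius are illustrative choices, not the paper's.

```python
import math
import random
from typing import Callable, List


def successive_elimination(
    n_arms: int,
    pull: Callable[[int], float],
    horizon: int,
    delta: float = 0.05,
) -> List[int]:
    """Generic successive elimination (illustrative); pull(i) returns a reward in [0, 1]."""
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    pulls = 0  # every active arm has been pulled exactly `pulls` times
    t = 0
    while t + len(active) <= horizon and len(active) > 1:
        # One phase: pull each surviving arm once (round-robin).
        for i in active:
            sums[i] += pull(i)
        t += len(active)
        pulls += 1
        # Hoeffding radius with a union bound over arms and phases.
        radius = math.sqrt(math.log(4 * n_arms * pulls**2 / delta) / (2 * pulls))
        means = {i: sums[i] / pulls for i in active}
        best_lcb = max(means[i] - radius for i in active)
        # Drop arms whose upper confidence bound is below the best lower bound.
        active = [i for i in active if means[i] + radius >= best_lcb]
    return active


# Usage: three hypothetical Bernoulli arms; arm 2 has the highest mean.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]
survivors = successive_elimination(
    3, lambda i: float(rng.random() < true_means[i]), horizon=20_000
)
print(survivors)
```

Phases are synchronized so that all surviving arms share the same sample count, which keeps a single confidence radius valid for every comparison.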
What carries the argument
The action-dependent feedback model in which no-trade events reveal informative supply-and-demand signals while trades leave valuations hidden.
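The sketch below illustrates the feedback asymmetry in a single round. It is a minimal sketch under assumptions that are not taken from the paper:
- the trader buys at the ask if their valuation is at least the ask, and sells at the bid if it is at most the bid;
- the learner's profit is measured against a market price;
- purely for illustration, a no-trade round reveals the trader's valuation, for example as a resting quote in the book.

The names `RoundOutcome` and `play_round` are hypothetical. The paper's exact feedback signal, and whether the learner observes its own profit, may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoundOutcome:
    side: str  # "buy", "sell" or "none", from the trader's perspective
    profit: float  # learner's profit measured against the market price
    revealed_valuation: Optional[float]  # set only on no-trade rounds (assumption)


def play_round(bid: float, ask: float, valuation: float, market_price: float) -> RoundOutcome:
    """Hypothetical single round of an action-dependent feedback model."""
    if bid > ask:
        raise ValueError("bid must not exceed ask")
    if valuation >= ask:
        # Trader buys at the ask; the valuation stays hidden.
        return RoundOutcome("buy", ask - market_price, None)
    if valuation <= bid:
        # Trader sells at the bid; the valuation stays hidden.
        return RoundOutcome("sell", market_price - bid, None)
    # No trade: assume the trader's quote rests in the book and reveals the valuation.
    return RoundOutcome("none", 0.0, valuation)


print(play_round(0.4, 0.6, valuation=0.5, market_price=0.5))
```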
If this is right
- Market makers obtain sublinear regret in stochastic environments even when trader valuations lack smoothness.
- The same square-root bounds hold when prices follow local autoregressive dynamics or satisfy a global cumulative-deviation condition.
- In adversarial environments with oblivious prices, the regret is O(T^{2/3}) rather than linear.
- The results directly quantify how limited order-book observations improve learning relative to fully censored feedback.
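The second bullet refers to local autoregressive dynamics and to a global cumulative-deviation condition. The paper states its own exact assumptions; the sketch below only illustrates the two kinds of quantity involved:
- a Gaussian AR(1) process, one standard example of local mean reversion;
- the largest running sum of deviations from the mean, the kind of quantity a cumulative-drift condition would bound.

The parameters `mu`, `phi` and `sigma`, the Gaussian noise, and both helper names are illustrative assumptions.

```python
import random
from typing import List


def simulate_ar1(mu: float, phi: float, sigma: float, n: int, seed: int = 0) -> List[float]:
    """AR(1) prices: p[t+1] - mu = phi * (p[t] - mu) + Gaussian noise (assumed model)."""
    if abs(phi) >= 1.0:
        raise ValueError("|phi| < 1 is needed for mean reversion")
    rng = random.Random(seed)
    prices = [mu]
    for _ in range(n - 1):
        prices.append(mu + phi * (prices[-1] - mu) + rng.gauss(0.0, sigma))
    return prices


def max_cumulative_deviation(prices: List[float], mu: float) -> float:
    """Largest |sum_{s<=t} (p_s - mu)| over all prefixes t."""
    total, worst = 0.0, 0.0
    for p in prices:
        total += p - mu
        worst = max(worst, abs(total))
    return worst


prices = simulate_ar1(mu=1.0, phi=0.9, sigma=0.05, n=10_000)
# Normalizing by sqrt(n) shows how the cumulative drift compares with sqrt(T) scaling.
print(max_cumulative_deviation(prices, 1.0) / len(prices) ** 0.5)
```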
Where Pith is reading between the lines
- Similar action-dependent feedback may improve regret in other sequential pricing tasks that involve partial observability.
- Empirical tests on real order-book data could check whether no-trade events indeed correlate with valuation ranges as assumed.
- The new concentration inequality may apply to other online problems that mix revealed and censored observations.
Load-bearing premise
That no-trade events supply useful information about the hidden valuations while trades supply none.
What would settle it
An experiment in which the proposed elimination algorithm is run on i.i.d. prices yet the observed regret remains linear in T rather than square-root.
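One way to run such a check is to estimate the growth exponent of regret from the slope of log(regret) against log(T). A slope near 0.5 is consistent with square-root regret, and a slope near 1 indicates linear regret. This is a hedged sketch: the helper name is hypothetical, regrets should be positive and averaged over independent runs (not shown), and a finite-horizon fit is suggestive evidence rather than proof.

```python
import math
from typing import Sequence


def regret_growth_exponent(horizons: Sequence[int], regrets: Sequence[float]) -> float:
    """Least-squares slope of log(regret) against log(T)."""
    if len(horizons) != len(regrets) or len(horizons) < 2:
        raise ValueError("need at least two matching (T, regret) pairs")
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in regrets]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Synthetic check: regret proportional to sqrt(T) gives a slope of 0.5.
horizons = [1_000, 4_000, 16_000, 64_000]
print(regret_growth_exponent(horizons, [3 * math.sqrt(t) for t in horizons]))
```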
Original abstract
We study an online market-making problem in which a learner sequentially posts bid and ask prices for a single asset while interacting with traders holding private valuations. Unlike existing online learning formulations that assume fully censored feedback, we introduce an action-dependent feedback model inspired by real limit order books: when a trade occurs, the trader's valuation remains hidden, whereas when no trade occurs, informative feedback about supply and demand is revealed. We show that this additional information fundamentally changes the learnability of the problem. In the stochastic setting with i.i.d. market prices, we propose an elimination-based algorithm that achieves $O(\sqrt T)$ regret with high probability, without requiring any smoothness assumptions on the distribution of trader valuations. We then extend this result to a broad class of mean-reverting price processes by considering both local, autoregressive dynamics and a weaker global drift condition based on cumulative deviations from the mean. Under either assumption, we establish high-probability $O(\sqrt T)$ regret bounds, relying on a new concentration inequality of independent interest. Finally, in the adversarial setting with oblivious prices, we design an explore-then-perturb algorithm that guarantees $O(T^{2/3})$ regret in expectation. Our results quantify the value of observing the order book in online market making and demonstrate that even limited, action-dependent feedback can substantially improve regret guarantees compared to standard bandit feedback models.
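The O(T^{2/3}) rate for explore-then-perturb matches the familiar rate of explore-first strategies. The following back-of-the-envelope calculation uses a stylized proxy, not the paper's analysis:
- N exploration rounds each cost O(1) regret;
- the remaining rounds lose about T/√N from estimation error.

Minimizing N + T/√N gives N = (T/2)^{2/3} and a proxy value of 3(T/2)^{2/3}, which is O(T^{2/3}). The helper names are hypothetical.

```python
def explore_budget_bound(n_explore: float, horizon: float) -> float:
    """Stylized regret proxy: N exploration rounds plus T / sqrt(N) estimation error."""
    return n_explore + horizon / n_explore ** 0.5


def optimal_explore_budget(horizon: float) -> float:
    """Minimizer of the proxy: 1 - T / (2 N^{3/2}) = 0 gives N = (T/2)^{2/3}."""
    return (horizon / 2) ** (2 / 3)


for T in (1e3, 1e6, 1e9):
    n = optimal_explore_budget(T)
    # The ratio to T^{2/3} stays constant (3 / 2^{2/3}) as T grows.
    print(T, round(n), explore_budget_bound(n, T) / T ** (2 / 3))
```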
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies online market making with an action-dependent feedback model inspired by limit order books: no-trade events reveal supply/demand signals while trades censor trader valuations. In the stochastic i.i.d. setting it proposes an elimination algorithm achieving O(√T) high-probability regret without smoothness assumptions on valuations. Results extend to mean-reverting processes (local autoregressive and global drift) via a new concentration inequality, and an explore-then-perturb algorithm yields O(T^{2/3}) regret in the oblivious adversarial case.
Significance. If the derivations hold, the work shows that limited order-book feedback can improve learnability over fully censored bandit models, delivering √T regret in stochastic settings without regularity conditions on valuations and a new concentration tool for mean-reverting dynamics. These are concrete advances for online market-making literature.
major comments (2)
- [§3] §3, Algorithm 1 and Theorem 1: the elimination procedure and the O(√T) analysis are stated for a finite discrete price grid. When prices lie in a continuum (standard for market making), the regret bound requires an explicit covering or discretization argument whose error term remains o(√T) uniformly over arbitrary valuation distributions. The current no-smoothness claim does not address this, and the gap is load-bearing for the central stochastic result.
- [§5] §5, Theorem 3 and the new concentration inequality: the high-probability O(√T) bound under mean-reversion relies on this inequality. The proof sketch must be expanded to confirm that the deviation control holds under only the stated local autoregressive or cumulative-drift conditions and does not implicitly require stronger mixing or bounded moments that would narrow the claimed generality.
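The first major comment turns on the fact that Theorem 1 is stated for a finite price grid. The sketch below shows what such an action set might look like. It is a hedged illustration: the paper's grid may differ, for example in its price range or in whether bid = ask is allowed, and `bid_ask_grid` is a hypothetical helper. On a uniform grid with k + 1 points, the admissible quotes with bid ≤ ask number (k+1)(k+2)/2. Refining the grid therefore enlarges the action set quadratically, which is one reason a continuum extension must trade discretization error against the growth of the action set.

```python
from typing import List, Tuple


def bid_ask_grid(k: int, low: float = 0.0, high: float = 1.0) -> List[Tuple[float, float]]:
    """All (bid, ask) pairs with bid <= ask on a uniform grid of k + 1 points in [low, high]."""
    if k < 1:
        raise ValueError("k must be at least 1")
    step = (high - low) / k
    points = [low + i * step for i in range(k + 1)]
    # Pair each bid with every ask at or above it.
    return [(b, a) for i, b in enumerate(points) for a in points[i:]]


for k in (4, 16, 64):
    print(k, len(bid_ask_grid(k)))
```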
minor comments (2)
- [Abstract] Abstract and §2: the phrasing 'i.i.d. market prices' is used while the model centers on trader valuations; add one clarifying sentence distinguishing the two processes and stating how market-price realizations enter the feedback model.
- [§4] Notation in §4: the definition of the global drift condition uses cumulative deviations; ensure the constant factors and the precise form of the deviation threshold are stated explicitly so that the concentration inequality can be applied directly.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review. The comments highlight important points regarding the scope of our assumptions and the completeness of our proofs. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.
Point-by-point responses
- Referee: [§3] §3, Algorithm 1 and Theorem 1: the elimination procedure and O(√T) analysis are stated for a finite discrete price grid. When prices lie in a continuum (standard for market making), the regret bound requires an explicit covering or discretization argument whose error term remains o(√T) uniformly over arbitrary valuation distributions; the current no-smoothness claim does not address this and is therefore load-bearing for the central stochastic result.
Authors: We thank the referee for this observation. The analysis in Section 3, Algorithm 1, and Theorem 1 is developed explicitly for a finite discrete price grid. This modeling choice focuses on the core learning challenge induced by the action-dependent feedback while avoiding continuity issues. The no-smoothness claim refers to the trader valuation distributions over the discrete grid, which permits arbitrary distributions and yields the O(√T) high-probability regret. We agree that a continuous price space would require an additional discretization argument, and without smoothness on valuations it is difficult to guarantee that the approximation error is o(√T) uniformly. We will revise the manuscript to state the discrete price assumption explicitly in the problem setup and add a discussion of the challenges for continuous extensions.
Revision: yes.
- Referee: [§5] §5, Theorem 3 and the new concentration inequality: the high-probability O(√T) bound under mean-reversion relies on this inequality. The proof sketch must be expanded to confirm that the deviation control holds under only the stated local autoregressive or cumulative-drift conditions and does not implicitly require stronger mixing or bounded moments that would narrow the claimed generality.
Authors: We appreciate the referee's request to strengthen the presentation of the concentration inequality. The inequality is constructed to apply under the local autoregressive dynamics or the weaker global cumulative-drift condition, relying only on the stated assumptions without invoking stronger mixing rates or extra moment bounds. We will expand the current proof sketch into a complete, self-contained proof in the appendix of the revised version, with explicit steps verifying that the high-probability deviation bounds follow directly from the given conditions. This will confirm the claimed generality.
Revision: yes.
Circularity Check
No significant circularity; derivation relies on independent algorithmic analysis and new concentration inequality.
Full rationale
The paper introduces an action-dependent feedback model and derives O(√T) regret bounds via an elimination algorithm without smoothness assumptions, plus extensions using a new concentration inequality of independent interest for mean-reverting processes. No quoted steps reduce the claimed regret bounds or learnability results to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The central claims rest on explicit algorithmic construction and probabilistic analysis that do not presuppose the target bounds by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Standard concentration inequalities for i.i.d. and mean-reverting processes (category: standard math)