Recognition: unknown
Arbitrage Analysis in Polymarket NBA Markets
Pith reviewed 2026-05-09 23:02 UTC · model grok-4.3
The pith
Polymarket NBA markets exhibit high microstructural efficiency with executable arbitrage rare and liquidity-bounded to retail scale.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By analyzing reconstructed market states, the study finds profound efficiency: single-market arbitrage episodes are exceedingly rare at seven instances lasting a median 3.6 seconds, while 290 combinatorial episodes concentrate in final minutes with 101 basis point median returns yet never realize the theoretical middle jackpot, and 76.9 percent constrained to 14.8 shares average size, confining risk-free extraction to retail scale.
What carries the argument
Reconstruction of continuous market states from discrete limit order book snapshots to identify and quantify single-market and combinatorial arbitrage opportunities.
Load-bearing premise
That the reconstruction of continuous market states from discrete snapshots accurately identifies all executable arbitrage without material distortion from latency, fees, or order cancellations.
What would settle it
A new dataset of similar size showing more than seven single-market arbitrage episodes or combinatorial opportunities with average executable sizes above 50 shares would falsify the efficiency and liquidity-bound claims.
Figures
read the original abstract
While decentralized prediction markets like Polymarket have gained significant traction, their market microstructure and high-frequency pricing efficiency remain underexplored. This paper conducts a systematic empirical analysis of algorithmic arbitrage within Polymarket's NBA game markets. By reconstructing continuous market states from over 75 million limit order book snapshots across 173 games, we evaluate the frequency, duration, and profitability of both single-market and combinatorial arbitrage opportunities. Our findings demonstrate profound microstructural efficiency. Single-market anomalies are exceedingly rare, yielding only 7 executable in-game episodes that persist for a median duration of just 3.6 seconds. Combinatorial inefficiencies are more frequent, producing 290 active episodes overwhelmingly concentrated in the final minutes of live play. While combinatorial execution yields a statistically meaningful median return of 101 basis points, we find that the theoretical "Middle" jackpot is never empirically realized. Furthermore, execution is severely bottlenecked by shallow order book depth, with 76.9\% of combinatorial opportunities constrained to an average executable size of just 14.8 shares. Ultimately, while executable mispricings exist, they are structurally bounded by liquidity, confining risk-free extraction strictly to the retail scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a large-scale empirical analysis of arbitrage in Polymarket NBA prediction markets by reconstructing continuous states from over 75 million limit order book snapshots across 173 games. It reports profound microstructural efficiency: only 7 executable single-market arbitrage episodes (median duration 3.6 seconds), 290 combinatorial episodes (concentrated in final minutes, median return 101 bp), with the theoretical 'Middle' jackpot never realized, and 76.9% of opportunities limited to an average executable size of 14.8 shares due to shallow depth, confining risk-free profits to retail scale.
Significance. If the episode identification and executability claims hold after addressing methodological gaps, the work offers a valuable high-frequency view of efficiency in decentralized prediction markets using an unusually large snapshot dataset. It quantifies the rarity of single-market anomalies versus more frequent but liquidity-constrained combinatorial ones, with implications for market design and the practical limits of arbitrage in similar platforms.
major comments (3)
- [Section 3] Section 3 (Methodology): The criteria and thresholds for identifying 'executable' arbitrage episodes from discrete LOB snapshots—including mispricing detection rules, minimum duration, handling of snapshot intervals, and definition of continuous states—are not specified. This is load-bearing for the central counts of 7 single-market and 290 combinatorial episodes and the efficiency conclusion.
- [Section 4] Section 4 (Results): No adjustment or discussion of transaction fees is provided when reporting the 101 bp median combinatorial return as 'statistically meaningful' and executable; fees could eliminate profitability for the reported small sizes (14.8 shares average), directly affecting the claim that opportunities are realizable.
- [Section 3.2] Section 3.2 (Data reconstruction): The snapshot-based reconstruction does not model execution latency, inter-snapshot cancellations, or temporary order book changes, which risks overstating the number and duration of executable episodes (as noted in the stress-test concern). Without robustness checks against these factors, the 'profound efficiency' finding with only 7 single-market cases cannot be verified.
minor comments (2)
- [Abstract] Abstract and Section 4: The phrase 'statistically meaningful median return' is used without describing the test or confidence interval; add this detail for clarity.
- [Results] Figures and tables: Ensure all reported statistics (e.g., 76.9%, 14.8 shares, episode durations) include sample sizes, standard errors, or distributions to allow assessment of variability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify key aspects of our methodology and strengthen the robustness of our efficiency conclusions. We address each major comment below and commit to revisions that enhance transparency without altering the core empirical findings.
read point-by-point responses
-
Referee: [Section 3] Section 3 (Methodology): The criteria and thresholds for identifying 'executable' arbitrage episodes from discrete LOB snapshots—including mispricing detection rules, minimum duration, handling of snapshot intervals, and definition of continuous states—are not specified. This is load-bearing for the central counts of 7 single-market and 290 combinatorial episodes and the efficiency conclusion.
Authors: We agree that explicit specification of these criteria is essential for reproducibility and verification of the reported episode counts. In the revised manuscript, we will expand Section 3 with a new subsection detailing: (i) mispricing detection rules based on no-arbitrage bounds for single-market (price discrepancy > 0.5%) and combinatorial cases; (ii) a minimum duration threshold of 2 seconds to exclude transient noise; (iii) handling of snapshot intervals via forward-filling of the last observed state; and (iv) continuous state reconstruction assuming no unobserved cancellations between snapshots. We will also include pseudocode for the detection algorithm and report sensitivity to alternative thresholds. revision: yes
-
Referee: [Section 4] Section 4 (Results): No adjustment or discussion of transaction fees is provided when reporting the 101 bp median combinatorial return as 'statistically meaningful' and executable; fees could eliminate profitability for the reported small sizes (14.8 shares average), directly affecting the claim that opportunities are realizable.
Authors: The referee is correct that fees must be addressed to support claims of realizability. Polymarket fees vary by volume but can reach 2-4% for smaller traders, which would reduce or negate the 101 bp median return at the reported average executable size of 14.8 shares. In the revision, we will add to Section 4 a dedicated paragraph on fees, including a sensitivity table showing net returns under low- and high-fee scenarios, and qualify the 'executable' and 'retail scale' conclusions to note that net profitability is trader-dependent and often marginal after fees. revision: yes
-
Referee: [Section 3.2] Section 3.2 (Data reconstruction): The snapshot-based reconstruction does not model execution latency, inter-snapshot cancellations, or temporary order book changes, which risks overstating the number and duration of executable episodes (as noted in the stress-test concern). Without robustness checks against these factors, the 'profound efficiency' finding with only 7 single-market cases cannot be verified.
Authors: We acknowledge this as a valid methodological limitation of snapshot data. Our 75 million snapshots imply average intervals under 1 second during active trading, limiting the scope for unobserved changes, but we did not explicitly model latency or cancellations. In the revised version, we will add robustness analysis in Section 3.2: (i) stress tests assuming 500 ms execution latency and random cancellations at 10-20% probability between snapshots; (ii) re-computation of episode counts and durations under these assumptions; and (iii) confirmation that single-market episodes remain below 10 even under conservative scenarios, preserving the efficiency interpretation while noting the data constraints. revision: partial
Circularity Check
Purely empirical reconstruction with no derivations or self-referential steps
full rationale
The manuscript performs direct empirical measurement: it reconstructs discrete LOB snapshots into continuous states, then counts and times single-market and combinatorial arbitrage episodes across 75M snapshots. No equations define derived quantities in terms of themselves, no parameters are fitted and relabeled as predictions, and no uniqueness theorems or ansatzes are imported via self-citation. All headline statistics (7 episodes, 3.6 s median, 290 combinatorial episodes, 101 bp median return, 76.9 % depth limit) are literal tallies from the reconstructed data. The analysis is therefore self-contained against external benchmarks and contains no circular reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- arbitrage episode identification thresholds
axioms (1)
- domain assumption Limit order book snapshots can be accurately reconstructed into continuous market states that reflect all tradable prices without significant loss or bias.
Reference graph
Works this paper leans on
-
[1]
arXiv preprint arXiv:2508.03474 , year =
Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets , author=. arXiv preprint arXiv:2508.03474 , year=
-
[2]
Polymarket 101 , year =
-
[3]
Polymarket API , year =
-
[4]
2025 , note =
Slivkoff, Storm , title =. 2025 , note =
2025
-
[5]
Polymarket OrderFilled Data , year =
-
[6]
Available at SSRN 5331995 , year=
Price Discovery and Trading in Prediction Markets , author=. Available at SSRN 5331995 , year=
-
[7]
Journal of Finance , volume=
The Limits of Arbitrage , author=. Journal of Finance , volume=. 1997 , publisher=
1997
-
[8]
The Anatomy of
Tsang, Kwok Ping and Yang, Zichao , journal=. The Anatomy of
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.