arxiv: 2605.00864 · v1 · submitted 2026-04-22 · 💱 q-fin.TR


Arbitrage Analysis in Polymarket NBA Markets

Guang Cheng, Haoxuan Zou, Jiaxin Yang

Pith reviewed 2026-05-09 23:02 UTC · model grok-4.3


keywords: arbitrage · prediction markets · Polymarket · NBA · market microstructure · limit order book · market efficiency · combinatorial arbitrage


The pith

Polymarket NBA markets exhibit high microstructural efficiency with executable arbitrage rare and liquidity-bounded to retail scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reconstructs continuous market states from over 75 million limit order book snapshots across 173 NBA games to measure arbitrage frequency, duration, and profitability. Single-market anomalies prove exceedingly rare with only seven executable episodes lasting a median 3.6 seconds, while combinatorial opportunities occur 290 times mostly in final minutes and deliver a median 101 basis point return. These opportunities never realize the theoretical middle jackpot and remain constrained by shallow order books, with 76.9 percent limited to an average 14.8 shares. A sympathetic reader cares because the results indicate decentralized prediction markets can achieve tight efficiency without central clearing, yet mispricings persist at small scale due to liquidity rather than vanishing entirely.

Core claim

By analyzing reconstructed market states, the study finds profound efficiency. Single-market arbitrage is exceedingly rare: seven episodes lasting a median of 3.6 seconds. The 290 combinatorial episodes concentrate in the final minutes and earn a median return of 101 basis points, yet they never realize the theoretical middle jackpot, and 76.9 percent are constrained to an average executable size of 14.8 shares, confining risk-free extraction to retail scale.
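
To make "retail scale" concrete, the reported figures can be turned into a rough dollar amount. The sketch below is illustrative only: it assumes each executable unit pays out $1 at settlement and ignores fees, neither of which this summary confirms for the combinatorial trades. The function names and the per-leg depths are hypothetical.

```python
def executable_size(leg_depths):
    """Shares fillable across all legs of a trade: the thinnest leg binds."""
    return min(leg_depths)


def gross_profit_usd(size, return_bps, payout_per_unit=1.0):
    """Gross profit for `size` units that each pay `payout_per_unit`.

    Since return = payout / cost - 1, the implied cost is payout / (1 + return).
    The $1 payout per unit is an assumption of this sketch.
    """
    r = return_bps / 1e4
    cost_per_unit = payout_per_unit / (1.0 + r)
    return size * (payout_per_unit - cost_per_unit)


# Hypothetical depths per leg; 14.8 echoes the reported average executable size.
size = executable_size([14.8, 120.0, 60.0])
profit = gross_profit_usd(size, return_bps=101)
print(f"{size} shares -> ${profit:.2f} gross")
```

Under these assumptions a median-return trade at the average executable size grosses roughly 15 cents, which is the sense in which the opportunities are liquidity-bounded.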

What carries the argument

Reconstruction of continuous market states from discrete limit order book snapshots to identify and quantify single-market and combinatorial arbitrage opportunities.
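
For a binary market, one standard single-market check is whether a YES share and a NO share can be bought together for less than the $1 that exactly one of them pays out. The sketch below illustrates that textbook condition on best asks. It is not necessarily the paper's detection rule, which this summary does not specify, and the function names are hypothetical.

```python
def single_market_edge(yes_ask, no_ask):
    """Risk-free profit per YES+NO pair, in dollars, before fees.

    Exactly one side settles at $1, so the pair is worth $1 at resolution.
    A positive value means the pair can be bought for less than $1.
    """
    return 1.0 - (yes_ask + no_ask)


def edge_in_bps(yes_ask, no_ask):
    """The same edge expressed as a return on capital, in basis points."""
    cost = yes_ask + no_ask
    return (1.0 - cost) / cost * 1e4


# Hypothetical quotes: the pair costs $0.98 and pays $1.00.
print(single_market_edge(0.48, 0.50))
print(round(edge_in_bps(0.48, 0.50), 1))
```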

Load-bearing premise

That the reconstruction of continuous market states from discrete snapshots accurately identifies all executable arbitrage without material distortion from latency, fees, or order cancellations.

What would settle it

A new dataset of similar size showing more than seven single-market arbitrage episodes or combinatorial opportunities with average executable sizes above 50 shares would falsify the efficiency and liquidity-bound claims.

Figures

Figures reproduced from arXiv: 2605.00864 by Guang Cheng, Haoxuan Zou, Jiaxin Yang.

**Figure 2.** Distribution of arbitrage episode durations.

**Figure 3.** Distribution of arbitrage opportunities relative to game start.

**Figure 4.** Distribution of arbitrage episode durations.

**Figure 5.** In-game arbitrage duration vs. time from tip-off.

Original abstract

While decentralized prediction markets like Polymarket have gained significant traction, their market microstructure and high-frequency pricing efficiency remain underexplored. This paper conducts a systematic empirical analysis of algorithmic arbitrage within Polymarket's NBA game markets. By reconstructing continuous market states from over 75 million limit order book snapshots across 173 games, we evaluate the frequency, duration, and profitability of both single-market and combinatorial arbitrage opportunities. Our findings demonstrate profound microstructural efficiency. Single-market anomalies are exceedingly rare, yielding only 7 executable in-game episodes that persist for a median duration of just 3.6 seconds. Combinatorial inefficiencies are more frequent, producing 290 active episodes overwhelmingly concentrated in the final minutes of live play. While combinatorial execution yields a statistically meaningful median return of 101 basis points, we find that the theoretical "Middle" jackpot is never empirically realized. Furthermore, execution is severely bottlenecked by shallow order book depth, with 76.9% of combinatorial opportunities constrained to an average executable size of just 14.8 shares. Ultimately, while executable mispricings exist, they are structurally bounded by liquidity, confining risk-free extraction strictly to the retail scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Polymarket NBA arbitrage is rare and liquidity-limited, but snapshot reconstruction needs validation for real executability.


The one or two things to know: this paper shows that single-market arbitrage is almost nonexistent in Polymarket's NBA games, with only seven executable episodes lasting a median of 3.6 seconds, while combinatorial opportunities occur more often but are still constrained by shallow liquidity to small sizes.

It does a good job with the empirical work. Reconstructing states from 75 million snapshots across 173 games lets the authors report concrete figures on frequency, duration, returns, and how opportunities cluster in the final minutes. The specific claim that the theoretical middle jackpot is never realized stands out as a clear observation, and the finding that 76.9% of combinatorial opportunities are limited to an average executable size of 14.8 shares highlights the practical limits. This kind of platform-specific quantification is new and extends prior efficiency studies in prediction markets. The paper stays grounded in the data without pushing untested models.

Where it is softer is on the identification of those opportunities. The abstract describes the reconstruction but gives no details on thresholds, how the authors deal with order cancellations between snapshots, or adjustments for trading fees and latency. The stress-test concern is fair here: treating discrete snapshots as stable states could include non-executable cases or miss others. Without those details, the conclusion of profound efficiency rests on an assumption that may not fully hold in live trading. It's not a fatal issue, but it needs addressing for the results to be fully convincing.

This is useful for readers working on decentralized finance, prediction markets, or high-frequency trading in crypto. Someone building models for these platforms or comparing to traditional betting markets would find the numbers helpful.

I would send it to peer review. The data effort is real, and referees can request the missing method details and robustness tests to strengthen it.

Referee Report

3 major / 2 minor

Summary. The paper conducts a large-scale empirical analysis of arbitrage in Polymarket NBA prediction markets by reconstructing continuous states from over 75 million limit order book snapshots across 173 games. It reports profound microstructural efficiency: only 7 executable single-market arbitrage episodes (median duration 3.6 seconds), 290 combinatorial episodes (concentrated in final minutes, median return 101 bp), with the theoretical 'Middle' jackpot never realized, and 76.9% of opportunities limited to an average executable size of 14.8 shares due to shallow depth, confining risk-free profits to retail scale.

Significance. If the episode identification and executability claims hold after addressing methodological gaps, the work offers a valuable high-frequency view of efficiency in decentralized prediction markets using an unusually large snapshot dataset. It quantifies the rarity of single-market anomalies versus more frequent but liquidity-constrained combinatorial ones, with implications for market design and the practical limits of arbitrage in similar platforms.

major comments (3)

[Section 3] Section 3 (Methodology): The criteria and thresholds for identifying 'executable' arbitrage episodes from discrete LOB snapshots—including mispricing detection rules, minimum duration, handling of snapshot intervals, and definition of continuous states—are not specified. This is load-bearing for the central counts of 7 single-market and 290 combinatorial episodes and the efficiency conclusion.
[Section 4] Section 4 (Results): No adjustment or discussion of transaction fees is provided when reporting the 101 bp median combinatorial return as 'statistically meaningful' and executable; fees could eliminate profitability for the reported small sizes (14.8 shares average), directly affecting the claim that opportunities are realizable.
[Section 3.2] Section 3.2 (Data reconstruction): The snapshot-based reconstruction does not model execution latency, inter-snapshot cancellations, or temporary order book changes, which risks overstating the number and duration of executable episodes (as noted in the stress-test concern). Without robustness checks against these factors, the 'profound efficiency' finding with only 7 single-market cases cannot be verified.

minor comments (2)

[Abstract] Abstract and Section 4: The phrase 'statistically meaningful median return' is used without describing the test or confidence interval; add this detail for clarity.
[Results] Figures and tables: Ensure all reported statistics (e.g., 76.9%, 14.8 shares, episode durations) include sample sizes, standard errors, or distributions to allow assessment of variability.
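
One way to supply the interval requested for the median return is a percentile bootstrap. The sketch below is a generic illustration, not the authors' procedure; the function name, the resample count, and the sample returns are all hypothetical.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-episode returns in basis points, for illustration only.
returns_bps = [40, 55, 80, 95, 101, 101, 110, 150, 220, 400]
low, high = bootstrap_median_ci(returns_bps)
print(f"median {statistics.median(returns_bps)} bps, 95% CI [{low}, {high}]")
```

Reporting the interval alongside the episode count (290 for the combinatorial case) would let readers judge how tightly the 101 bp figure is pinned down.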

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify key aspects of our methodology and strengthen the robustness of our efficiency conclusions. We address each major comment below and commit to revisions that enhance transparency without altering the core empirical findings.


Referee: [Section 3] Section 3 (Methodology): The criteria and thresholds for identifying 'executable' arbitrage episodes from discrete LOB snapshots—including mispricing detection rules, minimum duration, handling of snapshot intervals, and definition of continuous states—are not specified. This is load-bearing for the central counts of 7 single-market and 290 combinatorial episodes and the efficiency conclusion.

Authors: We agree that explicit specification of these criteria is essential for reproducibility and verification of the reported episode counts. In the revised manuscript, we will expand Section 3 with a new subsection detailing: (i) mispricing detection rules based on no-arbitrage bounds for single-market (price discrepancy > 0.5%) and combinatorial cases; (ii) a minimum duration threshold of 2 seconds to exclude transient noise; (iii) handling of snapshot intervals via forward-filling of the last observed state; and (iv) continuous state reconstruction assuming no unobserved cancellations between snapshots. We will also include pseudocode for the detection algorithm and report sensitivity to alternative thresholds.

revision: yes
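
The rules listed in this response can be sketched for the single-market case as follows. This is a minimal illustration, not the authors' algorithm: it reads the 0.5% threshold as an edge of $0.005 per $1 payout, treats each snapshot's state as holding until the next snapshot (forward-fill), and drops an episode still open at the end of the data because its end is unobserved. The `Snapshot` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    t: float        # seconds since some reference time
    yes_ask: float  # best ask for YES
    no_ask: float   # best ask for NO


def detect_episodes(snaps, min_edge=0.005, min_duration=2.0):
    """Return (start, end) times of mispricing runs lasting >= min_duration.

    Forward-fill assumption: a snapshot's quotes persist until the next one,
    so a run ends at the first snapshot that no longer shows the mispricing.
    """
    episodes = []
    start = None
    for snap in snaps:
        mispriced = 1.0 - (snap.yes_ask + snap.no_ask) > min_edge
        if mispriced and start is None:
            start = snap.t
        elif not mispriced and start is not None:
            if snap.t - start >= min_duration:
                episodes.append((start, snap.t))
            start = None
    return episodes


# Hypothetical snapshots: one 3.6 s episode and one 1 s blip that is filtered.
snaps = [
    Snapshot(0.0, 0.50, 0.50),
    Snapshot(1.0, 0.48, 0.50),
    Snapshot(2.0, 0.48, 0.50),
    Snapshot(4.6, 0.50, 0.50),
    Snapshot(5.0, 0.49, 0.50),
    Snapshot(6.0, 0.50, 0.50),
]
print(detect_episodes(snaps))
```

Varying `min_edge` and `min_duration` in a sketch like this is one way to produce the threshold-sensitivity report the response promises.
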
Referee: [Section 4] Section 4 (Results): No adjustment or discussion of transaction fees is provided when reporting the 101 bp median combinatorial return as 'statistically meaningful' and executable; fees could eliminate profitability for the reported small sizes (14.8 shares average), directly affecting the claim that opportunities are realizable.

Authors: The referee is correct that fees must be addressed to support claims of realizability. Polymarket fees vary by volume but can reach 2-4% for smaller traders, which would reduce or negate the 101 bp median return at the reported average executable size of 14.8 shares. In the revision, we will add to Section 4 a dedicated paragraph on fees, including a sensitivity table showing net returns under low- and high-fee scenarios, and qualify the 'executable' and 'retail scale' conclusions to note that net profitability is trader-dependent and often marginal after fees.

revision: yes
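
A fee-sensitivity table of the kind described here could start from a calculation like the one below. It is a sketch under a simplifying assumption: the fee is modeled as a flat proportional charge on the capital deployed, which may not match Polymarket's actual fee schedule. The fee levels are illustrative (the upper two echo the range quoted in the response), and the function name is hypothetical.

```python
def net_return_bps(gross_bps, fee_rate):
    """Net return in bps when a proportional fee is charged on the cost.

    With payoff P and cost C, gross = P / C - 1. Paying C * (1 + fee_rate)
    instead of C gives net = (1 + gross) / (1 + fee_rate) - 1.
    """
    gross = gross_bps / 1e4
    return ((1.0 + gross) / (1.0 + fee_rate) - 1.0) * 1e4


# Illustrative fee levels applied to the reported 101 bp median return.
for fee in (0.0, 0.005, 0.02, 0.04):
    print(f"fee {fee:.1%}: net {net_return_bps(101, fee):.0f} bps")
```

Under this model the break-even fee equals the gross return itself, so any proportional fee above about 1.01% turns the median trade negative.
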
Referee: [Section 3.2] Section 3.2 (Data reconstruction): The snapshot-based reconstruction does not model execution latency, inter-snapshot cancellations, or temporary order book changes, which risks overstating the number and duration of executable episodes (as noted in the stress-test concern). Without robustness checks against these factors, the 'profound efficiency' finding with only 7 single-market cases cannot be verified.

Authors: We acknowledge this as a valid methodological limitation of snapshot data. Our 75 million snapshots imply average intervals under 1 second during active trading, limiting the scope for unobserved changes, but we did not explicitly model latency or cancellations. In the revised version, we will add robustness analysis in Section 3.2: (i) stress tests assuming 500 ms execution latency and random cancellations at 10-20% probability between snapshots; (ii) re-computation of episode counts and durations under these assumptions; and (iii) confirmation that single-market episodes remain below 10 even under conservative scenarios, preserving the efficiency interpretation while noting the data constraints.

revision: partial
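
The proposed stress test can be approximated with a small Monte Carlo filter. The sketch below is a deliberately crude stand-in, not the authors' robustness analysis: an episode survives only if it outlasts the execution latency and a single random draw does not cancel the resting order. The function name, the one-draw-per-episode simplification, and the sample durations are assumptions.

```python
import random


def surviving_episodes(durations, latency_s=0.5, cancel_prob=0.1, seed=0):
    """Count episodes still fillable after latency and a random cancellation.

    `durations` are episode lengths in seconds. Episodes no longer than the
    latency are gone before an order can arrive; the rest are cancelled with
    probability `cancel_prob` (one draw per episode, a simplification).
    """
    rng = random.Random(seed)
    survived = 0
    for d in durations:
        if d <= latency_s:
            continue  # opportunity vanished before the order could land
        if rng.random() < cancel_prob:
            continue  # resting liquidity was pulled before the fill
        survived += 1
    return survived


# Hypothetical episode durations in seconds.
durations = [0.3, 3.6, 1.0, 0.5, 7.2]
for p in (0.0, 0.1, 0.2):
    print(f"cancel_prob={p}: {surviving_episodes(durations, cancel_prob=p)}")
```

Because both filters only remove episodes, this style of test can lower the reported counts but cannot reveal opportunities that the snapshots missed entirely.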

Circularity Check

0 steps flagged

Purely empirical reconstruction with no derivations or self-referential steps


The manuscript performs direct empirical measurement: it reconstructs discrete LOB snapshots into continuous states, then counts and times single-market and combinatorial arbitrage episodes across 75M snapshots. No equations define derived quantities in terms of themselves, no parameters are fitted and relabeled as predictions, and no uniqueness theorems or ansatzes are imported via self-citation. All headline statistics (7 episodes, 3.6 s median, 290 combinatorial episodes, 101 bp median return, 76.9 % depth limit) are literal tallies from the reconstructed data. The analysis is therefore self-contained against external benchmarks and contains no circular reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The analysis rests on domain assumptions about data reconstruction and implicit thresholds for defining executable opportunities; no free parameters or invented entities are explicitly stated in the abstract.

free parameters (1)

arbitrage episode identification thresholds
Unstated criteria for what counts as an executable opportunity, such as minimum profit margin or duration, are required to produce the reported counts of 7 and 290 episodes.

axioms (1)

Domain assumption: limit order book snapshots can be accurately reconstructed into continuous market states that reflect all tradable prices without significant loss or bias.
Invoked to identify arbitrage from discrete data points across 75 million snapshots.


