pith. machine review for the scientific record.

arxiv: 2604.24366 · v1 · submitted 2026-04-27 · q-fin.TR · cs.GT · q-fin.GN

The Anatomy of a Decentralized Prediction Market: Microstructure Evidence from the Polymarket Order Book

Philipp D. Dubach

Pith reviewed 2026-05-07 17:04 UTC · model grok-4.3

keywords: prediction markets, order book microstructure, trade direction, on-chain data, decentralized finance, effective spread, stylized facts, blockchain markets

The pith

Trade direction inferred from Polymarket's public order-book feed matches the on-chain record only about 59 percent of the time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper joins a 52-day archive of Polymarket's public WebSocket order-book events (30 billion events) to the authoritative on-chain trade record. It shows that trade directions inferred from the public feed match the on-chain ground truth only about 59 percent of the time (panel mean 0.615) on a pre-registered stratified panel of 600 markets. Because of this mismatch, common microstructure statistics such as the effective half-spread and Kyle's lambda frequently change sign when switching between the two data sources. The study also lays out eight stylized facts about spreads, order-book depth, maker behavior, ingestion latency, wash trading, and depth decay near resolution in this decentralized setting. These results indicate that accurate analysis of on-chain prediction markets requires trade direction taken from blockchain trade events rather than from the public feed alone.
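The agreement statistic described above can be illustrated with a minimal sketch. It assumes each matched trade carries a feed-inferred sign and an on-chain sign (+1 for a buy, -1 for a sell). The function names, the toy data, and the normal-approximation interval across markets are illustrative assumptions, not the paper's procedure.

```python
import math
from statistics import mean, stdev

def agreement_rate(feed_signs, chain_signs):
    """Share of matched trades whose feed-inferred sign equals the on-chain sign."""
    if len(feed_signs) != len(chain_signs) or not feed_signs:
        raise ValueError("need two equal-length, non-empty sign lists")
    hits = sum(1 for f, c in zip(feed_signs, chain_signs) if f == c)
    return hits / len(feed_signs)

def panel_mean_ci(rates, z=1.96):
    """Mean of per-market rates with a normal-approximation 95% interval (assumed)."""
    m = mean(rates)
    half_width = z * stdev(rates) / math.sqrt(len(rates))
    return m, (m - half_width, m + half_width)

# Toy data (hypothetical): three markets with four matched trades each.
rates = [
    agreement_rate([1, 1, -1, -1], [1, -1, -1, 1]),
    agreement_rate([1, -1, 1, 1], [1, -1, 1, -1]),
    agreement_rate([-1, -1, 1, 1], [-1, 1, 1, -1]),
]
panel_mean, ci = panel_mean_ci(rates)
```

A rate near 0.5 means the feed-inferred sign carries almost no information about the true direction, which is the baseline the 59 percent figure is compared against.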

Core claim

The paper reports that trade direction inferred from Polymarket's public order-book feed agrees with on-chain ground truth only ~59% of the time (panel mean 0.615, 95% CI [0.58, 0.65]), barely above the 50% chance baseline. On the comparable subset of the top-100 panel, the effective half-spread changes sign between feed- and on-chain directions on 67% of markets in a first 7-day window and 50% in a second, non-overlapping window, while Kyle's lambda flips on 60% and 43% respectively. In neither window does the feed recover the on-chain sign at anything close to the ~80% rate that the Lee-Ready algorithm achieves on equity venues. The authors conclude that microstructure work on Polymarket requires sourcing trade direction from on-chain OrderFilled events, and they release a replication package that performs the join.
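To see why a mis-signed direction can flip a spread estimate, consider a minimal sketch using the textbook signed effective half-spread, direction × (trade price − midquote). The paper's exact estimator is not given on this page, so the formula, field names, and toy trades below are assumptions for illustration only.

```python
def effective_half_spread(trades, sign_key):
    """Mean signed half-spread: direction * (price - mid), averaged over trades."""
    values = [t[sign_key] * (t["price"] - t["mid"]) for t in trades]
    return sum(values) / len(values)

# Toy trades (hypothetical): the feed mis-signs the first two of three trades.
trades = [
    {"price": 0.52, "mid": 0.50, "chain": +1, "feed": -1},
    {"price": 0.47, "mid": 0.50, "chain": -1, "feed": +1},
    {"price": 0.51, "mid": 0.50, "chain": +1, "feed": +1},
]

on_chain = effective_half_spread(trades, "chain")  # positive: takers pay the spread
feed = effective_half_spread(trades, "feed")       # negative under the wrong signs
sign_flipped = (on_chain > 0) != (feed > 0)
```

The same prices and midquotes yield opposite-signed estimates once enough directions are wrong, which is the mechanism behind the reported per-market sign changes.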

What carries the argument

A join of the public WebSocket order-book feed to on-chain OrderFilled events, used both to validate feed-inferred trade direction and to recompute microstructure measures under each data source.
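One way such a join could work is an as-of match: each on-chain fill is paired with the latest feed event at or before its timestamp in the same market. The sketch below assumes that approach; the keys `market`, `ts`, `mid`, and `price` are hypothetical, and the replication package may join on different fields.

```python
from bisect import bisect_right

def asof_join(fills, feed_events):
    """Attach to each fill the latest same-market feed event at or before its timestamp."""
    index = {}  # market -> (sorted timestamps, events in the same order)
    for ev in sorted(feed_events, key=lambda e: e["ts"]):
        times, events = index.setdefault(ev["market"], ([], []))
        times.append(ev["ts"])
        events.append(ev)
    joined = []
    for fill in fills:
        times, events = index.get(fill["market"], ([], []))
        i = bisect_right(times, fill["ts"])  # events[:i] have ts <= fill ts
        joined.append({**fill, "feed_event": events[i - 1] if i > 0 else None})
    return joined

# Toy records (hypothetical field names and values).
feed_events = [
    {"market": "A", "ts": 100, "mid": 0.50},
    {"market": "A", "ts": 200, "mid": 0.55},
    {"market": "B", "ts": 150, "mid": 0.30},
]
fills = [
    {"market": "A", "ts": 180, "price": 0.51},
    {"market": "A", "ts": 200, "price": 0.56},
    {"market": "A", "ts": 50, "price": 0.49},
    {"market": "C", "ts": 10, "price": 0.10},
]
joined = asof_join(fills, feed_events)
```

Fills with no preceding feed event keep `feed_event` as `None`, so unmatched records stay visible instead of being dropped silently.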

Load-bearing premise

The WebSocket order-book feed and on-chain trade records can be reliably joined without significant discrepancies or missing data, and the pre-registered stratified panel of 600 markets represents overall Polymarket behavior.

What would settle it

Replicating the analysis on a new sample of markets and finding that feed-inferred directions agree with on-chain records more than 70 percent of the time, or that microstructure measures maintain consistent signs across data sources.

Figures

Figures reproduced from arXiv: 2604.24366 by Philipp D. Dubach.

Figure 2: SF2 panel: histogram of L1/L10 depth-concentration ratio across 546 panel markets. Vertical lines mark the uniform-grid benchmark (green, 0.10) and the fully top-of-book limit (red, 1.0).
5.3 SF3 – Polygon block-clock alignment. We test whether price_change events cluster near Polygon block boundaries by computing, per market, the share of events that fall within ±100 ms of the nearest 2 000 ms grid point.…
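The block-alignment share described in the SF3 text can be sketched as follows. This is an illustration under stated assumptions: timestamps are integer milliseconds, and grid points sit at multiples of 2 000 ms from the epoch (the true block boundaries may be offset). Under a uniform null, the ±100 ms window covers about 200/2000 = 0.10 of each period.

```python
def block_alignment_share(timestamps_ms, period_ms=2000, tol_ms=100):
    """Share of events within tol_ms of the nearest multiple of period_ms."""
    if not timestamps_ms:
        raise ValueError("no events")

    def distance_to_grid(t):
        r = t % period_ms
        return min(r, period_ms - r)

    near = sum(1 for t in timestamps_ms if distance_to_grid(t) <= tol_ms)
    return near / len(timestamps_ms)

# One event per millisecond over a full period approximates the uniform null.
uniform_share = block_alignment_share(list(range(2000)))

# Toy clustered sample (hypothetical): four of five events sit near grid points.
clustered_share = block_alignment_share([0, 2000, 4000, 1950, 1000])
```

A market whose share sits close to the uniform value shows no block-clock clustering, which is the null result the paper reports.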
Figure 1: SF1 panel: median quoted spread (bps) per mid-price decile, 600 panel markets. Shaded band is interquartile range.
5.2 SF2 – Depth concentration. We summarize the L2 depth profile by the ratio depth_{L=1}/depth_{L=10}, the share of cumulative top-10 depth held at the top-of-book. A value of 1.0 means the entire top-10 depth sits at level 1 (a thin, top-heavy book); 0.1 matches a uniform grid where each level carr…
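The depth-concentration ratio in the SF2 text can be sketched in a few lines, assuming a book side is given as a list of resting sizes ordered from the best price outward. The function name and inputs are hypothetical.

```python
def depth_concentration(level_sizes):
    """Level-1 size divided by cumulative size over the top ten levels."""
    top10 = level_sizes[:10]  # uses fewer levels if the book is shallower
    total = sum(top10)
    if total <= 0:
        raise ValueError("empty book side")
    return top10[0] / total

uniform_book = depth_concentration([100] * 10)          # uniform grid
top_heavy_book = depth_concentration([500] + [0] * 9)   # all depth at level 1
```

The two toy books reproduce the benchmarks named in the text: 0.1 for a uniform grid and 1.0 for a book whose entire top-10 depth sits at level 1.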
Figure 3: SF3 panel: distribution of per-market block-alignment shares. The red dashed line marks the chance-level null (0.10).
… HHI = 1; a uniform distribution across n makers yields 1/n. Across 600 markets and 6.4 M trades, the median HHI is 0.031 (∼ 32 effective makers). The distribution is right-skewed: p90 = 0.119 (∼ 8 effective makers) and a maximum of 0.40 (roughly 3 effective makers). Maker liquidity is decen…
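The "effective makers" figures quoted above follow from the inverse of the Herfindahl-Hirschman index. A minimal sketch, assuming the index is computed from each maker's share of filled volume (the excerpt does not say whether shares are by volume or by trade count):

```python
def hhi(maker_volumes):
    """Herfindahl-Hirschman index: sum of squared maker shares."""
    total = sum(maker_volumes)
    if total <= 0:
        raise ValueError("no maker volume")
    return sum((v / total) ** 2 for v in maker_volumes)

def effective_makers(index_value):
    """Number of equal-sized makers that would produce the same index."""
    return 1.0 / index_value

single_maker = hhi([10])      # one maker holds everything
four_equal = hhi([1, 1, 1, 1])  # uniform across n = 4 makers gives 1/n

median_equiv = effective_makers(0.031)  # about 32
p90_equiv = effective_makers(0.119)     # about 8
max_equiv = effective_makers(0.40)      # 2.5
```

Inverting the reported values recovers the quoted counts: 1/0.031 ≈ 32.3, 1/0.119 ≈ 8.4, and 1/0.40 = 2.5.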
Figure 5: SF5 panel: median effective half-spread by category, with interquartile-range error bars. Categories are derived from keyword classification of CLOB REST question text.
Each archive row carries two timestamps: timestamp_received (exchange side) and timestamp_created_at (collector side). Their difference is a per-event ingestion delay. Across 547 markets with non-empty windows, the median per-market p50 de…
Figure 6: SF6 panel: per-market percentile distributions of archive-ingestion latency (log scale).
5.7 SF7 – Self-counterparty wash share. We flag a trade as wash-suspect under a two-tier rule: (a) maker == taker (direct self-match), or (b) a flipped pair (maker_a, taker_a) ↔ (taker_a, maker_a) within 128 blocks (Polygon finality buffer) on the same market. This is an explicit lower bound: it captures only direct and imme…
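The two-tier rule in the SF7 text can be sketched as below. The record keys (`market`, `maker`, `taker`, `block`) and the inclusive 128-block comparison are assumptions; the paper's implementation may differ in both.

```python
def flag_wash_suspects(trades, window_blocks=128):
    """Return indices of trades flagged by the direct or flipped-pair tier."""
    flagged = set()
    seen = {}  # (market, maker, taker) -> list of (block, trade index)
    order = sorted(range(len(trades)), key=lambda i: trades[i]["block"])
    for i in order:
        t = trades[i]
        if t["maker"] == t["taker"]:
            flagged.add(i)  # tier (a): direct self-match
            continue
        reverse_key = (t["market"], t["taker"], t["maker"])
        for block, j in seen.get(reverse_key, []):
            if t["block"] - block <= window_blocks:
                flagged.update((i, j))  # tier (b): flipped pair inside the window
        seen.setdefault((t["market"], t["maker"], t["taker"]), []).append((t["block"], i))
    return flagged

# Toy trades (hypothetical wallets a, b, c, d).
trades = [
    {"market": "m1", "maker": "a", "taker": "a", "block": 10},
    {"market": "m1", "maker": "a", "taker": "b", "block": 20},
    {"market": "m1", "maker": "b", "taker": "a", "block": 100},
    {"market": "m1", "maker": "c", "taker": "d", "block": 20},
    {"market": "m1", "maker": "d", "taker": "c", "block": 300},
    {"market": "m2", "maker": "b", "taker": "a", "block": 25},
]
suspects = flag_wash_suspects(trades)
```

Trade 0 is a direct self-match and trades 1 and 2 form a flipped pair 80 blocks apart. The c/d pair is 280 blocks apart and the m2 trade is in a different market, so neither is flagged.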
Figure 7: SF7 panel: distribution of self-counterparty wash share by market. Red dashed line marks a 25% reference, the lower bound of the wash-share range documented by Cong et al. [2023] on unregulated cryptocurrency venues.
… midpoint (2026-03-13), restricted to 322 markets with positive seconds-to-close and nonzero summary depth
Figure 8: SF8: cross-sectional fit of log mean depth on log seconds-to-close at the panel midpoint.
6 Spread Decomposition. Following Glosten and Harris [1988] and the modern restatements in Huang and Stoll [1997], Madhavan et al. [1997], we decompose the per-market effective half-spread into two components: $S^{\mathrm{eff}}_{1/2} = c + \phi$ (1), where c is a transitory order-processing / inventory component, recovered as the reali…
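The cross-sectional fit named in the Figure 8 caption is a regression of log mean depth on log seconds-to-close. A minimal ordinary-least-squares sketch follows, with synthetic data built to have an elasticity of 0.55. It omits the category controls behind the paper's within-category slope, and all names are hypothetical.

```python
import math

def loglog_slope(seconds_to_close, mean_depth):
    """OLS slope and intercept of log(depth) on log(seconds-to-close)."""
    xs = [math.log(s) for s in seconds_to_close]
    ys = [math.log(d) for d in mean_depth]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic markets (hypothetical): depth = 3 * seconds ** 0.55, no noise.
seconds = [60, 3600, 86400, 604800]
depth = [3.0 * s ** 0.55 for s in seconds]
slope, intercept = loglog_slope(seconds, depth)
```

A positive slope means depth shrinks as a market approaches resolution: halving the time to close scales depth by roughly 2 ** -0.55 ≈ 0.68 at the reported elasticity.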
Figure 9: Glosten-Harris decomposition across the top-100 stratum: distribution of transitory component c (left) and adverse-selection component φ (right), both in probability points.
… component (0.0). This near-null pattern lines up with the calibration in Section 7: once sign errors are removed, the dollar-weighted “adverse selection” that orderbook-only inference produces collapses, leaving the typical top-100 ma…
Original abstract

We study the microstructure of Polymarket, the largest on-chain prediction market, using a continuous tick-level archive of the public WebSocket order-book feed (30 billion events over 52 days) joined to the authoritative on-chain trade record. On a pre-registered stratified panel of 600 markets we report eight stylized facts: a longshot spread premium; a depth-concentration profile closer to a uniform geometric grid than to the top-of-book pattern often assumed for prediction markets; a null block-clock alignment effect; broad maker-wallet diversity with a concentrated tail; category-conditional differences in effective spread; a sub-50 ms median archive-ingestion delay with a multi-second tail; a self-counterparty wash share with median 1% and a 22% upper tail, well below the network-classifier benchmarks of Cong et al. (2023) for unregulated cryptocurrency token exchanges (a sanity bound, not an apples-to-apples reference, since the venues face different wash incentives); and a depth decay near resolution with a within-category slope of 0.55 on log seconds-to-close (t=3.85). The paper also contributes a measurement result: trade direction inferred from Polymarket's public order-book feed agrees with on-chain ground truth only ~59% of the time (panel mean 0.615, 95% CI [0.58, 0.65]), barely above the 50% chance baseline. On the comparable subset of the top-100 panel, the effective half-spread changes sign between feed- and on-chain directions on 67% of markets in a first 7-day window and 50% in a second non-overlapping window, with Kyle's lambda flipping on 60% and 43% respectively; neither window recovers the on-chain sign at anything close to the ~80% rate that Lee-Ready achieves on equity venues. Microstructure work on Polymarket therefore needs to source trade direction from on-chain OrderFilled events; we release a replication package that performs the join.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper examines the microstructure of Polymarket using a 30-billion-event WebSocket order-book archive joined to on-chain trade records over 52 days. On a pre-registered stratified panel of 600 markets it reports eight stylized facts (longshot spread premium, depth profile, null block-clock effect, maker diversity, category-conditional effective spreads, sub-50 ms median ingestion delay, low wash-trading share, and depth decay near resolution) and a central measurement result: trade direction inferred from the public order-book feed agrees with on-chain OrderFilled ground truth only ~59% of the time (panel mean 0.615, 95% CI [0.58, 0.65]). The authors conclude that microstructure work on Polymarket requires on-chain direction and release a replication package performing the join.

Significance. If the reported agreement rate and stylized facts hold, the paper supplies a large-scale, directly validated empirical baseline for decentralized prediction-market microstructure that is currently absent from the literature. The pre-registered panel, explicit join procedure, 30-billion-event scale, and released replication package are concrete strengths that lower the cost of follow-on work and allow direct falsification of the 59% benchmark.

minor comments (2)
  1. The abstract states that the wash-trading share is 'well below the network-classifier benchmarks of Cong et al. (2023)' but does not report the exact Cong et al. figure or the precise definition of 'self-counterparty wash' used in the join; a one-sentence clarification in §4 would remove ambiguity.
  2. Figure captions and table notes should explicitly state the number of markets and events underlying each panel statistic (e.g., 'N = 600 markets, 6.4 M trades') so that readers can assess precision without returning to the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their supportive review, detailed summary of the paper's contributions, and recommendation to accept. We are pleased that the pre-registered design, data scale, join procedure, and replication package were highlighted as strengths that facilitate follow-on work.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper performs purely empirical measurements and reports descriptive stylized facts from raw data joins (WebSocket feed to on-chain OrderFilled events) on a pre-registered stratified panel. No equations, fitted parameters, derivations, or self-citations appear in the load-bearing claims; the key result (trade-direction agreement ~59%) is a direct empirical comparison to ground truth with reported CIs and released replication code. All eight stylized facts are data summaries without reduction to prior inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper's claims rest on the completeness and accuracy of the data sources and the representativeness of the market panel, with no new entities postulated.

axioms (2)
  • domain assumption: The on-chain OrderFilled events provide the authoritative record of trades and their directions.
    Used as ground truth to evaluate the public feed inference.
  • domain assumption: The WebSocket feed captures all public order-book events without significant loss.
    Basis for the 30 billion event archive.

pith-pipeline@v0.9.0 · 5683 in / 1477 out tokens · 86487 ms · 2026-05-07T17:04:18.065096+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 3 canonical work pages · 1 internal anchor

  1. Jonathan Brogaard, Terrence Hendershott, and Ryan Riordan. High-frequency trading and price discovery. Review of Financial Studies, 27(8): 2267–2306, 2014.

  2. Lin William Cong, Xi Li, Ke Tang, and Yang Yang. Crypto wash trading. Management Science, 69(11): 6427–6454, 2023.

  3. Philipp D. Dubach. Replication package: The anatomy of a decentralized prediction market, 2026. URL https://doi.org/10.5281/zenodo.19811426

  4. Katrina Ellis, Roni Michaely, and Maureen O'Hara. The accuracy of trade classification rules: Evidence from Nasdaq. Journal of Financial and Quantitative Analysis, 35(4): 529–551, 2000.

  5. Thierry Foucault, Marco Pagano, and Ailsa Röell. Market Liquidity: Theory, Evidence, and Policy. Oxford University Press, 2013.

  6. Lawrence R. Glosten and Lawrence E. Harris. Estimating the components of the bid/ask spread. Journal of Financial Economics, 21(1): 123–142, 1988.

  7. Robin Hanson. Logarithmic market scoring rules for modular combinatorial information aggregation. Journal of Prediction Markets, 1(1): 3–15, 2007.

  8. Joel Hasbrouck. Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press, 2007.

  9. Roger D. Huang and Hans R. Stoll. The components of the bid-ask spread: A general approach. Review of Financial Studies, 10(4): 995–1034, 1997.

  10. Charles M. C. Lee and Mark J. Ready. Inferring trade direction from intraday data. Journal of Finance, 46(2): 733–746, 1991.

  11. Ananth Madhavan, Matthew Richardson, and Mark Roomans. Why do security prices change? A transaction-level analysis of NYSE stocks. Review of Financial Studies, 10(4): 1035–1064, 1997.

  12. Charles F. Manski. Interpreting the predictions of prediction markets. Economics Letters, 91(3): 425–429, 2006.

  13. Maureen O'Hara. Market Microstructure Theory. Blackwell Publishing, 1995.

  14. Lionel Page and Robert T. Clemen. Do prediction markets produce well-calibrated probability forecasts? The Economic Journal, 123(568): 491–513, 2013.

  15. Nahid Rahman, Joseph Al-Chami, and Jeremy Clark. SoK: Market microstructure for decentralized prediction markets (DePMs). arXiv preprint arXiv:2510.15612, 2025. URL https://arxiv.org/abs/2510.15612

  16. Erik Snowberg and Justin Wolfers. Explaining the favorite-long shot bias: Is it risk-love or misperceptions? Journal of Political Economy, 118(4): 723–746, 2010.

  17. Richard H. Thaler and William T. Ziemba. Anomalies: Parimutuel betting markets: Racetracks and lotteries. Journal of Economic Perspectives, 2(2): 161–174, 1988.

  18. Kwok Ping Tsang and Zichao Yang. The anatomy of Polymarket: Evidence from the 2024 presidential election. arXiv preprint arXiv:2603.03136, 2026. URL https://arxiv.org/abs/2603.03136

  19. Justin Wolfers and Eric Zitzewitz. Prediction markets. Journal of Economic Perspectives, 18(2): 107–126, 2004.