pith. machine review for the scientific record.

arxiv: 2605.10400 · v1 · submitted 2026-05-11 · 💱 q-fin.TR · q-fin.GN · q-fin.RM

Recognition: 1 theorem link

Resolution-Aware Perpetual Futures on Binary Prediction Markets: An Empirical Risk-Design Framework Using Polymarket Data

Maksym Nechepurenko

Pith reviewed 2026-05-12 03:21 UTC · model grok-4.3

classification 💱 q-fin.TR · q-fin.GN · q-fin.RM
keywords perpetual futures · prediction markets · risk design · binary events · margin requirements · funding rates · halt protocols · Polymarket

The pith

A six-component resolution-aware framework for perpetual futures on binary prediction markets fails to validate for deployment but separates halt and margin roles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PIRAP, a resolution-aware risk-design framework for perpetual futures that track a single binary prediction-market probability through its resolution date. It defines six components—an index estimator blending mid-price, depth-weighted mid, and time-decayed VWAP; jump-aware tiered margin sized to terminal collapse; leverage compression toward resolution; resolution-aware funding with boundary correction; a multi-stage halt protocol; and an eligibility framework—to handle risks that standard perpetual designs cannot. Counterfactual replay on 13,298 Polymarket archives yields mixed results: stylized-fact floors on boundary depth and terminal-jump magnitude pass, yet most welfare and materiality floors fail, leading to an explicit non-deployable status. The record nevertheless isolates a scope distinction in which halts manage execution-channel risk while terminal-jump bad debt remains margin-side, and it documents a pre-emption trade-off that constrains the dynamic-margin component.

Core claim

Standard basis-only funding paired with continuous-volatility static margin is non-portable to bounded-event underlyings. The PIRAP framework, built from an index estimator, jump-aware tiered margin, leverage compression schedule, resolution-aware funding rule, multi-stage halt protocol, and eligibility framework, when counterfactually evaluated on Polymarket PMXT v2 data, passes pre-registered stylized-fact floors for boundary depth asymmetry and terminal-jump magnitude but fails the majority of welfare-side and materiality floors. This establishes that the framework does not validate for deployment while confirming a halt-versus-margin scope distinction and a pre-emption trade-off that nar

What carries the argument

The six-component PIRAP risk-design framework, which supplies an index estimator, jump-aware tiered margin, leverage compression, resolution-aware funding, multi-stage halt protocol, and eligibility rules to address resolution-specific risks in perpetual contracts on binary-event underlyings.
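
To make the first component concrete, here is a minimal Python sketch of an index that blends a mid-price, a depth-weighted mid, and a time-decayed VWAP. The paper's actual functional forms and parameter values are not given in the material reviewed here, so everything below is an assumption for illustration: the `Trade` record, the microprice-style definition of the depth-weighted mid, the exponential decay with `half_life_s`, and the blend `weights` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trade:
    price: float   # traded probability in [0, 1]
    size: float    # contracts traded
    age_s: float   # seconds elapsed since the trade


def mid_price(best_bid: float, best_ask: float) -> float:
    return 0.5 * (best_bid + best_ask)


def depth_weighted_mid(best_bid: float, bid_size: float,
                       best_ask: float, ask_size: float) -> float:
    # ASSUMPTION: microprice-style weighting, where each quote is weighted
    # by the depth resting on the opposite side of the book.
    total = bid_size + ask_size
    if total <= 0:
        return mid_price(best_bid, best_ask)
    return (best_bid * ask_size + best_ask * bid_size) / total


def time_decayed_vwap(trades, half_life_s: float):
    # ASSUMPTION: exponential decay; a trade's weight halves every half_life_s.
    lam = math.log(2) / half_life_s
    num = den = 0.0
    for t in trades:
        w = t.size * math.exp(-lam * t.age_s)
        num += w * t.price
        den += w
    return num / den if den > 0 else None


def blended_index(best_bid, bid_size, best_ask, ask_size, trades,
                  weights=(0.4, 0.4, 0.2), half_life_s=300.0) -> float:
    # HYPOTHETICAL weights and half-life; the paper's values are not reported here.
    parts = [
        (weights[0], mid_price(best_bid, best_ask)),
        (weights[1], depth_weighted_mid(best_bid, bid_size, best_ask, ask_size)),
    ]
    vwap = time_decayed_vwap(trades, half_life_s)
    if vwap is not None:  # with no recent trades, renormalise over the quote terms
        parts.append((weights[2], vwap))
    total_w = sum(w for w, _ in parts)
    value = sum(w * x for w, x in parts) / total_w
    return min(1.0, max(0.0, value))  # keep the index inside the probability range


if __name__ == "__main__":
    recent = [Trade(price=0.43, size=50, age_s=20), Trade(price=0.41, size=80, age_s=600)]
    print(blended_index(0.40, 300, 0.44, 100, recent))
```

Clamping to [0, 1] reflects the bounded nature of the underlying; whether the paper clamps, or how it handles an empty trade window, is not stated in the text reviewed here.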

Load-bearing premise

The chosen pre-registered stylized-fact floors, welfare floors, and materiality floors constitute an adequate and unbiased test of whether the six-component framework improves risk outcomes on bounded-event underlyings.

What would settle it

A live deployment of the six components, on Polymarket or an equivalent venue, that measures whether final-hour liquidation rates fall by approximately 80 percent or whether terminal bad-debt frequency drops relative to baseline perpetual designs would settle the risk-improvement claim.

original abstract

We develop and counterfactually evaluate a resolution-aware risk-design framework (PIRAP) for perpetual futures whose underlying tracks a single binary prediction-market probability through resolution. The framework specifies six components: an index estimator combining mid-price, depth-weighted mid, and time-decayed VWAP; jump-aware tiered margin sized against bounded-event terminal-collapse magnitude; leverage compression schedule contracting toward resolution; resolution-aware funding rule with boundary-aware correction; a multi-stage halt protocol; and an eligibility framework. Two formal non-portability propositions establish that standard basis-only funding paired with continuous-vol static margin fails on bounded-event underlyings. Empirical evaluation uses Polymarket's PMXT v2 archive for 2026-04-21 to 2026-04-27 (13,298-market analysis sample passing adequacy gates from 61,087 ingested; 13,115 resolved within the empirical window for E3). E1 evaluates two pre-registered stylized facts; E2 conducts counterfactual replay across three engine configurations; E3 isolates the resolution-zone protocol's contribution. Results are mixed. Five pre-registered floors: stylized-fact floors (boundary depth asymmetry, terminal-jump magnitude) PASS; welfare-side directional floors (final-hour liquidation -6%, drawdown -5.1% pooled, median PnL +14%) two FAIL one PASS; E3 mechanic floors (final-hour liquidation -80% by halt construction PASS; bad-debt frequency +2.4% FAIL). Three of five materiality floors fail: the framework as specified does not validate deployment, but the empirical record establishes a halt-versus-margin scope distinction (halt addresses execution-channel risk; terminal-jump bad-debt remains margin-side) and documents a pre-emption trade-off constraining the dynamic-margin component. The paper concludes with structural recommendations and explicit non-deployable status.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a resolution-aware risk-design framework (PIRAP) for perpetual futures on binary prediction-market probabilities, specifying six components (index estimator with mid-price/depth-weighted mid/time-decayed VWAP, jump-aware tiered margin, leverage compression, resolution-aware funding, multi-stage halt protocol, eligibility framework) plus two formal non-portability propositions. It counterfactually replays the framework on a 13,298-market Polymarket PMXT v2 sample (2026-04-21 to 2026-04-27), pre-registering five materiality floors across stylized-fact, welfare, and mechanic evaluations (E1–E3); results are mixed with three floors failing, leading to a non-deployable conclusion while claiming a halt-versus-margin scope distinction and pre-emption trade-off.

Significance. If the counterfactual results and pre-registered design hold under scrutiny, the work is significant for documenting why standard basis-only funding and static margin fail on bounded-event underlyings, for isolating the distinct roles of halt protocols (execution-channel risk) versus margin (terminal-jump bad debt), and for supplying explicit structural recommendations plus an empirical record from real prediction-market data. The pre-registered floors and formal propositions are strengths that support falsifiability.

major comments (2)
  1. [Abstract / E3] The conclusion that the framework 'does not validate deployment' rests on three of five pre-registered materiality floors failing (welfare directional floors at -6% liquidation, -5.1% drawdown, +14% median PnL; E3 bad-debt frequency at +2.4%). No derivation, sensitivity analysis, or justification is supplied for why these specific thresholds constitute an adequate and unbiased test of risk improvement on bounded-event underlyings, which is load-bearing for the non-deployability claim and the halt-versus-margin distinction.
  2. [E2] E2 counterfactual replay: The evaluation reports outcomes across three engine configurations but supplies no implementation detail on the free parameters (time-decay weight in the VWAP index estimator; jump-magnitude bound for tiered margin sizing). Without these, it is impossible to determine whether the mixed welfare and bad-debt results are robust or artifacts of the chosen parameter values.
minor comments (1)
  1. [Methods / Data] The abstract states that 13,298 markets passed adequacy gates from 61,087 ingested and that 13,115 resolved within the window; the methods section should expand on the exact gate criteria and any selection bias this introduces.
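
The scale of the selection this comment refers to can be read off the three counts quoted in the abstract. The short Python calculation below uses only those counts; the variable names are illustrative.

```python
# Counts as quoted in the abstract.
ingested = 61_087            # markets ingested from the PMXT v2 archive
analysis_sample = 13_298     # markets passing the adequacy gates
resolved_in_window = 13_115  # analysis-sample markets resolved in the window (used for E3)

excluded = ingested - analysis_sample
unresolved = analysis_sample - resolved_in_window
pass_rate = analysis_sample / ingested
resolved_share = resolved_in_window / analysis_sample

print(f"excluded by gates:     {excluded:,}")
print(f"gate pass rate:        {pass_rate:.1%}")
print(f"unresolved in window:  {unresolved:,}")
print(f"resolved share of set: {resolved_share:.1%}")
```

Roughly 21.8% of ingested markets survive the gates (47,789 are excluded), and about 98.6% of the surviving markets resolve inside the window. The first figure is why the gate criteria matter for assessing selection bias.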

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify areas where the manuscript can be strengthened. We respond to each major comment below and commit to revisions that address the concerns while preserving the core contributions of the pre-registered evaluation and the halt-versus-margin distinction.

point-by-point responses
  1. Referee: [Abstract / E3] The conclusion that the framework 'does not validate deployment' rests on three of five pre-registered materiality floors failing (welfare directional floors at -6% liquidation, -5.1% drawdown, +14% median PnL; E3 bad-debt frequency at +2.4%). No derivation, sensitivity analysis, or justification is supplied for why these specific thresholds constitute an adequate and unbiased test of risk improvement on bounded-event underlyings, which is load-bearing for the non-deployability claim and the halt-versus-margin distinction.

    Authors: The materiality floors were pre-registered prior to any data analysis precisely to ensure an unbiased and falsifiable test of the framework. This pre-registration itself supplies the primary justification for treating the thresholds as an adequate benchmark, as they were fixed ex ante to capture economically material improvements in risk metrics for bounded-event underlyings. We nevertheless agree that the manuscript would be improved by an explicit derivation and sensitivity analysis. In revision we will add a dedicated subsection (and appendix) that (i) links each threshold to stylized facts from prediction-market literature and observed participation effects, and (ii) reports sensitivity checks under alternative threshold values (e.g., ±20% shifts). These additions will reinforce rather than alter the non-deployability conclusion and the documented halt-versus-margin scope distinction. revision: yes

  2. Referee: [E2] E2 counterfactual replay: The evaluation reports outcomes across three engine configurations but supplies no implementation detail on the free parameters (time-decay weight in the VWAP index estimator; jump-magnitude bound for tiered margin sizing). Without these, it is impossible to determine whether the mixed welfare and bad-debt results are robust or artifacts of the chosen parameter values.

    Authors: We accept that the absence of explicit numerical values for the free parameters in the E2 section limits reproducibility and the ability to judge robustness. The manuscript describes the functional forms of the index estimator and tiered-margin rule but does not tabulate the concrete parameter settings used in the replay. In the revised version we will insert a complete parameter table in the empirical methods section, state the exact values employed for the time-decay weight and jump-magnitude bound, and supply pseudocode for both components. This will enable readers to replicate the three engine configurations and to assess whether the mixed welfare and bad-debt outcomes are sensitive to reasonable parameter perturbations. revision: yes
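
The ±20% threshold check promised in the first response could be organised along the following lines. This is a sketch under stated assumptions, not the authors' procedure. The observed effects (-80% final-hour liquidation, +2.4% bad-debt frequency) and their PASS/FAIL verdicts are taken from the abstract. The threshold values are HYPOTHETICAL placeholders, because the pre-registered floors are not reported in the text reviewed here; the `Floor` type and both function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Floor:
    name: str
    observed: float   # observed relative change, e.g. -0.80 for -80%
    threshold: float  # HYPOTHETICAL floor value (not reported in the abstract)
    direction: int    # -1: pass if observed <= threshold; +1: pass if observed >= threshold


def passes(floor: Floor, threshold: float) -> bool:
    if floor.direction < 0:
        return floor.observed <= threshold
    return floor.observed >= threshold


def sensitivity(floor: Floor, shift: float = 0.20) -> dict:
    # Re-evaluate the verdict with the threshold scaled up and down by `shift`.
    return {
        "tight": passes(floor, floor.threshold * (1 + shift)),
        "base": passes(floor, floor.threshold),
        "loose": passes(floor, floor.threshold * (1 - shift)),
    }


# Observed values are from the abstract; thresholds are placeholders chosen only
# so that the base verdicts match the reported PASS and FAIL.
e3_liquidation = Floor("E3 final-hour liquidation", observed=-0.80, threshold=-0.50, direction=-1)
e3_bad_debt = Floor("E3 bad-debt frequency", observed=+0.024, threshold=-0.10, direction=-1)

if __name__ == "__main__":
    for f in (e3_liquidation, e3_bad_debt):
        print(f.name, sensitivity(f))
```

A verdict that stays the same across all three columns is insensitive to a ±20% threshold shift; one that flips marks a floor whose placement drives the conclusion. Because bad-debt frequency rose rather than fell, that floor fails under any threshold that requires a reduction.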

Circularity Check

0 steps flagged

Empirical evaluation on external archive with pre-registered floors shows no load-bearing circularity

full rationale

The paper's derivation chain consists of formal non-portability propositions followed by counterfactual replay on an external Polymarket archive (13,298-market sample) and pre-registered stylized-fact, welfare, and materiality floors. No equations reduce claimed performance metrics, the halt-versus-margin distinction, or the non-deployability conclusion to parameters fitted inside the paper or to self-citations whose validity depends on the present work. The evaluation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the framework description implies several design choices whose concrete parameter values are not stated. The two formal propositions are treated as background results.

free parameters (2)
  • time-decay weight in VWAP index estimator
    Mentioned as part of the index estimator but no numerical value or fitting procedure given in abstract
  • jump-magnitude bound used for tiered margin sizing
    Framework sizes margins against bounded-event terminal-collapse magnitude; the specific bound is not reported
axioms (1)
  • domain assumption: Standard basis-only funding paired with continuous-vol static margin fails on bounded-event underlyings
    Invoked to motivate the two formal non-portability propositions
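
To show where the jump-magnitude bound enters, the following Python sketch combines a jump-sized margin with a leverage cap that contracts toward resolution. It rests on one structural fact: a binary contract priced at p in [0, 1] can lose at most p for a long and 1 - p for a short at resolution. Everything else is an assumption for illustration, not the paper's specification: the linear compression schedule, `base_leverage`, `compression_window_h`, `floor_leverage`, and the rule of taking the larger of the two margin terms. The sketch also omits the tiering the paper describes.

```python
def terminal_loss_per_contract(price: float, side: str, jump_bound: float) -> float:
    """Worst-case loss per contract if the market resolves against the position,
    capped at the (hypothetical) jump-magnitude bound."""
    if not 0.0 <= price <= 1.0:
        raise ValueError("price must be a probability in [0, 1]")
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    adverse_move = price if side == "long" else 1.0 - price
    return min(adverse_move, jump_bound)


def max_leverage(hours_to_resolution: float, base_leverage: float = 5.0,
                 compression_window_h: float = 72.0,
                 floor_leverage: float = 1.0) -> float:
    """ASSUMED linear schedule: full leverage outside the window, contracting
    to floor_leverage at resolution."""
    if hours_to_resolution >= compression_window_h:
        return base_leverage
    frac = max(hours_to_resolution, 0.0) / compression_window_h
    return floor_leverage + (base_leverage - floor_leverage) * frac


def required_margin(price: float, side: str, hours_to_resolution: float,
                    jump_bound: float) -> float:
    """Per-contract margin: the larger of the leverage-implied margin and the
    jump-sized margin (an illustrative combination rule)."""
    exposure = price if side == "long" else 1.0 - price
    leverage_margin = exposure / max_leverage(hours_to_resolution)
    jump_margin = terminal_loss_per_contract(price, side, jump_bound)
    return max(leverage_margin, jump_margin)


if __name__ == "__main__":
    for hours in (100, 36, 0):
        print(hours, required_margin(0.90, "long", hours, jump_bound=0.30))
```

In this toy setup a long at 0.90 is margined on the jump term while resolution is far away, then on the compressed leverage term as resolution approaches, and is fully collateralised at resolution.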

pith-pipeline@v0.9.0 · 5646 in / 1560 out tokens · 60577 ms · 2026-05-12T03:21:44.604943+00:00 · methodology

