arxiv: 2605.10400 · v1 · submitted 2026-05-11 · 💱 q-fin.TR · q-fin.GN · q-fin.RM

Recognition: 1 theorem link

Resolution-Aware Perpetual Futures on Binary Prediction Markets: An Empirical Risk-Design Framework Using Polymarket Data

Maksym Nechepurenko

Pith reviewed 2026-05-12 03:21 UTC · model grok-4.3

classification 💱 q-fin.TR · q-fin.GN · q-fin.RM

keywords perpetual futures · prediction markets · risk design · binary events · margin requirements · funding rates · halt protocols · Polymarket

The pith

A six-component resolution-aware framework for perpetual futures on binary prediction markets fails to validate for deployment but separates halt and margin roles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PIRAP, a resolution-aware risk-design framework for perpetual futures that track a single binary prediction-market probability through its resolution date. It defines six components—an index estimator blending mid-price, depth-weighted mid, and time-decayed VWAP; jump-aware tiered margin sized to terminal collapse; leverage compression toward resolution; resolution-aware funding with boundary correction; a multi-stage halt protocol; and an eligibility framework—to handle risks that standard perpetual designs cannot. Counterfactual replay on 13,298 Polymarket archives yields mixed results: stylized-fact floors on boundary depth and terminal-jump magnitude pass, yet most welfare and materiality floors fail, leading to an explicit non-deployable status. The record nevertheless isolates a scope distinction in which halts manage execution-channel risk while terminal-jump bad debt remains margin-side, and it documents a pre-emption trade-off that constrains the dynamic-margin component.

Core claim

Standard basis-only funding paired with continuous-volatility static margin is non-portable to bounded-event underlyings. The PIRAP framework, built from an index estimator, jump-aware tiered margin, leverage compression schedule, resolution-aware funding rule, multi-stage halt protocol, and eligibility framework, when counterfactually evaluated on Polymarket PMXT v2 data, passes pre-registered stylized-fact floors for boundary depth asymmetry and terminal-jump magnitude but fails the majority of welfare-side and materiality floors. This establishes that the framework does not validate for deployment while confirming a halt-versus-margin scope distinction and a pre-emption trade-off that nar

What carries the argument

The six-component PIRAP risk-design framework, which supplies an index estimator, jump-aware tiered margin, leverage compression, resolution-aware funding, multi-stage halt protocol, and eligibility rules to address resolution-specific risks in perpetual contracts on binary-event underlyings.
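
As a rough illustration of the first component, the sketch below blends a mid-price, a depth-weighted mid, and a time-decayed VWAP into one bounded index value. The blend weights, the 300-second half-life, the microprice-style depth weighting, and all function and field names are assumptions made for this example; the summary above does not report the paper's functional forms or parameter values.

```python
import math
from dataclasses import dataclass


@dataclass
class Trade:
    price: float  # traded probability in [0, 1]
    size: float   # contracts traded
    ts: float     # trade time in seconds


def mid_price(best_bid, best_ask):
    return 0.5 * (best_bid + best_ask)


def depth_weighted_mid(best_bid, bid_size, best_ask, ask_size):
    # Assumed convention (microprice style): each quote is weighted by the
    # opposite side's size, so heavy bid depth pulls the value toward the ask.
    total = bid_size + ask_size
    if total == 0:
        return mid_price(best_bid, best_ask)
    return (best_bid * ask_size + best_ask * bid_size) / total


def time_decayed_vwap(trades, now, half_life_s):
    # Exponential decay: a trade half_life_s seconds old counts half as much.
    num = den = 0.0
    for t in trades:
        w = t.size * math.exp(-math.log(2) * (now - t.ts) / half_life_s)
        num += w * t.price
        den += w
    return num / den if den > 0 else None


def blended_index(best_bid, bid_size, best_ask, ask_size, trades, now,
                  weights=(0.4, 0.3, 0.3), half_life_s=300.0):
    # weights and half_life_s are hypothetical placeholders, not paper values.
    mid = mid_price(best_bid, best_ask)
    dwm = depth_weighted_mid(best_bid, bid_size, best_ask, ask_size)
    vwap = time_decayed_vwap(trades, now, half_life_s)
    if vwap is None:
        # No trades available: fall back to a renormalised book-only blend.
        w_mid, w_dwm, _ = weights
        value = (w_mid * mid + w_dwm * dwm) / (w_mid + w_dwm)
    else:
        value = weights[0] * mid + weights[1] * dwm + weights[2] * vwap
    # The underlying is a probability, so the index is clamped to [0, 1].
    return min(1.0, max(0.0, value))
```

The clamp in the last line reflects the bounded-event setting: whatever the inputs, the index cannot leave the unit interval.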

Load-bearing premise

The chosen pre-registered stylized-fact floors, welfare floors, and materiality floors constitute an adequate and unbiased test of whether the six-component framework improves risk outcomes on bounded-event underlyings.

What would settle it

Live deployment of the six components on Polymarket or equivalent data that measures whether final-hour liquidation rates fall by approximately 80 percent or terminal bad-debt frequency drops relative to baseline perpetual designs would settle the risk-improvement claim.

Original abstract

We develop and counterfactually evaluate a resolution-aware risk-design framework (PIRAP) for perpetual futures whose underlying tracks a single binary prediction-market probability through resolution. The framework specifies six components: an index estimator combining mid-price, depth-weighted mid, and time-decayed VWAP; jump-aware tiered margin sized against bounded-event terminal-collapse magnitude; leverage compression schedule contracting toward resolution; resolution-aware funding rule with boundary-aware correction; a multi-stage halt protocol; and an eligibility framework. Two formal non-portability propositions establish that standard basis-only funding paired with continuous-vol static margin fails on bounded-event underlyings. Empirical evaluation uses Polymarket's PMXT v2 archive for 2026-04-21 to 2026-04-27 (13,298-market analysis sample passing adequacy gates from 61,087 ingested; 13,115 resolved within the empirical window for E3). E1 evaluates two pre-registered stylized facts; E2 conducts counterfactual replay across three engine configurations; E3 isolates the resolution-zone protocol's contribution. Results are mixed. Five pre-registered floors: stylized-fact floors (boundary depth asymmetry, terminal-jump magnitude) PASS; welfare-side directional floors (final-hour liquidation -6%, drawdown -5.1% pooled, median PnL +14%) two FAIL one PASS; E3 mechanic floors (final-hour liquidation -80% by halt construction PASS; bad-debt frequency +2.4% FAIL). Three of five materiality floors fail: the framework as specified does not validate deployment, but the empirical record establishes a halt-versus-margin scope distinction (halt addresses execution-channel risk; terminal-jump bad-debt remains margin-side) and documents a pre-emption trade-off constraining the dynamic-margin component. The paper concludes with structural recommendations and explicit non-deployable status.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PIRAP adds resolution-specific tools for binary perpetuals and uses real Polymarket data to test them, but the pre-registered floors produce mixed results that leave the non-deployability conclusion dependent on unexamined threshold choices.

The paper introduces PIRAP, a six-component framework for perpetual futures on single binary prediction-market probabilities. It combines a multi-source index estimator, jump-aware tiered margins, a leverage compression schedule, boundary-aware funding, a multi-stage halt protocol, and an eligibility layer. Two formal propositions show why standard basis funding and static margin rules break down on bounded-event underlyings. The empirical part replays three engine configurations on a 13,298-market slice of Polymarket data from one week in 2026, with pre-registered stylized-fact checks, welfare metrics, and an isolation of the halt component. Three of five materiality floors fail, leading the authors to state that the framework does not validate deployment while still documenting a halt-versus-margin distinction and a pre-emption trade-off in the dynamic-margin rule.

The data work and the counterfactual design are the clearest strengths; the sample is large enough to run the tests and the pre-registration reduces some selection concerns. The main soft spot is the lack of justification or sensitivity analysis for the specific floors themselves. The welfare thresholds on liquidation, drawdown, and PnL, plus the bad-debt frequency target, are treated as decisive, yet the abstract gives no derivation showing they are the minimal or unbiased criteria for improvement on these underlyings. A narrow one-week window also leaves open whether the results generalize beyond that particular event cluster.

The paper is aimed at platform operators and risk teams working on prediction-market derivatives rather than general derivatives theory. It has enough formal structure and reproducible data steps to deserve a serious referee, even if the evaluation criteria need tightening in revision.

Referee Report

2 major / 1 minor

Summary. The paper develops a resolution-aware risk-design framework (PIRAP) for perpetual futures on binary prediction-market probabilities, specifying six components (index estimator with mid-price/depth-weighted mid/time-decayed VWAP, jump-aware tiered margin, leverage compression, resolution-aware funding, multi-stage halt protocol, eligibility framework) plus two formal non-portability propositions. It counterfactually replays the framework on a 13,298-market Polymarket PMXT v2 sample (2026-04-21 to 2026-04-27), pre-registering five materiality floors across stylized-fact, welfare, and mechanic evaluations (E1–E3); results are mixed with three floors failing, leading to a non-deployable conclusion while claiming a halt-versus-margin scope distinction and pre-emption trade-off.

Significance. If the counterfactual results and pre-registered design hold under scrutiny, the work is significant for documenting why standard basis-only funding and static margin fail on bounded-event underlyings, for isolating the distinct roles of halt protocols (execution-channel risk) versus margin (terminal-jump bad debt), and for supplying explicit structural recommendations plus an empirical record from real prediction-market data. The pre-registered floors and formal propositions are strengths that support falsifiability.

major comments (2)

[Abstract / E3] The conclusion that the framework 'does not validate deployment' rests on three of five pre-registered materiality floors failing (welfare directional floors at -6% liquidation, -5.1% drawdown, +14% median PnL; E3 bad-debt frequency at +2.4%). No derivation, sensitivity analysis, or justification is supplied for why these specific thresholds constitute an adequate and unbiased test of risk improvement on bounded-event underlyings, which is load-bearing for the non-deployability claim and the halt-versus-margin distinction.
[E2] E2 counterfactual replay: The evaluation reports outcomes across three engine configurations but supplies no implementation detail on the free parameters (time-decay weight in the VWAP index estimator; jump-magnitude bound for tiered margin sizing). Without these, it is impossible to determine whether the mixed welfare and bad-debt results are robust or artifacts of the chosen parameter values.

minor comments (1)

[Methods / Data] The abstract states that 13,298 markets passed adequacy gates from 61,087 ingested and that 13,115 resolved within the window; the methods section should expand on the exact gate criteria and any selection bias this introduces.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify areas where the manuscript can be strengthened. We respond to each major comment below and commit to revisions that address the concerns while preserving the core contributions of the pre-registered evaluation and the halt-versus-margin distinction.

Referee: [Abstract / E3] The conclusion that the framework 'does not validate deployment' rests on three of five pre-registered materiality floors failing (welfare directional floors at -6% liquidation, -5.1% drawdown, +14% median PnL; E3 bad-debt frequency at +2.4%). No derivation, sensitivity analysis, or justification is supplied for why these specific thresholds constitute an adequate and unbiased test of risk improvement on bounded-event underlyings, which is load-bearing for the non-deployability claim and the halt-versus-margin distinction.

Authors: The materiality floors were pre-registered prior to any data analysis precisely to ensure an unbiased and falsifiable test of the framework. This pre-registration itself supplies the primary justification for treating the thresholds as an adequate benchmark, as they were fixed ex ante to capture economically material improvements in risk metrics for bounded-event underlyings. We nevertheless agree that the manuscript would be improved by an explicit derivation and sensitivity analysis. In revision we will add a dedicated subsection (and appendix) that (i) links each threshold to stylized facts from prediction-market literature and observed participation effects, and (ii) reports sensitivity checks under alternative threshold values (e.g., ±20% shifts). These additions will reinforce rather than alter the non-deployability conclusion and the documented halt-versus-margin scope distinction.

revision: yes

Referee: [E2] E2 counterfactual replay: The evaluation reports outcomes across three engine configurations but supplies no implementation detail on the free parameters (time-decay weight in the VWAP index estimator; jump-magnitude bound for tiered margin sizing). Without these, it is impossible to determine whether the mixed welfare and bad-debt results are robust or artifacts of the chosen parameter values.

Authors: We accept that the absence of explicit numerical values for the free parameters in the E2 section limits reproducibility and the ability to judge robustness. The manuscript describes the functional forms of the index estimator and tiered-margin rule but does not tabulate the concrete parameter settings used in the replay. In the revised version we will insert a complete parameter table in the empirical methods section, state the exact values employed for the time-decay weight and jump-magnitude bound, and supply pseudocode for both components. This will enable readers to replicate the three engine configurations and to assess whether the mixed welfare and bad-debt outcomes are sensitive to reasonable parameter perturbations.

revision: yes

Circularity Check

0 steps flagged

Empirical evaluation on external archive with pre-registered floors shows no load-bearing circularity

full rationale

The paper's derivation chain consists of formal non-portability propositions followed by counterfactual replay on an external Polymarket archive (13,298-market sample) and pre-registered stylized-fact, welfare, and materiality floors. No equations reduce claimed performance metrics, the halt-versus-margin distinction, or the non-deployability conclusion to parameters fitted inside the paper or to self-citations whose validity depends on the present work. The evaluation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the framework description implies several design choices whose concrete parameter values are not stated. The two formal propositions are treated as background results.

free parameters (2)

time-decay weight in VWAP index estimator: mentioned as part of the index estimator, but no numerical value or fitting procedure is given in the abstract
jump-magnitude bound used for tiered margin sizing: the framework sizes margins against bounded-event terminal-collapse magnitude, but the specific bound is not reported

axioms (1)

domain assumption: Standard basis-only funding paired with continuous-vol static margin fails on bounded-event underlyings
Invoked to motivate the two formal non-portability propositions

pith-pipeline@v0.9.0 · 5646 in / 1560 out tokens · 60577 ms · 2026-05-12T03:21:44.604943+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 1 internal anchor

[1]

The Signal Credibility Index for Prediction Markets: A Microstructure-Grounded Diagnostic with Weighted and Time-Varying Extensions

Zenodo. doi: 10.5281/zenodo.20107449. url: https://doi.org/10.5281/zenodo.20107449. — (2026g). “The Signal Credibility Index for Prediction Markets: A Microstructure-Grounded Diagnostic with Weighted and Time-Varying Extensions”. arXiv:2604.27041; SSRN:6676179. doi: 10.48550/arXiv.2604.27041. arXiv:2604.27041. url: https://arxiv.org/abs/2604.27041. Nimmaga...

[2]

Take the most recent book snapshot at or before the target timestamp

[3]

Apply every price_change between that snapshot and the target timestamp in timestamp_received order

[4]

Result: the order book as the venue knew it at the target timestamp. A sanity-check spot test of this procedure on five markets is part of the G5 sample adequacy gate (Section 5.4); reconstructed best bid/ask must agree with feed best-bid/best-ask fields on at least 99% of price_change events. B.3 Polymarket Gamma API: market metadata Public REST API at gamm...
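
The two reconstruction steps above (latest snapshot, then replay of price_change events) can be sketched as follows. The event schema is an assumption made for illustration: each event is a dict with `type` ("book" or "price_change") and `timestamp_received`; snapshots carry `bids` and `asks` as (price, size) pairs; a price_change carries `side` ("BUY" or "SELL"), `price`, and the new aggregate `size` at that level, with size 0 meaning the level is removed. The actual PMXT field names may differ.

```python
def reconstruct_book(events, target_ts):
    """Return {'bids': {price: size}, 'asks': {price: size}} as of target_ts,
    or None if no snapshot exists at or before target_ts."""
    # sorted() is stable, so events sharing a timestamp keep their input order.
    ordered = sorted(events, key=lambda e: e["timestamp_received"])

    # Step 1: most recent book snapshot at or before the target timestamp.
    snap_idx = None
    for i, e in enumerate(ordered):
        if e["timestamp_received"] > target_ts:
            break
        if e["type"] == "book":
            snap_idx = i
    if snap_idx is None:
        return None

    snap = ordered[snap_idx]
    book = {"bids": dict(snap["bids"]), "asks": dict(snap["asks"])}

    # Step 2: apply every later price_change up to the target, in received order.
    for e in ordered[snap_idx + 1:]:
        if e["timestamp_received"] > target_ts:
            break
        if e["type"] != "price_change":
            continue
        side = "bids" if e["side"] == "BUY" else "asks"
        if e["size"] == 0:
            book[side].pop(e["price"], None)    # level removed
        else:
            book[side][e["price"]] = e["size"]  # level set to new aggregate size
    return book


def best_bid_ask(book):
    # Best bid is the highest bid price; best ask is the lowest ask price.
    bid = max(book["bids"]) if book["bids"] else None
    ask = min(book["asks"]) if book["asks"] else None
    return bid, ask
```

A spot test in the spirit of the G5 check would compare `best_bid_ask(reconstruct_book(events, ts))` with the best-bid and best-ask fields the feed reports at the same `ts`.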

[5]

PMXT events are ingested by asset_id

[6]

Join is by exact-string match asset_id = clobTokenIds[i] for some outcome index i

Gamma metadata is fetched for the unique asset_id set in the empirical window. Join is by exact-string match asset_id = clobTokenIds[i] for some outcome index i
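
A minimal sketch of the exact-string join described above. It assumes each Gamma market record has already been parsed into a dict with a hypothetical `id` field and a `clobTokenIds` list of strings, and that each PMXT event is a dict with an `asset_id` string; these shapes are illustrative, not taken from the paper.

```python
def build_token_index(gamma_markets):
    """Map each CLOB token id string to (market_id, outcome_index)."""
    index = {}
    for m in gamma_markets:
        for i, token_id in enumerate(m["clobTokenIds"]):
            index[token_id] = (m["id"], i)
    return index


def join_events(events, gamma_markets):
    """Attach market_id and outcome_index to each event whose asset_id matches
    a clobTokenIds entry exactly; return (joined, unmatched)."""
    index = build_token_index(gamma_markets)
    joined, unmatched = [], []
    for e in events:
        hit = index.get(e["asset_id"])  # exact string match, no normalisation
        if hit is None:
            unmatched.append(e)
        else:
            market_id, outcome_index = hit
            joined.append({**e, "market_id": market_id,
                           "outcome_index": outcome_index})
    return joined, unmatched
```

Returning the unmatched events separately makes it easy to report how many events fail the join instead of silently dropping them.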

[7]

OOv2 adapters are queried via Goldsky; MOOV2 adapters are queried via Gamma outcomePrices

For each Gamma-resolved market, UMA OO records are fetched. OOv2 adapters are queried via Goldsky; MOOV2 adapters are queried via Gamma outcomePrices

[8]

B.7 Cleaning steps

The combined record is persisted with the asset_id as primary key and oracle_source field tagging which adapter produced the resolution. B.7 Cleaning steps

[9]

Sort all events by(market, timestamp_received, event_seq) where event_seq is a within-file sequence number for tie-breaking

[10]

Drop exact duplicate events (same market, timestamp, sequence, payload)

[11]

Drop events with malformed numeric fields (rare; logged at WARNING level)
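
The three cleaning rules listed so far (sort, drop exact duplicates, drop malformed numeric fields) might look like this in outline. The event layout is assumed for illustration: a dict with `market`, `timestamp_received`, `event_seq`, and a JSON-serialisable `payload` dict; `NUMERIC_FIELDS` is a hypothetical list of payload keys that must parse as finite numbers.

```python
import json
import logging
import math

log = logging.getLogger("cleaning")

NUMERIC_FIELDS = ("price", "size")  # hypothetical field names


def _is_number(x):
    try:
        return math.isfinite(float(x))
    except (TypeError, ValueError):
        return False


def clean_events(events):
    """Return (cleaned_events, drop_counts) after sorting and filtering."""
    # Rule 1: sort by (market, timestamp_received, event_seq).
    ordered = sorted(
        events,
        key=lambda e: (e["market"], e["timestamp_received"], e["event_seq"]),
    )
    seen = set()
    cleaned = []
    dropped = {"duplicate": 0, "malformed": 0}
    for e in ordered:
        payload = e.get("payload") or {}
        # Rule 2: exact duplicates share market, timestamp, sequence, payload.
        key = (e["market"], e["timestamp_received"], e["event_seq"],
               json.dumps(payload, sort_keys=True))
        if key in seen:
            dropped["duplicate"] += 1
            continue
        seen.add(key)
        # Rule 3: drop events whose numeric fields do not parse; log a WARNING.
        if any(f in payload and not _is_number(payload[f])
               for f in NUMERIC_FIELDS):
            log.warning("dropping malformed event: market=%s seq=%s",
                        e["market"], e["event_seq"])
            dropped["malformed"] += 1
            continue
        cleaned.append(e)
    return cleaned, dropped
```

The returned counts can be divided by the input length to give a per-rule drop percentage for each file.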

[12]

rank distinct markets by min(first_event_timestamp) and take the first N

For book events with empty bids and asks, treat as market dormancy markers, not as data quality failures. The percentage of events dropped under each cleaning rule is reported per file in the production G5 output JSON. B.8 Diagnostic chronological-prefix subsamples The originally pre-registered analysis-sample selection rule (plan v1.2 §5.4) was: “rank dis...

[13]

Uses direct R2 endpoint enumeration per the rule in Section 5.3

Download the PMXT v2 archive for the target empirical window (default 2026-04-21 to 2026-04-27, configurable via TARGET_WEEK_START and TARGET_WEEK_END environment variables). Uses direct R2 endpoint enumeration per the rule in Section 5.3. Verified against MANIFEST.sha256 in stage 2

[14]

Failure aborts the pipeline

Verify archive integrity by SHA-256 hash comparison against the committed manifest. Failure aborts the pipeline
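
A sketch of the integrity check described above, assuming the manifest uses the common `sha256sum` layout of one `<hex digest>  <relative path>` pair per line. That layout, and the function names, are assumptions; the source only names the file MANIFEST.sha256.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_path, root):
    """Return a list of (relative_path, reason) failures; empty means all good."""
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        rel = rel.strip().lstrip("*")  # sha256sum marks binary mode with '*'
        target = Path(root) / rel
        if not target.is_file():
            failures.append((rel, "missing"))
        elif sha256_of(target) != expected.lower():
            failures.append((rel, "hash mismatch"))
    return failures
```

A pipeline stage would call `verify_manifest("MANIFEST.sha256", archive_dir)` and exit with a non-zero status if the returned list is non-empty, which matches the abort-on-failure behaviour described.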

[15]

Produces table_t_g5_*.json with class counts, gate floor evaluations, and consequence-rule status

Run the Sample Adequacy Gate evaluation (Section 5.5). Produces table_t_g5_*.json with class counts, gate floor evaluations, and consequence-rule status. The same script runs against any subsample size and rule (chronological or stratified-by-day) via CLI arguments

[16]

Produces SF1–SF9 (Section 5.6) as JSON time-series and as paper tables

Run E1 stylized-facts evaluation on the analysis sample. Produces SF1–SF9 (Section 5.6) as JSON time-series and as paper tables

[17]

Produces engine-comparison metrics and falsifiability-test outputs

Run E2 counterfactual replay (E2a position-agnostic, E2b deterministic grid, E2c synthetic-trader robustness). Produces engine-comparison metrics and falsifiability-test outputs

[18]

Run E3 resolution-zone protocol comparison (Section 8.3)

[19]

Perps are Prediction Markets. Prediction Markets are Perps

Build the paper PDF from LaTeX source, with all empirical outputs from stages 3–6 referenced. Stages can be run individually with bash scripts/reproduce.sh --stage <N>. A --check-only mode verifies that expected artifacts exist without re-running. 1 Please verify the canonical URL on the ForesightFlow organization page; the path may have been renamed since this pa...
