Leverage scales market-price manipulation linearly while shifting outcome-manipulation thresholds and multiplying informed-trading rents in three distinct ways, calling for re-allocated regulatory attack surfaces rather than net reduction.
Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Evaluating the true forecasting ability of AI agents requires environments that are resistant to environments resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL -- a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score -- proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy's classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $\alpha^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $\alpha^* = 0.01$ requires four times more. We complement these analytical results with a deterministic, seed-controlled simulation study calibrated to literature-reported Brier-score ranges, illustrating how Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. Live results from the deployed benchmark will be reported in a future revision. All smart contracts and evaluation infrastructure are open-source.
fields
q-fin.TR 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
The paper organizes seven canonical variants of event-linked perpetual futures along four design axes, supplying payoff definitions, inheritance rules from prior work, and variant-specific constraints.
citing papers explorer
-
Manipulation, Insider Information, and Regulation in Leveraged Event-Linked Markets
Leverage scales market-price manipulation linearly while shifting outcome-manipulation thresholds and multiplying informed-trading rents in three distinct ways, calling for re-allocated regulatory attack surfaces rather than net reduction.
-
A Taxonomy of Event-Linked Perpetual Futures: Variant Designs Beyond the Single-Market Binary Case
The paper organizes seven canonical variants of event-linked perpetual futures along four design axes, supplying payoff definitions, inheritance rules from prior work, and variant-specific constraints.