6 Pith papers cite this work.
Year: 2026 (6 citing papers)
Citing papers
- PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data
  Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.
- OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking
  OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.
- Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
  Foresight Arena is an on-chain benchmark using Brier and novel Alpha scores to evaluate AI forecasting agents on live prediction markets via Polygon smart contracts.
- Energy-Arena: A Dynamic Benchmark for Operational Energy Forecasting
  Energy-Arena is a dynamic, forward-looking benchmarking platform that standardizes ex-ante submissions and rolling ex-post evaluations for operational energy forecasting to improve transparency and comparability.
- CT Open: An Open-Access, Uncontaminated, Live Platform for the Open Challenge of Clinical Trial Outcome Prediction
  CT Open is a new live platform with an automated LLM-powered decontamination pipeline that supplies uncontaminated benchmarks for predicting clinical trial outcomes.
- Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems
  Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.