ForecastBench: A dynamic benchmark of AI forecasting capabilities. arXiv preprint arXiv:2409.19839, 2024

Ezra Karger, Houtan Bastani, Chen Yueh-Han, Zachary Jacobs, Danny Halawi, Fred Zhang, Philip E Tetlock · 2024 · arXiv 2409.19839

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

q-fin.CP · 2026-04-03 · conditional · novelty 8.0

Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

StakeBench is a new benchmark using market-derived supervision from resolved prediction markets to test LLMs on commitment detection, side identification, action anticipation, and odds projection, revealing partial success on sides but structural failures on higher tasks.

OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking

cs.AI · 2026-05-05 · conditional · novelty 7.0

OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.

Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

cs.MA · 2026-05-01 · conditional · novelty 7.0

Foresight Arena is an on-chain benchmark using Brier and novel Alpha scores to evaluate AI forecasting agents on live prediction markets via Polygon smart contracts.

Energy-Arena: A Dynamic Benchmark for Operational Energy Forecasting

econ.EM · 2026-04-27 · unverdicted · novelty 7.0

Energy-Arena is a dynamic, forward-looking benchmarking platform that standardizes ex-ante submissions and rolling ex-post evaluations for operational energy forecasting to improve transparency and comparability.

CT Open: An Open-Access, Uncontaminated, Live Platform for the Open Challenge of Clinical Trial Outcome Prediction

cs.AI · 2026-04-17 · accept · novelty 7.0

CT Open is a new live platform with an automated LLM-powered decontamination pipeline that supplies uncontaminated benchmarks for predicting clinical trial outcomes.

Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

cs.MA · 2026-05-05 · unverdicted · novelty 6.0

Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.

citing papers explorer


PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

q-fin.CP · 2026-04-03 · conditional · none · ref 14

Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.
