OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.
7 Pith papers cite this work.
Citing papers by year: 2026 (7)
Citing papers
- OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking
OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.
- Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
Foresight Arena is an on-chain benchmark using Brier and novel Alpha scores to evaluate AI forecasting agents on live prediction markets via Polygon smart contracts.
- Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems
Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.
- Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.
- Harnessing Pre-Resolution Signals for Future Prediction Agents
Milkyway uses pre-resolution signals from temporal contrasts in evolving evidence and repeated forecasts to evolve a harness and improve predictions before resolution, outperforming baselines on FutureX and FutureWorld.
- Extrapolating Volition with Recursive Information Markets
Recursive information markets with forgetful LLM buyers can align information prices with true value and extend to scalable oversight in AI alignment.
- The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission
Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.