arXiv preprint arXiv:2402.18563

Halawi, D · 2024 · arXiv 2402.18563

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

cs.AI · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

More capable LLMs produce worse distributional forecasts on superlinear growth time series with tail risks of regime change, with the error concentrated in the upper tail; this reverses on conventional threshold metrics.

OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking

cs.AI · 2026-05-05 · conditional · novelty 7.0

OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.

Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

cs.MA · 2026-05-01 · conditional · novelty 7.0

Foresight Arena is an on-chain benchmark using Brier and novel Alpha scores to evaluate AI forecasting agents on live prediction markets via Polygon smart contracts.

Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

cs.MA · 2026-05-05 · unverdicted · novelty 6.0

Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.

Argumentative Large Language Models for Explainable and Contestable Claim Verification

cs.CL · 2024-05-03 · unverdicted · novelty 6.0

ArgLLMs build argumentation frameworks from LLMs to support explainable and contestable formal reasoning for claim verification.

Towards Effective Long-Video Event Prediction via Multi-Level Event Semantics Mining

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

VISTA mines multi-level event semantics via visual prompts, knowledge-enhanced retrieval, and proposal integration to improve long-video event prediction over existing LVLMs.

Harnessing Pre-Resolution Signals for Future Prediction Agents

cs.AI · 2026-04-17 · unverdicted · novelty 5.0 · 2 refs

Milkyway uses pre-resolution signals from temporal contrasts in evolving evidence and repeated forecasts to evolve a harness and improve predictions before resolution, outperforming baselines on FutureX and FutureWorld.

Extrapolating Volition with Recursive Information Markets

cs.GT · 2026-04-08 · unverdicted · novelty 5.0

Recursive information markets with forgetful LLM buyers can align information prices with true value and extend to scalable oversight in AI alignment.

The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission

cs.CY · 2026-04-07 · unverdicted · novelty 5.0

Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.

citing papers explorer

Showing 10 of 10 citing papers.

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

cs.AI · 2026-05-21 · unverdicted · none · ref 3 · 2 links

More capable LLMs produce worse distributional forecasts on superlinear growth time series with tail risks of regime change, with the error concentrated in the upper tail; this reverses on conventional threshold metrics.

OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking

cs.AI · 2026-05-05 · conditional · none · ref 4

OracleProto is a reproducible framework that uses model-cutoff alignment, temporal masking, and leakage detection to create low-leakage benchmarks for LLM native forecasting from past events.

Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

cs.MA · 2026-05-01 · conditional · none · ref 6

Foresight Arena is an on-chain benchmark using Brier and novel Alpha scores to evaluate AI forecasting agents on live prediction markets via Polygon smart contracts.

Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

cs.MA · 2026-05-05 · unverdicted · none · ref 7

Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

cs.AI · 2026-04-20 · unverdicted · none · ref 17

BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.

Argumentative Large Language Models for Explainable and Contestable Claim Verification

cs.CL · 2024-05-03 · unverdicted · none · ref 22

ArgLLMs build argumentation frameworks from LLMs to support explainable and contestable formal reasoning for claim verification.

Towards Effective Long-Video Event Prediction via Multi-Level Event Semantics Mining

cs.CV · 2026-05-29 · unverdicted · none · ref 11

VISTA mines multi-level event semantics via visual prompts, knowledge-enhanced retrieval, and proposal integration to improve long-video event prediction over existing LVLMs.

Harnessing Pre-Resolution Signals for Future Prediction Agents

cs.AI · 2026-04-17 · unverdicted · none · ref 3 · 2 links

Milkyway uses pre-resolution signals from temporal contrasts in evolving evidence and repeated forecasts to evolve a harness and improve predictions before resolution, outperforming baselines on FutureX and FutureWorld.

Extrapolating Volition with Recursive Information Markets

cs.GT · 2026-04-08 · unverdicted · none · ref 20

Recursive information markets with forgetful LLM buyers can align information prices with true value and extend to scalable oversight in AI alignment.

The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission

cs.CY · 2026-04-07 · unverdicted · none · ref 4

Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.
