ForecastBench: A dynamic benchmark of AI forecasting capabilities. arXiv preprint arXiv:2409.19839, 2024

7 Pith papers cite this work. Polarity classification is still indexing.

years

2026 7

roles

background 1

polarities

background 1

representative citing papers

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

StakeBench is a new benchmark that uses market-derived supervision from resolved prediction markets to test LLMs on commitment detection, side identification, action anticipation, and odds projection. It reveals partial success on side identification but structural failures on the higher-level tasks.

Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

cs.MA · 2026-05-05 · unverdicted · novelty 6.0

Coordination treated as a separable architectural layer in LLM multi-agent systems yields distinguishable Murphy-decomposed performance signatures on prediction-market tasks, with some configurations dominating a cost-quality Pareto frontier.
