pith. machine review for the scientific record.

arxiv: 2605.11220 · v1 · submitted 2026-05-11 · 📊 stat.AP

Prediction Markets Underperform Simple Baselines For Infectious Disease Forecasting

Carson Dudley, Reiden Magdaleno

Pith reviewed 2026-05-13 01:01 UTC · model grok-4.3

classification 📊 stat.AP
keywords: prediction markets, infectious disease forecasting, influenza hospitalizations, measles cases, FluSight ensemble, forecast evaluation, ensemble methods, statistical baselines

The pith

Prediction markets fail to outperform statistical baselines when forecasting flu hospitalizations and measles cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether real-money prediction markets can produce accurate forecasts of infectious disease trends by aggregating participant bets. It tests Polymarket prices against the CDC FluSight ensemble for weekly US influenza hospitalizations and against simple statistical models for monthly measles cases. Markets place probability on impossible outcomes such as declining cumulative counts and suffer from low trading volume, leading to worse accuracy than the benchmarks. A reader would care because disease forecasts guide public health responses, and markets could in principle supply fast, incentive-driven predictions without relying on expert pipelines. The findings indicate that current market designs do not yet deliver reliable signals for these applications.

Core claim

The central claim is that Polymarket forecasts for cumulative influenza hospitalizations are competitive only with low-performing individual FluSight models and are dominated by the FluSight ensemble: the best linear combination of the two assigns zero weight to the market. For measles cases, Polymarket forecasts are outperformed by simple statistical baselines. The paper identifies two sources of inefficiency: positive probability placed on impossible outcomes, and trading volume too low for prices to reflect available information.

What carries the argument

Direct comparison of market-implied probability distributions against the FluSight ensemble and simple statistical baselines, with explicit checks for probability mass on impossible outcomes and assessment of trading volume as a performance limiter.
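
The check for probability mass on impossible outcomes can be illustrated with a minimal sketch. It assumes the market's forecast is already expressed as probabilities over half-open bins [lo, hi) of the cumulative count. The function names, bin edges, and numbers are hypothetical and are not taken from the paper, and the renormalization step is only one possible repair, not a procedure the paper describes.

```python
from typing import List, Tuple

Bin = Tuple[float, float]  # half-open interval [lo, hi) of the cumulative count


def impossible_mass(bins: List[Bin], probs: List[float], last_observed: float) -> float:
    """Sum the probability on bins that lie entirely below the last observed total.

    A cumulative count cannot decrease, so any bin whose upper edge is at or
    below the latest observed value can no longer contain the final outcome.
    """
    if len(bins) != len(probs):
        raise ValueError("bins and probs must have the same length")
    return sum(p for (_, hi), p in zip(bins, probs) if hi <= last_observed)


def drop_impossible(bins: List[Bin], probs: List[float], last_observed: float) -> List[float]:
    """Zero out impossible bins and renormalize the remaining mass."""
    kept = [0.0 if hi <= last_observed else p for (_, hi), p in zip(bins, probs)]
    total = sum(kept)
    if total == 0:
        raise ValueError("all probability mass sits on impossible bins")
    return [p / total for p in kept]


# Made-up illustration values, not data from the paper.
bins = [(0, 10_000), (10_000, 20_000), (20_000, 30_000), (30_000, float("inf"))]
probs = [0.05, 0.15, 0.50, 0.30]
last_observed = 21_000  # hypothetical cumulative count already reported

wasted = impossible_mass(bins, probs, last_observed)
repaired = drop_impossible(bins, probs, last_observed)
print(f"mass on impossible bins: {wasted:.2f}")
print("repaired distribution:", [round(p, 3) for p in repaired])
```

In this toy case the first two bins sit below the count already reported, so the mass on them is wasted regardless of what happens next.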

If this is right

  • The FluSight ensemble remains the dominant method for influenza hospitalization forecasts, and market data adds no value even when combined with it.
  • Simple statistical baselines outperform markets for measles case counts.
  • Market prices currently assign probability to impossible events such as negative changes in cumulative totals.
  • Low trading volume limits the ability of markets to aggregate useful information for disease dynamics.
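
The zero-weight result in the first bullet can be made concrete with a small sketch of a two-component linear pool. This summary does not state the loss function or optimizer used in the paper, so the mean log score and the grid search below are assumptions chosen for simplicity, and the probabilities are made-up values.

```python
import math
from typing import List


def mean_log_score(w: float, p_market: List[float], p_ensemble: List[float]) -> float:
    """Mean log score of the pool w * market + (1 - w) * ensemble.

    Each list holds the probability that a forecaster assigned to the outcome
    that was actually realized, one entry per forecast target.
    """
    pooled = [w * m + (1.0 - w) * e for m, e in zip(p_market, p_ensemble)]
    return sum(math.log(p) for p in pooled) / len(pooled)


def best_market_weight(p_market: List[float], p_ensemble: List[float], steps: int = 100) -> float:
    """Grid-search the market weight in [0, 1] that maximizes the mean log score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: mean_log_score(w, p_market, p_ensemble))


# Hypothetical probabilities placed on the realized outcome (not from the paper).
p_market = [0.10, 0.20, 0.05, 0.30]
p_ensemble = [0.35, 0.40, 0.25, 0.45]

w_star = best_market_weight(p_market, p_ensemble)
print("best weight on the market component:", w_star)
```

Here the ensemble assigns more probability to the realized outcome on every target, so any weight on the market lowers the pooled score and the search settles on zero. When each component is better on some targets, the same search can return an interior weight.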

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Contract specifications for cumulative quantities may need redesign to prevent impossible probability assignments in other trend-forecasting domains.
  • Public health systems should continue to rely on curated statistical ensembles rather than incorporating current market prices as inputs.
  • If markets are to become useful for epidemiology, experiments that increase volume or improve contract clarity could be tested on future outbreaks.

Load-bearing premise

That the FluSight ensemble and the chosen statistical models are the strongest available benchmarks, and that market prices reflect informed collective judgment rather than noise from low participation or contract design flaws.

What would settle it

Demonstrating that a redesigned market with corrected cumulative contracts and substantially higher volume matches or exceeds the FluSight ensemble on held-out influenza data would show that the underperformance reflects current market design rather than a limit of prediction markets themselves.

Figures

Figures reproduced from arXiv: 2605.11220 by Carson Dudley, Reiden Magdaleno.

Figure 1: Prediction market performance for weekly cumulative U.S. influenza hospitalizations, 2025–2026.

Original abstract

Prediction markets (e.g., Polymarket, Kalshi) allow participants to bet on future events, producing real-time forecasts based on collective judgment. In domains such as elections and finance, markets have been effective at aggregating information, often rivaling or outperforming expert forecasters or polls. Whether this performance extends to infectious disease dynamics is unclear. Participants are self-selected and typically lack epidemiological expertise. However, markets can respond in real time to emerging news and unstructured signals in ways that standard forecasting pipelines cannot. Also, substantial financial stakes encourage participants to make an effort to be accurate. We evaluate Polymarket forecasts during 2025 and 2026 for two settings: weekly cumulative influenza hospitalizations in the US, which have an established expert-curated forecasting ensemble (CDC FluSight), and monthly measles cases, which do not. Across both settings, prediction markets fail to outperform standard benchmarks. For influenza, markets are competitive with low-performing individual FluSight models but are dominated by the FluSight ensemble: even when we combine market forecasts with the ensemble, the best combination puts zero weight on the markets. For measles, markets are outperformed by simple statistical baselines. We diagnose two sources of market inefficiency: placement of probability mass on impossible outcomes (e.g., decreasing values in cumulative forecasts) and low trading volume. These results suggest that current prediction markets are not reliable forecasters of infectious disease dynamics on their own or useful as complementary features for existing forecasting systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript evaluates Polymarket prediction market forecasts for weekly cumulative US influenza hospitalizations (compared to the CDC FluSight ensemble) and monthly measles cases (compared to simple statistical baselines). It reports that markets are competitive only with weak individual FluSight models but are dominated by the ensemble (with optimal linear combinations assigning zero weight to market forecasts) and are outperformed by baselines for measles. Two mechanisms are diagnosed: assignment of probability mass to impossible outcomes (e.g., decreasing cumulatives) and low trading volume. The central claim is that current prediction markets are not reliable for infectious disease forecasting on their own or as complements to existing systems.

Significance. If the empirical comparisons hold, the work supplies concrete evidence that prediction markets have not yet succeeded in a high-stakes public-health domain where they might have been expected to aggregate real-time signals effectively. The explicit diagnosis of failure modes (invalid support and low liquidity) is actionable for market design and for forecasters considering hybrid systems. The zero-weight result in the influenza combination exercise is particularly informative, as it quantifies the lack of incremental value.

major comments (2)
  1. §4 (influenza results): the claim that the optimal combination places zero weight on market forecasts is load-bearing for the 'not useful as complementary features' conclusion. The optimization procedure, loss function (e.g., log score vs. MAE), and cross-validation scheme used to obtain the weights must be stated explicitly so that readers can assess whether the zero-weight outcome is robust to reasonable alternatives.
  2. §3.2 and §5 (market probability extraction): the diagnosis that markets place mass on impossible outcomes (decreasing cumulatives) is central to explaining underperformance. The precise mapping from observed market prices to probability distributions over cumulative trajectories, including any smoothing or normalization steps, should be documented with an example for at least one forecast date.
minor comments (3)
  1. The measles baseline models are described only as 'simple statistical baselines'; a short appendix or paragraph listing their exact specifications (e.g., ARIMA order, exponential smoothing parameters) would allow direct replication.
  2. Figure captions should state the exact evaluation periods (start and end dates) and the number of forecast targets evaluated, rather than relying solely on the main text.
  3. A brief discussion of contract resolution rules for the cumulative hospitalization markets would help readers understand why decreasing trajectories are impossible.
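
To make minor comment 1 concrete, the sketch below shows what a fully specified pair of simple baselines could look like. The paper's actual measles baselines are not described in this summary, so the persistence and moving-average forecasters, the rolling evaluation, and the monthly counts are all hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


def persistence_forecast(history: Sequence[float]) -> float:
    """Predict that next month's count equals the most recent observed count."""
    return history[-1]


def moving_average_forecast(history: Sequence[float], window: int = 3) -> float:
    """Predict the mean of the last `window` observed counts."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def rolling_mae(series: List[float],
                forecaster: Callable[[Sequence[float]], float],
                min_history: int = 3) -> float:
    """One-step-ahead mean absolute error, refitting on all data before each target."""
    errors = [abs(forecaster(series[:t]) - series[t])
              for t in range(min_history, len(series))]
    return sum(errors) / len(errors)


# Made-up monthly case counts, for illustration only.
monthly_cases = [10, 12, 15, 20, 18, 25]

mae_persistence = rolling_mae(monthly_cases, persistence_forecast)
mae_moving_avg = rolling_mae(monthly_cases, moving_average_forecast)
print(f"persistence MAE:    {mae_persistence:.3f}")
print(f"moving-average MAE: {mae_moving_avg:.3f}")
```

Stating baselines at this level of detail (forecast rule, window length, evaluation origin) is what would allow the direct replication the referee asks for.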

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and constructive comments, which have helped us improve the clarity and reproducibility of the manuscript. We address each major comment below and have revised the paper to incorporate the requested methodological details.

Point-by-point responses
  1. Referee: §4 (influenza results): the claim that the optimal combination places zero weight on market forecasts is load-bearing for the 'not useful as complementary features' conclusion. The optimization procedure, loss function (e.g., log score vs. MAE), and cross-validation scheme used to obtain the weights must be stated explicitly so that readers can assess whether the zero-weight outcome is robust to reasonable alternatives.

    Authors: We agree that explicit documentation of the combination procedure is essential for evaluating the robustness of the zero-weight result. In the revised manuscript we have expanded §4 to fully describe the optimization procedure (a constrained minimization of forecast error over a rolling historical window), the loss function used to obtain the weights, and the cross-validation scheme. We also report that the zero-weight assignment to market forecasts remains unchanged under reasonable alternative loss functions and validation approaches. revision: yes

  2. Referee: §3.2 and §5 (market probability extraction): the diagnosis that markets place mass on impossible outcomes (decreasing cumulatives) is central to explaining underperformance. The precise mapping from observed market prices to probability distributions over cumulative trajectories, including any smoothing or normalization steps, should be documented with an example for at least one forecast date.

    Authors: We appreciate the request for greater transparency on this point. We have revised §3.2 to provide a precise, step-by-step account of how observed market prices are mapped to probability distributions over cumulative trajectories, including the normalization and any smoothing applied. The revision also includes a concrete worked example for one forecast date, showing the raw prices, the resulting distribution, and the probability mass assigned to impossible outcomes. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation

The paper conducts a direct empirical head-to-head comparison of Polymarket forecasts against external benchmarks (FluSight ensemble for influenza, simple statistical baselines for measles) using standard performance metrics. No mathematical derivations, fitted parameters, or self-citations are used to generate the central claims. The diagnoses of market inefficiencies (impossible outcomes and low volume) are observational and do not rely on any self-referential construction or ansatz. The work is self-contained against external data sources and does not reduce any result to its own inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The evaluation rests on the assumption that the selected baselines are appropriate comparators and that market prices can be treated as probability forecasts without adjustment for liquidity or contract design.

axioms (2)
  • domain assumption: Market prices can be interpreted directly as probability forecasts for the defined outcomes.
    Invoked when comparing market forecasts to statistical models without liquidity or bias corrections.
  • domain assumption: The FluSight ensemble and simple statistical baselines represent the relevant performance standards for these tasks.
    Used to declare market underperformance.
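
The first axiom can be illustrated with a minimal sketch of reading prices as probabilities. It assumes a set of mutually exclusive, exhaustive outcome contracts whose prices need not sum to one, and it uses proportional rescaling, a common simple choice that is an assumption here rather than the paper's documented method. The outcome labels and prices are hypothetical.

```python
from typing import Dict


def prices_to_probs(prices: Dict[str, float]) -> Dict[str, float]:
    """Rescale contract prices so they sum to one across the outcome set.

    Assumes the outcomes are mutually exclusive and exhaustive; no liquidity
    or bias correction is applied.
    """
    if any(p < 0 for p in prices.values()):
        raise ValueError("prices must be non-negative")
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("at least one price must be positive")
    return {outcome: p / total for outcome, p in prices.items()}


# Made-up prices for three hypothetical outcome buckets (they sum to 1.05).
raw_prices = {"<20k": 0.12, "20k-30k": 0.55, ">=30k": 0.38}

implied = prices_to_probs(raw_prices)
for outcome, prob in implied.items():
    print(f"{outcome}: {prob:.3f}")
```

Any mass that ends up on a bucket already ruled out by reported data is carried through this rescaling unchanged, which is why the extraction step matters for the impossible-outcome diagnosis.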

pith-pipeline@v0.9.0 · 5553 in / 1203 out tokens · 37173 ms · 2026-05-13T01:01:25.227728+00:00 · methodology
