
arxiv: 2605.22676 · v1 · pith:QQLKCPQHnew · submitted 2026-05-21 · 📊 stat.AP

Comparison of probabilistic nowcasts and forecasts of SARS-CoV-2 variant proportions made by hierarchical multinomial logistic regression models

Pith reviewed 2026-05-22 03:41 UTC · model grok-4.3

classification 📊 stat.AP
keywords SARS-CoV-2 variants · nowcasting · forecasting · hierarchical multinomial logistic regression · energy score · Brier score · variant proportions · probabilistic forecasting

The pith

Hierarchical multinomial logistic regression models outperform a baseline in nowcasting and forecasting SARS-CoV-2 variant proportions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether hierarchical multinomial logistic regression models can improve predictions of the proportions of different SARS-CoV-2 variants circulating at a given time. These models allow information to be shared across locations that have different amounts of data and possibly different trends. Using two years of retrospective weekly data from the US SARS-CoV-2 Variant Nowcast Hub, the authors compare 12 such models to a baseline and find consistent outperformance on both energy score for probabilistic accuracy and Brier score for point accuracy. This matters for public health because better variant forecasts can inform decisions on vaccines, treatments, and surveillance priorities.

Core claim

The central claim is that hierarchical multinomial logistic regression (HMLR) models outperform the baseline in both probabilistic accuracy, as measured by the energy score, and point accuracy, as measured by the Brier score, when nowcasting and forecasting SARS-CoV-2 variant proportions. Their advantage over the baseline is largest in locations with more data, where more complex models improve the most; simpler models perform better in low-data locations.
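
To make the two metrics concrete, here is a minimal sketch of a sample-based energy score and a multi-category Brier score. The function names, argument shapes, and example numbers are illustrative assumptions; the paper's exact scoring setup (for example, whether scores are computed on counts or proportions) is not described on this page.

```python
import numpy as np

def energy_score(samples, observed):
    """Sample-based energy score; lower is better.

    samples:  (m, k) array of m predictive draws of a k-vector.
    observed: (k,) array holding the realized vector.
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Mean Euclidean distance from each draw to the observation.
    term_obs = np.linalg.norm(samples - observed, axis=1).mean()
    # Mean pairwise distance between draws (i == j pairs contribute zero).
    diffs = samples[:, None, :] - samples[None, :, :]
    term_spread = np.linalg.norm(diffs, axis=2).mean()
    return float(term_obs - 0.5 * term_spread)

def brier_score(predicted, observed):
    """Multi-category Brier score: squared error summed over categories."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum((predicted - observed) ** 2))

# Made-up two-category example, for illustration only.
draws = [[0.0, 0.0], [2.0, 0.0]]
print(energy_score(draws, [1.0, 0.0]))       # 0.5
print(brier_score([0.7, 0.3], [1.0, 0.0]))   # approximately 0.18
```

The energy score rewards draws that sit close to the observation while penalizing an ensemble that is overconfident, which is why it serves as the probabilistic metric; the Brier score only looks at a single predicted vector.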

What carries the argument

Hierarchical multinomial logistic regression (HMLR), which pools data across locations to account for varying sample sizes and trends in variant surveillance.
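
One common way to write such a model is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $\ell$ indexes locations, $t$ time, $v$ variants, $n_{\ell,t}$ is the number of sequences, and the location-level intercepts and slopes are drawn from shared distributions, which is what lets low-data locations borrow strength from the rest.

```latex
\begin{aligned}
\mathbf{y}_{\ell,t} &\sim \operatorname{Multinomial}\!\left(n_{\ell,t},\ \boldsymbol{\pi}_{\ell,t}\right), \\
\pi_{\ell,t,v} &= \frac{\exp\!\left(\alpha_{\ell,v} + \beta_{\ell,v}\, t\right)}
                       {\sum_{u=1}^{V} \exp\!\left(\alpha_{\ell,u} + \beta_{\ell,u}\, t\right)}, \\
\alpha_{\ell,v} &\sim \mathcal{N}\!\left(\mu^{\alpha}_{v},\ (\sigma^{\alpha})^{2}\right), \qquad
\beta_{\ell,v} \sim \mathcal{N}\!\left(\mu^{\beta}_{v},\ (\sigma^{\beta})^{2}\right).
\end{aligned}
```

In a formulation like this, one variant is usually fixed as a reference (its $\alpha$ and $\beta$ set to zero) for identifiability, and the variances $\sigma^{\alpha}$ and $\sigma^{\beta}$ govern how strongly location-level parameters are pulled toward the shared means.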

If this is right

  • HMLR models achieve better accuracy than the baseline particularly in locations with abundant data.
  • More complex versions of the HMLR models deliver greater gains in high-data settings.
  • Simpler HMLR models are preferable for locations with limited data.
  • There is no single best HMLR model that wins across all evaluation metrics and locations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Public health agencies could select model complexity based on the volume of local sequencing data available.
  • The approach may generalize to nowcasting variants of other pathogens if similar hierarchical structures are used.
  • Future work could examine how these models handle sudden changes in reporting practices or new variant emergence.
  • Integration into real-time systems would need to account for any additional delays not present in the retrospective tests.

Load-bearing premise

Retrospective datasets from the US SARS-CoV-2 Variant Nowcast Hub accurately represent the data-generating process and reporting delays that will occur in future real-time nowcasting.

What would settle it

In future real-time nowcasting after the study period, the HMLR models fail to achieve lower energy scores and Brier scores than the baseline across multiple locations and time periods.

Figures

Figures reproduced from arXiv: 2605.22676 by Benjamin W. Rogers, Evan L. Ray, Isaac MacArthur, Maryclare Griffin, Nicholas G. Reich, Thomas Robacker.

Figure 1: Number of sequences per week for each of the 50 states, Puerto Rico, and Washington …
Figure 2: A plot showing the proportions of the clades 22E and 23A, and, for illustrative purposes, all …
Figure 3: A plot showing the proportions of the clades at the national level; all clades that ever reached …
Figure 4: Forecasts and nowcasts for a subset of four HMLR models at a single location (Georgia) …
Figure 5: Forecasts and nowcasts for Georgia for the submission date of 2023-01-11. Solid lines …
Figure 6: Median and range of log relative energy score of HMLR models versus the MLR baseline …
Figure 7: A heatmap of log relative mean energy scores by submission date with the best-scoring model …
Figure 8: A heatmap showing the relative energy scores across locations, averaged over submission date …
Original abstract

Nowcasting and forecasting of infectious diseases have become increasingly important since the SARS-CoV-2 pandemic. In particular, methods for modeling the composition of circulating variants at a given time have seen more use in part due to a large increase in the frequency of genomic sequencing conducted as a part of routine surveillance. However, methods must take into account that locations have different amounts of data and sometimes have different trends. We discuss hierarchical multinomial logistic regression (HMLR), a commonly used method for forecasting SARS-CoV-2 variants, which allows for data sharing across locations. We show how it has been used in the literature, and define a class of HMLR models for SARS-CoV-2 variant nowcasting and forecasting. We rigorously test a subset of this class of models using the framework of the US SARS-CoV-2 Variant Nowcast Hub, a collaborative modeling project that launched in 2024. We created two years of weekly predictions based on retrospective datasets, with the prediction dates ranging from Wednesday, August 3, 2022, to Wednesday, August 7, 2024. We tested 12 HMLR models against a baseline model on these datasets. We found that the HMLR models outperformed the baseline both in terms of probabilistic accuracy, as measured by the energy score, as well as point accuracy, as measured by the Brier score. Overall, we find that HMLR models perform best with respect to the baseline model in locations with more data, and more complex HMLR models also showed more improvement in those high-data locations; however, there was no one best model across all metrics, and simpler HMLR models perform better in low-data locations. We find that HMLR models perform well in practice for nowcasting and forecasting SARS-CoV-2 variants.
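
The weekly prediction schedule described in the abstract can be enumerated with a few lines of standard-library Python. This is a sketch only: the helper name `weekly_dates` is hypothetical, and whether every Wednesday in the range was actually used as a prediction date is not stated on this page.

```python
from datetime import date, timedelta

def weekly_dates(start, end):
    """Return every date from start to end (inclusive) in 7-day steps."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=7)
    return dates

submission_dates = weekly_dates(date(2022, 8, 3), date(2024, 8, 7))
print(len(submission_dates))                             # 106
print(all(d.weekday() == 2 for d in submission_dates))   # True (2 == Wednesday)
```

The range spans 735 days, so stepping by seven days from the first Wednesday lands exactly on the last one and gives 106 candidate dates.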

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript defines a class of hierarchical multinomial logistic regression (HMLR) models for nowcasting and forecasting SARS-CoV-2 variant proportions, allowing data sharing across locations with varying data volumes. It evaluates 12 specific HMLR models retrospectively over weekly prediction dates from August 3, 2022 to August 7, 2024 using datasets from the US SARS-CoV-2 Variant Nowcast Hub, comparing them to a baseline model. The central claim is that the HMLR models outperform the baseline in probabilistic accuracy via the energy score and point accuracy via the Brier score, with stronger gains in high-data locations and simpler models performing relatively better in low-data locations.

Significance. If the results hold, this work offers concrete evidence that hierarchical models improve variant proportion nowcasts and forecasts by pooling information across locations, which is particularly relevant for surveillance systems with heterogeneous data availability. The use of two years of weekly retrospective predictions, an external collaborative hub framework, and proper scoring rules (energy score and Brier score) provides a reproducible and falsifiable assessment of practical performance. This strengthens the case for HMLR approaches in infectious disease modeling.

major comments (2)
  1. [Retrospective Evaluation] Retrospective Evaluation section: The central claim that HMLR models 'perform well in practice' rests on retrospective datasets from the Nowcast Hub reproducing real-time reporting delays, submission lags, and location-specific data completeness. The manuscript does not include explicit checks or sensitivity analyses for how these datasets match ongoing data-generating processes (e.g., changes in sequencing volume), which is load-bearing given the reported performance differences between high- and low-data locations.
  2. [Results] Results and Model Comparison: While outperformance on energy and Brier scores is stated, the manuscript would benefit from explicit reporting of effect sizes, such as average score differences or location-stratified tables, to quantify the improvement magnitude and support the claim that no single model is best across all metrics.
minor comments (2)
  1. [Abstract] Abstract: High-level descriptions of the 12 models without reference to specific quantitative metrics or tables limit quick verification of the outperformance findings.
  2. [Methods] Model definitions: The hierarchical structure and hyperparameter choices in the HMLR class could use more explicit notation or pseudocode to improve reproducibility for readers implementing similar models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments help clarify the presentation of our retrospective evaluation and strengthen the quantitative support for our results. We address each major comment below.

Point-by-point responses
  1. Referee: [Retrospective Evaluation] Retrospective Evaluation section: The central claim that HMLR models 'perform well in practice' rests on retrospective datasets from the Nowcast Hub reproducing real-time reporting delays, submission lags, and location-specific data completeness. The manuscript does not include explicit checks or sensitivity analyses for how these datasets match ongoing data-generating processes (e.g., changes in sequencing volume), which is load-bearing given the reported performance differences between high- and low-data locations.

    Authors: We agree that explicit discussion of dataset fidelity would strengthen the manuscript. In revision we will add a dedicated paragraph in the Retrospective Evaluation section describing how the Nowcast Hub datasets were constructed to replicate real-time reporting delays and location-specific completeness. We will also report a sensitivity analysis that varies assumed sequencing volume thresholds and shows the resulting changes in performance gaps between high- and low-data locations. revision: yes

  2. Referee: [Results] Results and Model Comparison: While outperformance on energy and Brier scores is stated, the manuscript would benefit from explicit reporting of effect sizes, such as average score differences or location-stratified tables, to quantify the improvement magnitude and support the claim that no single model is best across all metrics.

    Authors: We accept this suggestion. The revised Results section will include tables of mean energy-score and Brier-score differences (with standard errors) for each HMLR model versus the baseline, stratified by high- versus low-data locations. These additions will quantify the magnitude of improvement and reinforce the observation that no single model is uniformly best. revision: yes
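
A stratified table of the kind promised here could be produced along the following lines. This is a minimal sketch: the helper name, the stratum labels, and all score values are made up for illustration and are not results from the paper.

```python
import numpy as np

def mean_diff_with_se(model_scores, baseline_scores):
    """Mean paired score difference (model minus baseline) and its standard error."""
    diffs = np.asarray(model_scores, dtype=float) - np.asarray(baseline_scores, dtype=float)
    mean = float(diffs.mean())
    # Naive standard error that treats the paired differences as independent.
    se = float(diffs.std(ddof=1) / np.sqrt(diffs.size))
    return mean, se

# Made-up scores for illustration only; these are not results from the paper.
scores_by_stratum = {
    "high-data": {"model": [0.10, 0.12, 0.08, 0.11], "baseline": [0.15, 0.16, 0.12, 0.14]},
    "low-data": {"model": [0.30, 0.28, 0.35, 0.31], "baseline": [0.32, 0.29, 0.33, 0.34]},
}

for stratum, s in scores_by_stratum.items():
    mean, se = mean_diff_with_se(s["model"], s["baseline"])
    print(f"{stratum}: mean difference = {mean:.4f} (SE {se:.4f})")
```

Because both scores are negatively oriented, a negative mean difference favors the model. Scores from consecutive weeks are likely to be correlated, so a standard error computed this way would tend to understate the uncertainty.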

Circularity Check

0 steps flagged

No significant circularity: empirical out-of-sample retrospective evaluation

full rationale

The paper's central claims rest on fitting HMLR models to retrospective datasets from the US SARS-CoV-2 Variant Nowcast Hub and then evaluating predictive accuracy (energy score for probabilistic, Brier score for point) against a baseline over weekly prediction dates from August 2022 to August 2024. These are standard out-of-sample comparisons on held-out temporal windows; the reported performance metrics are computed directly from model outputs versus observed variant proportions and do not reduce to the fitted parameters by construction. No self-definitional equations, fitted-inputs-renamed-as-predictions, or load-bearing self-citations appear in the performance claims. The evaluation framework is externally defined by the Nowcast Hub and is falsifiable against future real-time data.
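
The out-of-sample structure described above can be sketched as a rolling-origin loop in which each prediction sees only data from before its origin. Everything here is an illustrative assumption: the function names, the persistence placeholder model, and the toy series. The sketch also ignores reporting delays and data revisions, which the retrospective hub datasets are meant to capture.

```python
import numpy as np

def rolling_origin_scores(series, fit_predict, score, min_train=3, horizon=1):
    """Score predictions made at successive origins using only past data.

    series:      (T, k) array of observed proportion vectors, in time order.
    fit_predict: callable(train, horizon) -> (k,) predicted vector.
    score:       callable(predicted, observed) -> float.
    """
    series = np.asarray(series, dtype=float)
    results = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                  # data strictly before the origin
        target = series[origin + horizon - 1]    # held-out observation
        results.append(score(fit_predict(train, horizon), target))
    return np.array(results)

def persistence(train, horizon):
    """Placeholder model: repeat the last observed vector."""
    return train[-1]

def squared_error(predicted, observed):
    """Squared error summed over categories."""
    return float(np.sum((predicted - observed) ** 2))

# Toy two-variant series, for illustration only.
toy = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]]
print(rolling_origin_scores(toy, persistence, squared_error))
```

Swapping `persistence` for a fitted model and `squared_error` for a proper scoring rule keeps the same guarantee: no score depends on data the model could not have seen at its origin.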

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The models rely on standard hierarchical Bayesian assumptions for sharing information across locations; no new entities are postulated. Free parameters include location-specific and shared hyperparameters whose exact count and fitting procedure are not detailed in the abstract.

free parameters (1)
  • hierarchical hyperparameters
    Parameters controlling the degree of pooling across locations; their values are fitted during model training but not enumerated in the abstract.
axioms (1)
  • domain assumption: Data from different locations can be usefully pooled through a hierarchical structure that respects local trends.
    Invoked when defining the class of HMLR models that allow data sharing across locations with different data volumes.
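
How a pooling hyperparameter trades local data against the shared trend can be illustrated with the textbook normal-normal case. This is a simplified stand-in, not the paper's multinomial model; the function name and all parameter values are assumptions chosen for illustration.

```python
def pooled_estimate(local_mean, n_obs, global_mean, sigma2, tau2):
    """Posterior mean of a location effect under a normal-normal model.

    sigma2: within-location observation variance.
    tau2:   between-location variance (the hyperparameter controlling pooling).
    """
    # Weight on the local data grows with the number of observations.
    weight = (n_obs / sigma2) / (n_obs / sigma2 + 1.0 / tau2)
    return weight * local_mean + (1.0 - weight) * global_mean

# Same local mean, different amounts of data (values are illustrative).
for n in (1, 10, 99):
    print(n, pooled_estimate(local_mean=1.0, n_obs=n, global_mean=0.0, sigma2=1.0, tau2=1.0))
```

With few observations the estimate stays near the shared mean, and with many it approaches the local mean, which is consistent with the summary's observation that performance patterns differ between low-data and high-data locations.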

pith-pipeline@v0.9.0 · 5893 in / 1378 out tokens · 40454 ms · 2026-05-22T03:41:48.719457+00:00 · methodology


