Comparison of probabilistic nowcasts and forecasts of SARS-CoV-2 variant proportions made by hierarchical multinomial linear regression models
Pith reviewed 2026-05-22 03:41 UTC · model grok-4.3
The pith
Hierarchical multinomial logistic regression models outperform a baseline in nowcasting and forecasting SARS-CoV-2 variant proportions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that HMLR models outperform the baseline both in probabilistic accuracy, as measured by the energy score, and in point accuracy, as measured by the Brier score, when making nowcasts and forecasts of SARS-CoV-2 variant proportions. The models' advantage over the baseline is largest in locations with more data, where more complex models show the greatest improvement, while simpler models perform better in low-data locations.
What carries the argument
Hierarchical multinomial logistic regression (HMLR), which shares information across locations so that locations with few sequences can borrow strength from the others while location-specific trends are still allowed.
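To make the pooling idea concrete, the sketch below simulates counts from a toy hierarchical multinomial logistic model using NumPy. It is not the authors' implementation: the function name simulate_hmlr, the linear-in-time log-odds, the choice of variant 0 as the reference category, and all prior scales are illustrative assumptions. Each location's intercepts and growth rates are drawn around shared global means, which is the mechanism that lets locations with few sequences borrow strength from the rest.

```python
import numpy as np


def simulate_hmlr(n_locations=5, n_variants=3, n_days=60, seqs_per_day=None, seed=0):
    """Simulate daily variant counts from a toy hierarchical multinomial logistic model.

    Returns (probs, counts), each of shape (n_locations, n_days, n_variants).
    Variant 0 is the reference category, so its log-odds are fixed at 0.
    """
    rng = np.random.default_rng(seed)
    k = n_variants - 1  # number of non-reference variants

    # Global (shared) means for intercepts and daily growth rates (illustrative scales).
    global_intercept = rng.normal(0.0, 1.0, size=k)
    global_slope = rng.normal(0.05, 0.02, size=k)

    # Location-level coefficients scatter around the global means: partial pooling.
    intercepts = global_intercept + rng.normal(0.0, 0.5, size=(n_locations, k))
    slopes = global_slope + rng.normal(0.0, 0.01, size=(n_locations, k))

    if seqs_per_day is None:
        # Uneven sequencing volume across locations (hypothetical range).
        seqs_per_day = rng.integers(1, 200, size=n_locations)

    t = np.arange(n_days)
    # Log-odds of each non-reference variant versus the reference: (locations, days, k).
    eta = intercepts[:, None, :] + slopes[:, None, :] * t[None, :, None]
    # Prepend the reference category's zero log-odds and apply a stable softmax.
    eta = np.concatenate([np.zeros((n_locations, n_days, 1)), eta], axis=-1)
    eta -= eta.max(axis=-1, keepdims=True)
    probs = np.exp(eta)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Draw multinomial counts given each location's daily sequencing volume.
    counts = np.empty(probs.shape, dtype=np.int64)
    for i in range(n_locations):
        for d in range(n_days):
            counts[i, d] = rng.multinomial(seqs_per_day[i], probs[i, d])
    return probs, counts


probs, counts = simulate_hmlr()
print(probs.shape, counts.shape)  # (5, 60, 3) (5, 60, 3)
```

Fitting such a model, rather than simulating from it, would normally be done with a probabilistic programming tool such as Stan; the simulator only shows the data-generating structure that pooling exploits.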
If this is right
- HMLR models achieve better accuracy than the baseline particularly in locations with abundant data.
- More complex versions of the HMLR models deliver greater gains in high-data settings.
- Simpler HMLR models are preferable for locations with limited data.
- There is no single best HMLR model that wins across all evaluation metrics and locations.
Where Pith is reading between the lines
- Public health agencies could select model complexity based on the volume of local sequencing data available.
- The approach may generalize to nowcasting variants of other pathogens if similar hierarchical structures are used.
- Future work could examine how these models handle sudden changes in reporting practices or new variant emergence.
- Integration into real-time systems would need to account for any additional delays not present in the retrospective tests.
Load-bearing premise
Retrospective datasets from the US SARS-CoV-2 Variant Nowcast Hub accurately represent the data-generating process and reporting delays that will occur in future real-time nowcasting.
What would settle it
The claim would be undermined if, in real-time nowcasting after the study period, the HMLR models failed to achieve lower energy scores and Brier scores than the baseline across multiple locations and time periods.
Original abstract
Nowcasting and forecasting of infectious diseases have become increasingly important since the SARS-CoV-2 pandemic. In particular, methods for modeling the composition of circulating variants at a given time have seen more use in part due to a large increase in the frequency of genomic sequencing conducted as a part of routine surveillance. However, methods must take into account that locations have different amounts of data and sometimes have different trends. We discuss hierarchical multinomial logistic regression (HMLR), a commonly used method for forecasting SARS-CoV-2 variants, which allows for data sharing across locations. We show how it has been used in the literature, and define a class of HMLR models for SARS-CoV-2 variant nowcasting and forecasting. We rigorously test a subset of this class of models using the framework of the US SARS-CoV-2 Variant Nowcast Hub, a collaborative modeling project that launched in 2024. We created two years of weekly predictions based on retrospective datasets, with the prediction dates ranging from Wednesday, August 3, 2022, to Wednesday, August 7, 2024. We tested 12 HMLR models against a baseline model on these datasets. We found that the HMLR models outperformed the baseline both in terms of probabilistic accuracy, as measured by the energy score, as well as point accuracy, as measured by the Brier score. Overall, we find that HMLR models perform best with respect to the baseline model in locations with more data, and more complex HMLR models also showed more improvement in those high-data locations; however, there was no one best model across all metrics, and simpler HMLR models perform better in low-data locations. We find that HMLR models perform well in practice for nowcasting and forecasting SARS-CoV-2 variants.
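The two scoring rules named in the abstract can be sketched as follows, assuming a forecast is represented by predictive samples of the variant-proportion vector (energy score) and by a single predicted proportion vector (Brier score). The function names are hypothetical, the energy score uses the common sample-based form that averages over all sample pairs including i = j, and the exact conventions used by the Nowcast Hub may differ. Lower is better for both.

```python
import numpy as np


def energy_score(samples, observed):
    """Sample-based energy score for a multivariate forecast (lower is better).

    samples:  (m, k) array of m predictive draws of the k variant proportions.
    observed: (k,) array of observed variant proportions.
    ES = mean_i ||X_i - y|| - 0.5 * mean_{i,j} ||X_i - X_j||
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Average distance from each predictive draw to the observation.
    term1 = np.linalg.norm(samples - observed, axis=1).mean()
    # Average distance between all pairs of draws (including i == j pairs).
    pairwise = samples[:, None, :] - samples[None, :, :]
    term2 = np.linalg.norm(pairwise, axis=-1).mean()
    return float(term1 - 0.5 * term2)


def brier_score(predicted, observed):
    """Multi-category Brier-type score: summed squared error across variants (lower is better)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum((predicted - observed) ** 2))


# A perfectly sharp, correct forecast scores 0 on both rules.
print(energy_score([[0.7, 0.3]] * 10, [0.7, 0.3]), brier_score([0.7, 0.3], [0.7, 0.3]))  # 0.0 0.0
```

The second term of the energy score rewards forecasts whose samples are appropriately spread, which is what makes it a probabilistic rather than a point-accuracy measure.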
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript defines a class of hierarchical multinomial logistic regression (HMLR) models for nowcasting and forecasting SARS-CoV-2 variant proportions, allowing data sharing across locations with varying data volumes. It evaluates 12 specific HMLR models retrospectively over weekly prediction dates from August 3, 2022, to August 7, 2024, using datasets from the US SARS-CoV-2 Variant Nowcast Hub, and compares them to a baseline model. The central claim is that the HMLR models outperform the baseline in probabilistic accuracy, as measured by the energy score, and in point accuracy, as measured by the Brier score, with larger gains in high-data locations and simpler models performing relatively better in low-data locations.
Significance. If the results hold, this work offers concrete evidence that hierarchical models improve variant proportion nowcasts and forecasts by pooling information across locations, which is particularly relevant for surveillance systems with heterogeneous data availability. The use of two years of weekly retrospective predictions, an external collaborative hub framework, and proper scoring rules (energy score and Brier score) provides a reproducible and falsifiable assessment of practical performance. This strengthens the case for HMLR approaches in infectious disease modeling.
major comments (2)
- [Retrospective Evaluation] Retrospective Evaluation section: The central claim that HMLR models 'perform well in practice' rests on the assumption that the retrospective Nowcast Hub datasets reproduce real-time reporting delays, submission lags, and location-specific data completeness. The manuscript does not include explicit checks or sensitivity analyses of how closely these datasets match the ongoing data-generating process (e.g., changes in sequencing volume). This matters because the reported performance differences between high- and low-data locations depend directly on those data properties.
- [Results] Results and Model Comparison: While outperformance on energy and Brier scores is stated, the manuscript would benefit from explicit reporting of effect sizes, such as average score differences or location-stratified tables, to quantify the improvement magnitude and support the claim that no single model is best across all metrics.
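A threshold sweep of the kind the first major comment asks for, and which the rebuttal later promises, could be sketched as below. The inputs (per-location sequence volume and per-location mean score difference versus the baseline) and the function name are hypothetical; the point is only to show how the high- versus low-data gap can be re-estimated as the cutoff that defines "high-data" moves.

```python
import numpy as np


def performance_gap_by_threshold(seq_volume, score_diff, thresholds):
    """Re-split locations into high- and low-data groups at each threshold.

    seq_volume: per-location sequence counts over the evaluation period.
    score_diff: per-location mean score difference (model minus baseline);
                negative values mean the model beat the baseline.
    Returns one dict per threshold that leaves both groups non-empty.
    """
    seq_volume = np.asarray(seq_volume, dtype=float)
    score_diff = np.asarray(score_diff, dtype=float)
    rows = []
    for thr in thresholds:
        high = seq_volume >= thr
        if high.all() or not high.any():
            continue  # one group would be empty, so no comparison is possible
        rows.append({
            "threshold": thr,
            "n_high": int(high.sum()),
            "mean_diff_high": float(score_diff[high].mean()),
            "mean_diff_low": float(score_diff[~high].mean()),
        })
    return rows
```

If the gap between the two groups keeps the same sign across a wide range of cutoffs, the high- versus low-data finding is less likely to be an artifact of one particular split.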
minor comments (2)
- [Abstract] Abstract: The 12 models are described only at a high level, with no specific score values or table references, which makes the outperformance findings hard to verify quickly.
- [Methods] Model definitions: The hierarchical structure and hyperparameter choices in the HMLR class could use more explicit notation or pseudocode to improve reproducibility for readers implementing similar models.
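For readers looking for the kind of explicit notation this comment requests, one generic way to write an HMLR model with a linear time trend is sketched below. The notation is an assumption for illustration and is not taken from the manuscript. Here $\ell$ indexes locations, $t$ time, and $k = 0, \dots, K$ variants with $k = 0$ as the reference. The 12 models are described as varying the coefficient covariance structure, the time trend, and the variant count distribution, which in this notation would presumably correspond to the form of $\Sigma$, the linear predictor $\eta_{\ell t k}$, and the multinomial likelihood.

```latex
\begin{aligned}
\mathbf{Y}_{\ell t} &\sim \operatorname{Multinomial}\!\left(N_{\ell t}, \boldsymbol{\pi}_{\ell t}\right), \\
\pi_{\ell t k} &= \frac{\exp(\eta_{\ell t k})}{\sum_{j=0}^{K} \exp(\eta_{\ell t j})},
  \qquad \eta_{\ell t k} = \alpha_{\ell k} + \beta_{\ell k}\, t,
  \qquad \alpha_{\ell 0} = \beta_{\ell 0} = 0, \\
\begin{pmatrix} \alpha_{\ell k} \\ \beta_{\ell k} \end{pmatrix}
  &\sim \mathcal{N}\!\left( \begin{pmatrix} \mu_{\alpha k} \\ \mu_{\beta k} \end{pmatrix}, \Sigma \right),
  \qquad k = 1, \dots, K.
\end{aligned}
```

The last line is the hierarchical layer: shrinking the location coefficients toward the shared means $\mu_{\alpha k}, \mu_{\beta k}$ is what allows data sharing across locations.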
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments help clarify the presentation of our retrospective evaluation and strengthen the quantitative support for our results. We address each major comment below.
- Referee: [Retrospective Evaluation] Retrospective Evaluation section: The central claim that HMLR models 'perform well in practice' rests on retrospective datasets from the Nowcast Hub reproducing real-time reporting delays, submission lags, and location-specific data completeness. The manuscript does not include explicit checks or sensitivity analyses for how these datasets match ongoing data-generating processes (e.g., changes in sequencing volume), which is load-bearing given the reported performance differences between high- and low-data locations.
Authors: We agree that explicit discussion of dataset fidelity would strengthen the manuscript. In revision we will add a dedicated paragraph in the Retrospective Evaluation section describing how the Nowcast Hub datasets were constructed to replicate real-time reporting delays and location-specific completeness. We will also report a sensitivity analysis that varies assumed sequencing volume thresholds and shows the resulting changes in performance gaps between high- and low-data locations. Revision: yes.
- Referee: [Results] Results and Model Comparison: While outperformance on energy and Brier scores is stated, the manuscript would benefit from explicit reporting of effect sizes, such as average score differences or location-stratified tables, to quantify the improvement magnitude and support the claim that no single model is best across all metrics.
Authors: We accept this suggestion. The revised Results section will include tables of mean energy-score and Brier-score differences (with standard errors) for each HMLR model versus the baseline, stratified by high- versus low-data locations. These additions will quantify the magnitude of improvement and reinforce the observation that no single model is uniformly best. Revision: yes.
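The stratified effect-size tables promised in this response could be computed along these lines. This is a sketch under stated assumptions: scores are lower-is-better (true of both the energy and Brier scores), each model forecast is paired with the baseline forecast for the same location, date, and horizon, and the "high"/"low" labels come from some chosen data-volume cutoff. The function name is hypothetical.

```python
import numpy as np


def stratified_score_differences(model_scores, baseline_scores, strata):
    """Mean and standard error of paired score differences within each stratum.

    All inputs are aligned 1-D sequences with one entry per paired forecast.
    Negative differences mean the model scored better (lower) than the baseline.
    Returns {stratum: (mean_difference, standard_error)}.
    """
    diff = np.asarray(model_scores, dtype=float) - np.asarray(baseline_scores, dtype=float)
    strata = np.asarray(strata)
    out = {}
    for s in np.unique(strata):
        d = diff[strata == s]
        # Naive standard error; undefined with a single forecast in the stratum.
        se = d.std(ddof=1) / np.sqrt(d.size) if d.size > 1 else float("nan")
        out[str(s)] = (float(d.mean()), float(se))
    return out
```

Because scores for nearby dates and horizons are correlated, this naive standard error probably understates the uncertainty; resampling whole prediction dates (a block bootstrap) would be a more conservative choice.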
Circularity Check
No significant circularity: empirical out-of-sample retrospective evaluation
The paper's central claims rest on fitting HMLR models to retrospective datasets from the US SARS-CoV-2 Variant Nowcast Hub and then evaluating predictive accuracy (energy score for probabilistic accuracy, Brier score for point accuracy) against a baseline over weekly prediction dates from August 2022 to August 2024. These are standard out-of-sample comparisons on held-out temporal windows; the reported performance metrics are computed directly from model outputs versus observed variant proportions and do not reduce to the fitted parameters by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the performance claims. The evaluation framework is externally defined by the Nowcast Hub and is falsifiable against future real-time data.
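The prediction schedule stated in the abstract can be reproduced directly: weekly Wednesdays from August 3, 2022, through August 7, 2024, give 106 prediction dates. The sketch also shows the "as-of" filtering that keeps a retrospective evaluation out of sample, namely that each prediction may use only sequences reported on or before its prediction date. The record layout (collection date, report date, clade) and both function names are assumptions for illustration, not the Nowcast Hub's data format.

```python
from datetime import date, timedelta


def weekly_prediction_dates(start=date(2022, 8, 3), end=date(2024, 8, 7)):
    """Every weekly prediction date from start through end, inclusive."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


def as_of_snapshot(records, prediction_date):
    """Keep only sequences whose report date is on or before prediction_date.

    records: iterable of (collection_date, report_date, clade) tuples.
    """
    return [r for r in records if r[1] <= prediction_date]


dates = weekly_prediction_dates()
print(len(dates), dates[0].strftime("%A"), dates[-1])  # 106 Wednesday 2024-08-07
```

Scoring then compares each prediction with sequences reported later, which is why the performance metrics cannot be recovered from the fitted parameters alone.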
Axiom & Free-Parameter Ledger
free parameters (1)
- hierarchical hyperparameters
axioms (1)
- Domain assumption: data from different locations can be usefully pooled through a hierarchical structure that respects local trends.
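The pooling assumption above can be illustrated with the textbook normal-normal model, which is a deliberate simplification of the multinomial setting rather than the paper's model. A location's estimate is shrunk toward the global mean, with weight tau2 / (tau2 + sigma2 / n_obs) on its own data, so locations with many sequences keep their local trend while sparse ones lean on the shared mean. The function and parameter names are hypothetical.

```python
def partially_pooled_estimate(local_mean, n_obs, global_mean, sigma2, tau2):
    """Posterior mean of one location's effect under a normal-normal model.

    local_mean:  the location's own estimate (e.g. a variant's growth advantage).
    n_obs:       number of observations behind local_mean.
    global_mean: the shared mean across locations.
    sigma2:      per-observation sampling variance.
    tau2:        between-location variance of the true effects.
    Returns (pooled_estimate, weight_on_local_data).
    """
    # More local data shrinks sigma2 / n_obs, pushing the weight toward 1.
    weight = tau2 / (tau2 + sigma2 / n_obs)
    return weight * local_mean + (1.0 - weight) * global_mean, weight


for n in (1, 10, 99):
    est, w = partially_pooled_estimate(2.0, n, 0.0, sigma2=1.0, tau2=1.0)
    print(n, round(w, 3), round(est, 3))
```

With sigma2 = tau2 = 1, a location with 1 observation puts weight 0.5 on its own data, while one with 99 observations puts weight 0.99 on it, which mirrors the reported pattern of HMLR gains concentrating in high-data locations.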
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear; relation between the paper passage and the cited Recognition theorem)
We define a set of 12 HMLR models that assume varying regression coefficient covariance structures, time trends, and variant count distributions... Models are given three-letter abbreviations... (Table 2)
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear; relation between the paper passage and the cited Recognition theorem)
We found that the HMLR models outperformed the baseline both in terms of probabilistic accuracy, as measured by the energy score, as well as point accuracy, as measured by the Brier score.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.