pith. machine review for the scientific record.

arxiv: 2604.12900 · v1 · submitted 2026-04-14 · 📊 stat.ME · econ.EM

Recognition: unknown

Emulating Stepped-Wedge Cluster Randomized Trials to Evaluate Health Policies and Interventions

Fan Li, Gregg S. Gonsalves, Guanyu Tong, Haidong Lu, Lee Kennedy-Shaffer


Pith reviewed 2026-05-10 14:26 UTC · model grok-4.3

classification 📊 stat.ME econ.EM
keywords stepped-wedge trials · target trial emulation · staggered adoption · difference-in-differences · cluster randomized trials · causal inference · health policy evaluation · quasi-experimental designs

The pith

Observational studies with staggered policy adoption can emulate stepped-wedge cluster randomized trials to improve design, reporting, and causal inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that researchers evaluating health policies with observational data on staggered adoption can restructure their studies as emulations of stepped-wedge cluster randomized trials inside the target trial emulation framework. This framing pushes investigators to specify the hypothetical trial features they are mimicking, such as cluster randomization and timing of rollout, along with the core assumptions required for valid emulation. A sympathetic reader would care because the variety of current difference-in-differences methods makes it hard to compare studies or judge whether their evidence is reliable. By borrowing the conceptual and reporting standards from trial emulation, the approach encourages explicit statements of the estimand, consideration of heterogeneity and time-varying effects, and clearer communication across randomized and quasi-experimental traditions.

Core claim

The authors claim that framing observational staggered-adoption studies as emulations of stepped-wedge cluster randomized trials within the target trial emulation framework provides a unified structure for design, analysis, and reporting. This structure highlights policy heterogeneity, time-varying effects, spillovers, and anticipation effects; clarifies the estimand and assumptions; identifies settings unlikely to yield high-quality causal evidence; and guides the bias-variance-generalizability trade-offs that arise from specific design and analysis choices.

What carries the argument

Target trial emulation framework, which restructures observational data on staggered policy adoption to match the randomization, timing, and cluster features of a hypothetical stepped-wedge cluster randomized trial.
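As a minimal sketch of that restructuring step, assuming each cluster has a single recorded adoption period and that adoption is never reversed, the following builds the cluster-by-period exposure layout that a stepped-wedge schematic would display. The function name `stepped_wedge_layout`, the cluster labels, and the adoption periods are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Optional


def stepped_wedge_layout(
    adoption_period: Dict[str, Optional[int]], n_periods: int
) -> Dict[str, List[int]]:
    """Return a 0/1 exposure row per cluster over periods 1..n_periods.

    adoption_period maps cluster -> first treated period, or None if the
    cluster never adopts during the study window (hypothetical input format).
    Assumes adoption is absorbing: once treated, always treated.
    """
    layout = {}
    for cluster, start in adoption_period.items():
        layout[cluster] = [
            1 if start is not None and t >= start else 0
            for t in range(1, n_periods + 1)
        ]
    return layout


# Hypothetical example: three adopters at different times, one never-adopter.
adoption = {"A": 2, "B": 3, "C": 4, "D": None}
layout = stepped_wedge_layout(adoption, n_periods=5)

# Order rows by adoption time so the "wedge" shape is visible when printed.
for cluster in sorted(layout, key=lambda c: (adoption[c] is None, adoption[c] or 0)):
    print(cluster, layout[cluster])
```

Printing the rows in adoption order makes it easy to see at a glance whether the observed rollout resembles a stepped wedge or whether most clusters switch in the same period.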

If this is right

  • Studies will report a single, clearly defined estimand and list the assumptions needed for the emulation to be valid.
  • Analyses will routinely examine treatment effect heterogeneity, time-varying effects, and potential spillovers or anticipation.
  • Investigators will more often recognize and avoid study designs that cannot support credible causal claims under either randomized or observational approaches.
  • Insights on power, cluster effects, and crossover designs will flow between trialists and quasi-experimental researchers.
  • Design choices will be evaluated explicitly for their impact on bias, variance, and the generalizability of results.
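To make the idea of a single, clearly defined estimand concrete, here is a small sketch of one common choice from the staggered difference-in-differences literature: the effect for the cohort adopting in period g, evaluated at period t, using clusters not yet treated by period t as the comparison. This is an illustration under assumed parallel trends and no anticipation, not the paper's own estimator; the function name `att_gt` and the toy panel are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def att_gt(
    outcomes: Dict[str, List[float]],
    adoption_period: Dict[str, Optional[int]],
    g: int,
    t: int,
) -> float:
    """Difference-in-differences for cohort g at period t (periods are 1-indexed).

    Compares the change from period g-1 to period t among clusters adopting
    at g with the same change among clusters not yet treated by period t.
    """
    if g < 2 or t < g:
        raise ValueError("need a pre-period (g >= 2) and a post-period (t >= g)")

    def change(cluster: str) -> float:
        y = outcomes[cluster]
        return y[t - 1] - y[g - 2]  # lists are 0-indexed; period g-1 is index g-2

    treated = [c for c, a in adoption_period.items() if a == g]
    controls = [c for c, a in adoption_period.items() if a is None or a > t]
    if not treated or not controls:
        raise ValueError("no treated or no not-yet-treated clusters for (g, t)")
    return mean(change(c) for c in treated) - mean(change(c) for c in controls)


# Toy panel (made-up numbers): common trend of +1 per period, effect of +2 at adoption.
outcomes = {
    "A": [10.0, 13.0, 14.0, 15.0],  # adopts in period 2
    "B": [8.0, 9.0, 12.0, 13.0],    # adopts in period 3
    "C": [5.0, 6.0, 7.0, 8.0],      # never adopts
}
adoption = {"A": 2, "B": 3, "C": None}

print(att_gt(outcomes, adoption, g=2, t=2))
print(att_gt(outcomes, adoption, g=3, t=3))
```

Writing the comparison set explicitly (here, `a is None or a > t`) is the design choice worth reporting: switching to never-treated clusters only, or to a different baseline period, changes which assumptions the estimate relies on.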

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same emulation logic could be applied to other staggered rollout settings outside health policy, such as education or environmental interventions.
  • Software tools that automate the restructuring of longitudinal data into stepped-wedge trial formats would reduce implementation barriers.
  • Emulation failures could serve as diagnostic signals that prompt collection of additional covariates or different analytic strategies.
  • Over time, journals might adopt reporting checklists that require explicit mapping from observational data to a target stepped-wedge trial.

Load-bearing premise

Observational data on staggered policy adoption can be validly restructured to emulate the randomization, timing, and cluster features of a stepped-wedge cluster randomized trial while satisfying the core assumptions of target trial emulation.

What would settle it

An actual stepped-wedge cluster randomized trial conducted in the same population and policy context yields materially different effect estimates or different conclusions about effect heterogeneity than the emulated analysis of the corresponding observational data.

Figures

Figures reproduced from arXiv: 2604.12900 by Fan Li, Gregg S. Gonsalves, Guanyu Tong, Haidong Lu, Lee Kennedy-Shaffer.

Figure 1: Design schema for alternative unit inclusion criteria (Component 1). MMWR weeks are as defined by the U.S. Centers for Disease Control and Prevention. Shaded boxes indicate post-adoption time periods; dates indicate lottery announcement dates for the relevant states.
Original abstract

Both cluster randomized trials and quasi-experimental designs are used to evaluate the impact of health and social policies and interventions. Stepped-wedge cluster randomized trials randomize a staggered adoption approach, while recent difference-in-differences methods allow analysis of non-randomized settings where similar policies are adopted at different time points. These approaches have become common, but the sheer variety of methods for analyzing observational studies with staggered adoption makes it challenging to clearly design and report such studies. We propose that observational and quasi-experimental study investigators can address these challenges by emulating stepped-wedge cluster randomized trials in the target trial emulation framework. The conceptual framework and reporting standards of trial emulation will encourage consideration of key features of these designs, such as policy heterogeneity and time-varying effects, and clear reporting of the estimand and assumptions. It also highlights areas where those interested in randomized trials and quasi-experimental designs can benefit from one another's experience by bringing insights across disciplines. Questions of treatment effect heterogeneity, power, spillovers, and anticipation effects, among others, are common to both fields and can benefit from cross-pollination. This article also demonstrates how trial emulation can identify settings that are not well-served by either approach, thereby avoiding studies unlikely to generate high-quality causal evidence. Finally, it informs the bias-variance-generalizability trade-off that arises with design and analysis choices made in these settings, supporting better evidence generation and interpretation in settings where important questions can be answered.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes that observational and quasi-experimental studies with staggered policy adoption can be restructured to emulate the design features of stepped-wedge cluster randomized trials (SW-CRTs) inside the target trial emulation (TTE) framework. This is presented as a way to clarify estimands, make assumptions explicit, handle heterogeneity and time-varying effects, improve reporting, identify unsuitable study settings, and inform bias-variance-generalizability trade-offs by cross-pollinating insights from randomized trials and difference-in-differences methods.

Significance. If operationalized, the proposal could raise the quality of causal evidence for health policies by encouraging explicit mapping of observational data to SW-CRT features and by highlighting common challenges such as spillovers, anticipation, and effect heterogeneity. It offers a structured lens for design choices rather than promising automatic identification.

major comments (2)
  1. [Abstract] Abstract: the claim that the article 'demonstrates how trial emulation can identify settings that are not well-served by either approach' is unsupported; no concrete criterion, decision rule, or worked example is supplied for when emulation would be inappropriate, which is load-bearing for the practical utility asserted in the final paragraph.
  2. [Conceptual framework] The central proposal (restructuring observational staggered-adoption data to emulate SW-CRT randomization, timing, and clustering while satisfying TTE assumptions) is stated at a conceptual level without a step-by-step protocol, variable-mapping table, or explicit checklist for verifying the no-anticipation, consistency, and positivity conditions in the emulated design.
minor comments (2)
  1. The manuscript would benefit from a side-by-side table contrasting SW-CRT randomization, TTE emulation steps, and standard staggered DiD assumptions.
  2. Notation for clusters, adoption periods, and the target estimand (e.g., average treatment effect on the treated under the emulated design) should be introduced formally even if the paper remains non-technical.
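One way the formal notation requested in the second minor comment might look is sketched below. The symbols are hypothetical and follow common usage in the staggered-adoption literature; they are not taken from the manuscript, and the identification formula holds only under the stated no-anticipation and parallel-trends assumptions.

```latex
% Hypothetical notation, not taken from the manuscript.
% i = 1,\dots,I indexes clusters; t = 1,\dots,T indexes periods.
% G_i is the adoption period of cluster i (G_i = \infty if it never adopts).
% Y_{it}(g) is the potential outcome of cluster i in period t under adoption at g.
\[
  \mathrm{ATT}(g,t) \;=\;
  \mathbb{E}\bigl[\, Y_{it}(g) - Y_{it}(\infty) \;\big|\; G_i = g \,\bigr],
  \qquad t \ge g .
\]
% No anticipation: Y_{it}(g) = Y_{it}(\infty) for all t < g.
% Under no anticipation and parallel trends relative to not-yet-treated clusters:
\[
  \mathrm{ATT}(g,t) \;=\;
  \mathbb{E}\bigl[\, Y_{it} - Y_{i,g-1} \;\big|\; G_i = g \,\bigr]
  \;-\;
  \mathbb{E}\bigl[\, Y_{it} - Y_{i,g-1} \;\big|\; G_i > t \,\bigr].
\]
```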

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify opportunities to strengthen the practical utility of our conceptual proposal. We respond to each major comment below and will incorporate revisions to address the concerns raised.

  1. Referee: [Abstract] Abstract: the claim that the article 'demonstrates how trial emulation can identify settings that are not well-served by either approach' is unsupported; no concrete criterion, decision rule, or worked example is supplied for when emulation would be inappropriate, which is load-bearing for the practical utility asserted in the final paragraph.

    Authors: We agree that the abstract claim would be more robust with concrete support. The manuscript discusses conceptually how emulation can flag limitations (e.g., simultaneous adoption violating staggered structure or unaddressable spillovers), but lacks an explicit example or decision rule. In revision, we will add a brief illustrative scenario in the discussion section showing how the framework identifies unsuitable settings, such as when positivity fails due to universal adoption. This will substantiate the claim while preserving the paper's conceptual emphasis.
    Revision: yes

  2. Referee: [Conceptual framework] The central proposal (restructuring observational staggered-adoption data to emulate SW-CRT randomization, timing, and clustering while satisfying TTE assumptions) is stated at a conceptual level without a step-by-step protocol, variable-mapping table, or explicit checklist for verifying the no-anticipation, consistency, and positivity conditions in the emulated design.

    Authors: The manuscript is framed as a high-level conceptual bridge between TTE and SW-CRT designs rather than an implementation protocol. We recognize that adding operational elements would improve usability. We will revise by including a variable-mapping table aligning observational elements (cluster IDs, adoption times, outcomes) with SW-CRT features and an expanded checklist for verifying TTE assumptions (no anticipation, consistency, positivity) in the emulated design. This keeps the focus on cross-pollination of ideas while making the proposal more actionable.
    Revision: yes
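The failure modes named in these responses (simultaneous adoption, universal adoption leaving no comparison clusters) lend themselves to simple checks on the adoption pattern alone. The sketch below is an illustrative heuristic, assuming one adoption period per cluster and absorbing treatment; the function name `emulation_flags` and the wording of the warnings are hypothetical and are not criteria stated in the paper or the rebuttal.

```python
from typing import Dict, List, Optional


def emulation_flags(
    adoption_period: Dict[str, Optional[int]], n_periods: int
) -> List[str]:
    """Return human-readable warnings about the adoption pattern.

    adoption_period maps cluster -> first treated period (1-indexed), or None
    if the cluster never adopts within the window. These checks are
    illustrative heuristics only.
    """
    flags: List[str] = []
    starts = [a for a in adoption_period.values() if a is not None]
    if not starts:
        flags.append("no cluster adopts: no treated periods to evaluate")
        return flags
    if len(set(starts)) == 1:
        flags.append("simultaneous adoption: rollout is not staggered")
    if any(a <= 1 for a in starts):
        flags.append("some clusters are treated from period 1: no pre-period for them")
    # Positivity-style check: once any cluster is treated, at least one
    # cluster should remain untreated to serve as a comparison.
    for t in range(1, n_periods + 1):
        treated = sum(1 for a in starts if a <= t)
        untreated = len(adoption_period) - treated
        if treated > 0 and untreated == 0:
            flags.append(
                f"universal adoption by period {t}: no untreated comparison clusters"
            )
            break
    return flags


# Hypothetical patterns: everyone adopts at once vs. a staggered rollout.
print(emulation_flags({"A": 2, "B": 2, "C": 2}, n_periods=4))
print(emulation_flags({"A": 2, "B": 3, "C": None}, n_periods=4))
```

An empty list does not establish that emulation is valid; it only means these structural checks found nothing. Conditions such as consistency and the absence of spillovers cannot be verified from adoption times alone.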

Circularity Check

0 steps flagged

No significant circularity detected in conceptual proposal


The paper advances a methodological proposal to restructure observational data on staggered policy adoption as an emulation of stepped-wedge cluster randomized trials inside the target trial emulation framework. This is presented as a conceptual aid for clarifying estimands, assumptions, heterogeneity, and reporting standards rather than as a mathematical derivation or statistical model with fitted parameters. No equations, self-definitional constructs, or predictions that reduce to inputs by construction appear in the abstract or described structure. The argument draws on established prior frameworks (target trial emulation and stepped-wedge designs) without load-bearing self-citations that would render the central claim tautological. The proposal remains self-contained as a suggestion for improved practice and cross-disciplinary insight, with no reduction of its content to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a conceptual proposal relying on standard causal inference assumptions from target trial emulation and stepped-wedge trial literature. No new free parameters, invented entities, or ad hoc axioms are introduced beyond those already established in the field.

axioms (1)
  • Domain assumption: Observational data on staggered policy adoption can be structured to emulate randomized stepped-wedge trial features under target trial emulation assumptions.
    This is the core premise invoked throughout the abstract to justify the emulation approach.

pith-pipeline@v0.9.0 · 5575 in / 1308 out tokens · 43014 ms · 2026-05-10T14:26:16.594143+00:00 · methodology


Reference graph

Works this paper leans on

78 extracted references · 71 canonical work pages · 1 internal anchor

  1. [1]

    Alternative causal inference methods in population health research: Evaluating tradeoffs and triangulating evidence

    Matthay EC, Hagan E, Gottlieb LM, Tan ML, Vlahov D, Adler NE, et al. Alternative causal inference methods in population health research: Evaluating tradeoffs and triangulating evidence. SSM - Popul Health. 2020;10:100526. doi:10.1016/j.ssmph.2019.100526

  2. [2]

    Natural experiments: An overview of methods, approaches, and contributions to public health intervention research

    Craig P, Katikireddi SV , Leyland A, Popham F. Natural experiments: An overview of methods, approaches, and contributions to public health intervention research. Annu Rev Public Health. 2017;38(1):39–56. doi:10.1146/annurev-publhealth-031816-044327

  3. [3]

    Estimating the effects of health policy initiatives: Where we are and where we need to go

    Localio AR, Guallar E. Estimating the effects of health policy initiatives: Where we are and where we need to go. Ann Intern Med. 2024;177(11):1586–7. doi:10.7326/M24-0896

  4. [4]

    Designing difference-in- difference studies with staggered treatment adoption: Key concepts and practical guidelines

    Wing C, Yozwiak M, Hollingsworth A, Freedman S, Simon K. Designing difference-in- difference studies with staggered treatment adoption: Key concepts and practical guidelines. Annu Rev Public Health. 2024;45(1):485–505. doi:10.1146/annurev-publhealth-061022- 050825

  5. [5]

    Difference-in-differences with variation in treatment timing

    Goodman-Bacon A. Difference-in-differences with variation in treatment timing. J Econom. 2021;225(2):254–77. doi:10.1016/j.jeconom.2021.03.014

  6. [6]

    Difference-in-differences with multiple time periods

    Callaway B, Sant’Anna PHC. Difference-in-differences with multiple time periods. J Econom. 2021;225(2):200–30. doi:10.1016/j.jeconom.2020.12.001

  7. [7]

    What’s trending in difference-in-differences? A synthesis of the recent econometrics literature

    Roth J, Sant’Anna PHC, Bilinski A, Poe J. What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. J Econom. 2023;235(2):2218–44. doi:10.1016/j.jeconom.2023.03.008

  8. [8]

    Advances in difference-in-differences methods for policy evaluation research

    Wang G, Hamad R, White JS. Advances in difference-in-differences methods for policy evaluation research. Epidemiology. 2024;35(5):628–37. doi:10.1097/EDE.0000000000001755

  9. [9]

    Difference‐in‐differences for health policy and practice: A review of modern methods

    Feng S, Ganguli I, Lee Y , Poe J, Ryan A, Bilinski A. Difference‐in‐differences for health policy and practice: A review of modern methods. Stat Med. 2025;44(23–24):e70247. doi:10.1002/sim.70247

  10. [10]

    Cluster Randomised Trials

    Hayes RJ, Moulton LH. Cluster Randomised Trials. Second Edition. New York: Chapman and Hall/CRC; 2017

  11. [11]

    Estimands in cluster-randomized trials: Choosing analyses that answer the right question

    Kahan BC, Li F, Copas AJ, Harhay MO. Estimands in cluster-randomized trials: Choosing analyses that answer the right question. Int J Epidemiol. 2023;52(1):107–18. doi:10.1093/ije/dyac131

  12. [12]

    Selecting the optimal longitudinal cluster randomized design with a continuous outcome: Parallel-arm, crossover, or stepped-wedge

    Liu J, Li F, Sutcliffe S, Colditz GA. Selecting the optimal longitudinal cluster randomized design with a continuous outcome: Parallel-arm, crossover, or stepped-wedge. Stat Methods Med Res. 2025;34(10):2069–90. doi:10.1177/09622802251360409 20

  13. [13]

    The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting

    Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ. 2015;350:h391. doi:10.1136/bmj.h391

  14. [14]

    Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models

    Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35(13):2149–66. doi:10.1002/sim.6850

  15. [15]

    Reflection on modern methods: When is a stepped-wedge cluster randomized trial a good study design choice? Int J Epidemiol

    Hemming K, Taljaard M. Reflection on modern methods: When is a stepped-wedge cluster randomized trial a good study design choice? Int J Epidemiol. 2020;49(3):1043–52. doi:10.1093/ije/dyaa077

  16. [16]

    American Journal of Epidemiology , volume=

    Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64. doi:10.1093/aje/kwv254

  17. [17]

    Target trial emulation: A framework for causal inference from observational data

    Hernán MA, Wang W, Leaf DE. Target trial emulation: A framework for causal inference from observational data. JAMA. 2022;328(24):2446. doi:10.1001/jama.2022.21383

  18. [18]

    Target trial emulation

    Hubbard RA, Gatsonis CA, Hogan JW, Hunter DJ, Normand SLT, Troxel AB. “Target trial emulation” for observational studies — Potential and pitfalls. N Engl J Med. 2024;391(21):1975–7. doi:10.1056/NEJMp2407586

  19. [19]

    Transparent Reporting of Observational Studies Emulating a Target Trial—The TARGET Statement.JAMA2025;334(12):1084–1093

    Cashin AG, Hansford HJ, Hernán MA, Swanson SA, Lee H, Jones MD, et al. Transparent reporting of observational studies emulating a target trial—The TARGET Statement. JAMA. 2025;334(12):1084. doi:10.1001/jama.2025.13350

  20. [20]

    Four targets: An enhanced framework for guiding causal inference from observational data

    Lu H, Li F, Lesko CR, Fink DS, Rudolph KE, Harhay MO, et al. Four targets: An enhanced framework for guiding causal inference from observational data. Int J Epidemiol. 2025;54(1):dyaf003. doi:10.1093/ije/dyaf003

  21. [21]

    A trial emulation approach for policy evaluations with group-level longitudinal data

    Ben-Michael E, Feller A, Stuart EA. A trial emulation approach for policy evaluations with group-level longitudinal data. Epidemiology. 2021;32(4):533–40. doi:10.1097/EDE.0000000000001369

  22. [22]

    Target trial emulation for evaluating health policy

    Seewald NJ, McGinty EE, Stuart EA. Target trial emulation for evaluating health policy. Ann Intern Med. 2024;177(11):1530–8. doi:10.7326/M23-2440

  23. [23]

    Emulating target trials of postexposure vaccines using observational data

    Boyer C, Lipsitch M. Emulating target trials of postexposure vaccines using observational data. Am J Epidemiol. 2025;194(7):2037–46. doi:10.1093/aje/kwae350

  24. [24]

    Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses

    Hernán MA, Sauer BC, Hernández-Díaz S, Platt R, Shrier I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J Clin Epidemiol. 2016;79:70–5. doi:10.1016/j.jclinepi.2016.04.014

  25. [25]

    Transparency and rigor: Target trial emulation aims to achieve both

    De Stavola BL, Gomes M, Katsoulis M. Transparency and rigor: Target trial emulation aims to achieve both. Epidemiology. 2023;34(5):624–6. doi:10.1097/EDE.0000000000001638 21

  26. [26]

    The target trial framework for causal inference from observational data: Why and when is it helpful? Ann Intern Med

    Hernán MA, Dahabreh IJ, Dickerman BA, Swanson SA. The target trial framework for causal inference from observational data: Why and when is it helpful? Ann Intern Med. 2025;178(3):402–7. doi:10.7326/ANNALS-24-01871

  27. [27]

    Journal of the Royal Statistical Society: Series A (Statistics in Society) , author =

    Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. J R Stat Soc Ser A Stat Soc. 2008;171(2):481–502. doi:10.1111/j.1467-985X.2007.00527.x

  28. [28]

    Observational data for comparative effectiveness research: An emulation of randomised trials of statins and primary prevention of coronary heart disease

    Danaei G, Rodríguez LAG, Cantero OF, Logan R, Hernán MA. Observational data for comparative effectiveness research: An emulation of randomised trials of statins and primary prevention of coronary heart disease. Stat Methods Med Res. 2013;22(1):70–96. doi:10.1177/0962280211403603

  29. [29]

    Language Model Cascades: Token-Level Uncertainty and Beyond

    Kennedy-Shaffer L. A generalized difference-in-differences estimator for randomized stepped-wedge and observational staggered adoption settings [Preprint]. arXiv; 2024. Available from: https://arxiv.org/abs/2405.08730 doi:10.48550/ARXIV .2405.08730

  30. [30]

    Consort 2010 statement: Extension to cluster randomised trials

    Campbell MK, Piaggio G, Elbourne DR, Altman DG, for the CONSORT Group. Consort 2010 statement: Extension to cluster randomised trials. BMJ. 2012;345:e5661. doi:10.1136/bmj.e5661

  31. [31]

    Reporting of stepped wedge cluster randomised trials: Extension of the CONSORT 2010 statement with explanation and elaboration

    Hemming K, Taljaard M, McKenzie JE, Hooper R, Copas A, Thompson JA, et al. Reporting of stepped wedge cluster randomised trials: Extension of the CONSORT 2010 statement with explanation and elaboration. BMJ. 2018;363:k1614. doi:10.1136/bmj.k1614

  32. [32]

    A scoping review identified additional considerations for defining estimands in cluster randomized trials

    Bi D, Copas A, Li F, Kahan BC. A scoping review identified additional considerations for defining estimands in cluster randomized trials. J Clin Epidemiol. 2026;189:112015. doi:10.1016/j.jclinepi.2025.112015

  33. [33]

    Causal inference under multiple versions of treatment

    VanderWeele TJ, Hernan MA. Causal inference under multiple versions of treatment. J Causal Inference. 2013;1(1):1–20. doi:10.1515/jci-2012-0002

  34. [34]

    Designing a stepped wedge trial: Three main designs, carry-over effects and randomisation approaches

    Copas AJ, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: Three main designs, carry-over effects and randomisation approaches. Trials. 2015;16(1):352. doi:10.1186/s13063-015-0842-7

  35. [35]

    Information content of stepped‐wedge designs when treatment effect heterogeneity and/or implementation periods are present

    Kasza J, Taljaard M, Forbes AB. Information content of stepped‐wedge designs when treatment effect heterogeneity and/or implementation periods are present. Stat Med. 2019;38(23):4686–701. doi:10.1002/sim.8327

  36. [36]

    Informative cluster size in cluster- randomised trials: A case study from the TRIGGER trial

    Kahan BC, Li F, Blette B, Jairath V , Copas A, Harhay M. Informative cluster size in cluster- randomised trials: A case study from the TRIGGER trial. Clin Trials. 2023;20(6):661–9. doi:10.1177/17407745231186094

  37. [37]

    Model-robust standardization in cluster- randomized trials

    Li F, Tong J, Fang X, Cheng C, Kahan BC, Wang B. Model-robust standardization in cluster- randomized trials. Stat Med. 2025;44(20–22):e70270. doi:10.1002/sim.70270 22

  38. [38]

    Analysis of stepped wedge cluster randomized trials in the presence of a time‐varying treatment effect

    Kenny A, V oldal EC, Xia F, Heagerty PJ, Hughes JP. Analysis of stepped wedge cluster randomized trials in the presence of a time‐varying treatment effect. Stat Med. 2022;41(22):4311–39. doi:10.1002/sim.9511

  39. [39]

    Mixed effects approach to the analysis of the stepped wedge cluster randomised trial—Investigating the confounding effect of time through simulation

    Nickless A, V oysey M, Geddes J, Yu LM, Fanshawe TR. Mixed effects approach to the analysis of the stepped wedge cluster randomised trial—Investigating the confounding effect of time through simulation. PLOS ONE. 2018;13(12):e0208876. doi:10.1371/journal.pone.0208876

  40. [40]

    How to achieve model-robust inference in stepped wedge trials with model-based methods? Biometrics

    Wang B, Wang X, Li F. How to achieve model-robust inference in stepped wedge trials with model-based methods? Biometrics. 2024;80(4):ujae123. doi:10.1093/biomtc/ujae123

  41. [41]

    Sample size calculation for stepped wedge and other longitudinal cluster randomised trials

    Hooper R, Teerenstra S, De Hoop E, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med. 2016;35(26):4718–28. doi:10.1002/sim.7028

  42. [42]

    Guidelines for the content of statistical analysis plans in clinical trials: Protocol for an extension to cluster randomized trials

    Hemming K, Thompson JY , Hooper RL, Ukoumunne OC, Li F, Caille A, et al. Guidelines for the content of statistical analysis plans in clinical trials: Protocol for an extension to cluster randomized trials. Trials. 2025;26(1):72. doi:10.1186/s13063-025-08756-3

  43. [43]

    Assessing the effectiveness of COVID-19 vaccine lotteries: A cross-state synthetic control methods approach

    Fuller S, Kazemian S, Algara C, Simmons DJ. Assessing the effectiveness of COVID-19 vaccine lotteries: A cross-state synthetic control methods approach. Pereira T, editor. PLOS ONE. 2022;17(9):e0274374. doi:10.1371/journal.pone.0274374

  44. [44]

    The Ohio vaccine lottery and starting vaccination rates

    Brehm ME, Brehm PA, Saavedra M. The Ohio vaccine lottery and starting vaccination rates. Am J Health Econ. 2022 Jun 1;8(3):387–411. doi:10.1086/718512

  45. [45]

    Did Ohio’s vaccine lottery increase vaccination rates? A pre-registered, synthetic control study

    Lang D, Esbenshade L, Willer R. Did Ohio’s vaccine lottery increase vaccination rates? A pre-registered, synthetic control study. J Exp Polit Sci. 2023;10(2):242–60. doi:10.1017/XPS.2021.32

  46. [46]

    Weeks ending log 2020–2021 [MMWR weeks] [Internet]

    Morbidity and Mortality Weekly Report. Weeks ending log 2020–2021 [MMWR weeks] [Internet]. U.S. Centers for Disease Control and Prevention; 2019 Sep [cited 2026 Apr 2]. Available from: https://ndc.services.cdc.gov/wp-content/uploads/W2021-22.pdf

  47. [47]

    Quasi-experimental methods for pharmacoepidemiology: difference-in- differences and synthetic control methods with case studies for vaccine evaluation

    Kennedy-Shaffer L. Quasi-experimental methods for pharmacoepidemiology: difference-in- differences and synthetic control methods with case studies for vaccine evaluation. Am J Epidemiol. 2024;193(7):1050–8. doi:10.1093/aje/kwae019

  48. [48]

    Design and analysis of stepped wedge cluster randomized trials

    Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182–91. doi:10.1016/j.cct.2006.05.007

  49. [49]

    Monetary incentives increase COVID-19 vaccinations

    Campos-Mercade P, Meier AN, Schneider FH, Meier S, Pope D, Wengström E. Monetary incentives increase COVID-19 vaccinations. Science. 2021;374(6569):879–82. doi:10.1126/science.abm0475 23

  50. [50]

    swdpwr: A SAS macro and an R package for power calculations in stepped wedge cluster randomized trials

    Chen J, Zhou X, Li F, Spiegelman D. swdpwr: A SAS macro and an R package for power calculations in stepped wedge cluster randomized trials. Comput Methods Programs Biomed. 2022;213:106522. doi:10.1016/j.cmpb.2021.106522

  51. [51]

    Using synthetic controls: Feasibility, data requirements, and methodological aspects

    Abadie A. Using synthetic controls: Feasibility, data requirements, and methodological aspects. J Econ Lit. 2021;59(2):391–425. doi:10.1257/jel.20191450

  52. [52]

    NASHP State Tracker [Internet]

    National Academy for State Health Policy. NASHP State Tracker [Internet]. 2025 [cited 2025 Oct 28]. State efforts to limit or enforce COVID-19 vaccine mandates. Available from: https://nashp.org/state-tracker/state-efforts-to-ban-or-enforce-covid-19-vaccine-mandates- and-passports/

  53. [53]

    US state vaccine mandates did not influence COVID-19 vaccination rates but reduced uptake of COVID-19 boosters and flu vaccines compared to bans on vaccine restrictions

    Rains SA, Richards AS. US state vaccine mandates did not influence COVID-19 vaccination rates but reduced uptake of COVID-19 boosters and flu vaccines compared to bans on vaccine restrictions. Proc Natl Acad Sci. 2024;121(8):e2313610121. doi:10.1073/pnas.2313610121

  54. [54]

    Vaccination mandates and their alternatives and complements

    Schmid P, Böhm R, Das E, Holford D, Korn L, Leask J, et al. Vaccination mandates and their alternatives and complements. Nat Rev Psychol. 2024;3(12):789–803. doi:10.1038/s44159- 024-00381-2

  55. [55]

    US states that mandated COVID-19 vaccination see higher, not lower, take-up of COVID-19 boosters and flu vaccines

    Fitzgerald J. US states that mandated COVID-19 vaccination see higher, not lower, take-up of COVID-19 boosters and flu vaccines. Proc Natl Acad Sci. 2024;121(41):e2403758121. doi:10.1073/pnas.2403758121

  56. [56]

    Information content of cluster–period cells in stepped wedge trials

    Kasza J, Forbes AB. Information content of cluster–period cells in stepped wedge trials. Biometrics. 2019;75(1):144–52. doi:10.1111/biom.12959

  57. [57]

    Heterogeneous treatment effects and bias in the analysis of the stepped wedge design

    Lindner S, McConnell KJ. Heterogeneous treatment effects and bias in the analysis of the stepped wedge design. Health Serv Outcomes Res Methodol. 2021;21(4):419–38. doi:10.1007/s10742-021-00244-w

  58. [58]

    M., Turner, E

    Lee KM, Turner EL, Kenny A. Analysis of stepped‐wedge cluster randomized trials when treatment effects vary by exposure time or calendar time. Stat Med. 2025;44(20–22):e70256. doi:10.1002/sim.70256

  59. [59]

    Key considerations for designing, conducting and analysing a cluster randomized trial

    Hemming K, Taljaard M. Key considerations for designing, conducting and analysing a cluster randomized trial. Int J Epidemiol. 2023;52(5):1648–58. doi:10.1093/ije/dyad064

  60. [60]

    A review of current practice in the design and analysis of extremely small stepped-wedge cluster randomized trials

    Tong G, Nevins P, Ryan M, Davis-Plourde K, Ouyang Y, Pereira Macedo JA, et al. A review of current practice in the design and analysis of extremely small stepped-wedge cluster randomized trials. Clin Trials. 2025;22(1):45–56. doi:10.1177/17407745241276137

  61. [61]

    Sample size calculators for planning stepped-wedge cluster randomized trials: A review and comparison

    Ouyang Y, Li F, Preisser JS, Taljaard M. Sample size calculators for planning stepped-wedge cluster randomized trials: A review and comparison. Int J Epidemiol. 2022;51(6):2000–13. doi:10.1093/ije/dyac123

  62. [62]

    Practical considerations for sample size calculation for cluster randomized trials

    Leyrat C, Eldridge S, Taljaard M, Hemming K. Practical considerations for sample size calculation for cluster randomized trials. J Epidemiol Popul Health. 2024;72(1):202198. doi:10.1016/j.jeph.2024.202198

  63. [63]

    Novel methods for the analysis of stepped wedge cluster randomized trials

    Kennedy-Shaffer L, De Gruttola V, Lipsitch M. Novel methods for the analysis of stepped wedge cluster randomized trials. Stat Med. 2020;39(7):815–44. doi:10.1002/sim.8451

  64. [64]

    Robust analysis of stepped wedge trials using composite likelihood models

    Voldal EC, Kenny A, Xia F, Heagerty P, Hughes JP. Robust analysis of stepped wedge trials using composite likelihood models. Stat Med. 2024;43(17):3326–52. doi:10.1002/sim.10120

  65. [65]

    Cluster randomized trial designs for modeling time‐varying intervention effects

    Lee KM, Cheung YB. Cluster randomized trial designs for modeling time‐varying intervention effects. Stat Med. 2024;43(1):49–60. doi:10.1002/sim.9941

  66. [66]

    Assessing exposure-time treatment effect heterogeneity in stepped-wedge cluster randomized trials

    Maleyeff L, Li F, Haneuse S, Wang R. Assessing exposure-time treatment effect heterogeneity in stepped-wedge cluster randomized trials. Biometrics. 2023;79(3):2551–64. doi:10.1111/biom.13803

  67. [67]

    Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity

    Li F, Chen X, Tian Z, Wang R, Heagerty PJ. Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity. Stat Med. 2024;43(5):890–911. doi:10.1002/sim.9990

  68. [68]

    Use of the stepped wedge design cannot be recommended: A critical appraisal and comparison with the classic cluster randomized controlled trial design

    Kotz D, Spigt M, Arts ICW, Crutzen R, Viechtbauer W. Use of the stepped wedge design cannot be recommended: A critical appraisal and comparison with the classic cluster randomized controlled trial design. J Clin Epidemiol. 2012;65(12):1249–52. doi:10.1016/j.jclinepi.2012.06.004

  69. [69]

    Policy effect evaluation under counterfactual neighbourhood intervention in the presence of spillover

    Lee Y, Hettinger G, Mitra N. Policy effect evaluation under counterfactual neighbourhood intervention in the presence of spillover. J R Stat Soc Ser A Stat Soc. 2026;189(1):392–411. doi:10.1093/jrsssa/qnae153

  70. [70]

    Two-way fixed effects estimators with heterogeneous treatment effects

    de Chaisemartin C, D’Haultfœuille X. Two-way fixed effects estimators with heterogeneous treatment effects. Am Econ Rev. 2020;110(9):2964–96. doi:10.1257/aer.20181169

  71. [71]

    Are target trial emulations the gold standard for observational studies?

    Pearce N, Vandenbroucke JP. Are target trial emulations the gold standard for observational studies? Epidemiology. 2023;34(5):614–8. doi:10.1097/EDE.0000000000001636

  72. [72]

    Emulating randomized trials: Treading carefully and pushing the limits

    Renoux C, Suissa S. Emulating randomized trials: Treading carefully and pushing the limits. Am J Epidemiol. 2025;194(5):1460–1. doi:10.1093/aje/kwaf023

  73. [73]

    Invited Commentary: Conducting and emulating trials to study effects of social interventions

    Rojas-Saunero LP, Labrecque JA, Swanson SA. Invited Commentary: Conducting and emulating trials to study effects of social interventions. Am J Epidemiol. 2022;191(8):1453– doi:10.1093/aje/kwac066

  74. [74]

    The staircase cluster randomised trial design: A pragmatic alternative to the stepped wedge

    Grantham KL, Forbes AB, Hooper R, Kasza J. The staircase cluster randomised trial design: A pragmatic alternative to the stepped wedge. Stat Methods Med Res. 2024;33(1):24–41. doi:10.1177/09622802231202364

Appendix 1

    We generate example power calculations for three possible study designs based on the target trial emulation of the vaccine lottery polici...

  1. Use all states in the CDC-defined Midwest region (12 total states), with all observations from CDC MMWR Weeks 15–30 of 2021.

  2. Use the four intervention states in the CDC-defined Midwest region (Ohio, Illinois, Michigan, Missouri) and a matched-comparison state for each, with all observations from CDC MMWR Weeks 15–30 of 2021.

  3. Use only the four intervention states in the CDC-defined Midwest region, with all observations from CDC MMWR Weeks 15–30 of 2021.

    The schematics for these designs are shown in Figure 1 of the main text. Note that these are only examples of designs that could be considered to illustrate how stepped-wedge trial power calculations can be used in the target t...