Digital Twins as Synthetic Controls in Single-Arm Trials
Recognition: 2 theorem links
Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3
The pith
Digital twins from machine learning models can serve as synthetic controls in single-arm clinical trials
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Outcome-model-based synthetic control arms are an important tool for single-arm trials. Digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, naturally leverage flexible outcome-modeling approaches to yield more robust estimates of treatment effects and to provide a principled way to incorporate corrections when external data are not directly comparable.
What carries the argument
Digital twins: personalized predictions of disease progression from machine learning models trained on historical datasets, serving as outcome-model-based synthetic controls
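As a toy illustration of this mechanism (not the authors' implementation), the sketch below trains a stand-in "digital twin" model on simulated historical controls and uses its per-patient predictions as the counterfactual comparator in a simulated single-arm trial. All variable names, sample sizes, and the linear model are invented for illustration; the paper's models are flexible ML models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical control data: baseline covariates X and an
# observed outcome Y (e.g., change on a functional rating scale).
n_hist = 500
X_hist = rng.normal(size=(n_hist, 3))
beta_true = np.array([1.0, -0.5, 0.25])
Y_hist = X_hist @ beta_true + rng.normal(scale=0.5, size=n_hist)

# "Digital twin" model: a plain least-squares fit stands in here for the
# flexible ML prognostic model described in the paper.
coef, *_ = np.linalg.lstsq(X_hist, Y_hist, rcond=None)

# Single-arm trial: every participant is treated; a simulated true effect
# of +0.8 shifts outcomes relative to the untreated trajectory.
n_trial = 200
X_trial = rng.normal(size=(n_trial, 3))
Y_trial = X_trial @ beta_true + 0.8 + rng.normal(scale=0.5, size=n_trial)

# Twin predictions supply each patient's counterfactual control outcome;
# the treatment-effect estimate is the mean observed-minus-predicted gap.
Y_twin = X_trial @ coef
effect_hat = float(np.mean(Y_trial - Y_twin))
print(round(effect_hat, 2))
```

The estimate recovers a value near the simulated effect of 0.8 only because the twin model here is correctly specified and the two populations are exchangeable, which is exactly the load-bearing premise the review flags.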
Load-bearing premise
Machine learning models trained on historical datasets produce accurate and unbiased predictions of disease progression for patients in the current single-arm trial even when populations differ in unmeasured ways
What would settle it
A randomized controlled trial of the same intervention showing a treatment effect estimate that differs substantially from the one derived using digital twin synthetic controls
Figures
Original abstract
Single-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that outcome-model-based synthetic controls using digital twins—personalized ML predictions of disease progression trained on historical datasets—offer a flexible and robust approach for estimating treatment effects in single-arm trials. It reviews doubly robust estimators, presents power and sample size formulas, discusses trade-offs in selecting historical training data, outlines practical considerations for alignment with FDA draft guidance on AI in drug development, and demonstrates the methods via reanalyses of amyotrophic lateral sclerosis and Huntington's disease trial data.
Significance. If the core assumptions hold, the work provides a timely framework for improving rigor in single-arm trials, which are common in rare-disease settings where RCTs are impractical. The integration of flexible ML outcome models with doubly robust estimation, combined with power formulas and regulatory alignment, could support more efficient trial design and analysis. The reanalyses illustrate feasibility on real neurodegenerative data, and the emphasis on handling non-comparable external data is a practical strength.
major comments (3)
- [Section on doubly robust estimators] The manuscript references doubly robust estimators but provides no explicit mathematical formulation (e.g., the precise form of the augmentation term combining the digital-twin outcome model with any weighting or propensity component) or derivation of consistency under distribution shift. Without this, it is difficult to verify the conditions under which double robustness protects against misspecification when the ML model is trained on historical data that may differ from the trial population in unmeasured prognostic factors.
- [Reanalysis sections] In the reanalysis sections for ALS and Huntington's data, the manuscript does not report model training details (feature engineering, hyperparameter selection, cross-validation strategy), predictive performance metrics on held-out historical data, or sensitivity analyses for covariate or outcome shifts between historical and trial cohorts. These omissions limit assessment of whether the reported treatment-effect estimates remain reliable when the digital-twin predictions are transported to the current trial population.
- [Power and sample size formulas] The power and sample size formulas are presented without accompanying derivation, simulation studies, or empirical validation showing type-I error control and coverage under realistic ML model misspecification or distribution shift scenarios. This weakens the practical utility of the formulas for trial planning.
minor comments (2)
- [Abstract] The abstract states that the methods are demonstrated on ALS and Huntington's data but does not summarize the key numerical findings (e.g., estimated treatment effects or confidence intervals), which would help readers quickly gauge the magnitude of the results.
- [Notation and methods] Notation for the digital-twin predictions and the doubly robust estimator is introduced without a dedicated notation table or consistent symbol definitions across sections, making some equations harder to follow.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have identified important opportunities to strengthen the clarity and rigor of our manuscript. We address each major comment below and will revise the paper accordingly.
Point-by-point responses
-
Referee: [Section on doubly robust estimators] The manuscript references doubly robust estimators but provides no explicit mathematical formulation (e.g., the precise form of the augmentation term combining the digital-twin outcome model with any weighting or propensity component) or derivation of consistency under distribution shift. Without this, it is difficult to verify the conditions under which double robustness protects against misspecification when the ML model is trained on historical data that may differ from the trial population in unmeasured prognostic factors.
Authors: We agree that an explicit formulation and derivation will improve verifiability. In the revised manuscript we will add the precise doubly robust estimator expression (augmented inverse-probability-weighted form that combines the digital-twin outcome predictions with a propensity-based correction term) together with a short derivation of its consistency under distribution shift between historical training data and the trial population, conditional on correct specification of either the outcome model or the propensity model. revision: yes
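For readers, the standard augmented inverse-probability-weighted (AIPW) form the rebuttal refers to can be sketched as follows; the symbols and the ATT-type estimand here are illustrative and the manuscript's exact expression may differ. Let $A_i \in \{0,1\}$ indicate trial membership ($A_i = 1$ for trial participants, $0$ for external controls), let $\hat\mu_0(x)$ be the digital-twin outcome model for the untreated trajectory, and let $\hat e(x) = \hat P(A = 1 \mid X = x)$ be an estimated propensity model:

```latex
\hat\tau_{\mathrm{ATT}}
  \;=\; \frac{1}{n_1}\sum_{i:\,A_i=1} \bigl(Y_i - \hat\mu_0(X_i)\bigr)
  \;-\; \frac{1}{n_1}\sum_{i:\,A_i=0} \frac{\hat e(X_i)}{1-\hat e(X_i)}\,
        \bigl(Y_i - \hat\mu_0(X_i)\bigr)
```

Double robustness means $\hat\tau_{\mathrm{ATT}}$ is consistent if either $\hat\mu_0$ or $\hat e$ is correctly specified; neither property by itself protects against shift in unmeasured prognostic factors, which is the referee's concern.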
-
Referee: [Reanalysis sections] In the reanalysis sections for ALS and Huntington's data, the manuscript does not report model training details (feature engineering, hyperparameter selection, cross-validation strategy), predictive performance metrics on held-out historical data, or sensitivity analyses for covariate or outcome shifts between historical and trial cohorts. These omissions limit assessment of whether the reported treatment-effect estimates remain reliable when the digital-twin predictions are transported to the current trial population.
Authors: We acknowledge these omissions limit reproducibility and transportability assessment. The revised manuscript will include a new subsection reporting feature engineering choices, hyperparameter tuning via cross-validation, predictive performance metrics (e.g., RMSE on held-out historical data), and sensitivity analyses that examine the impact of covariate and outcome distribution shifts between the historical training cohorts and the trial populations. revision: yes
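A minimal sketch of the kind of held-out evaluation the response promises, assuming a simple linear stand-in for the digital-twin model and synthetic historical data (all names and values are illustrative, not from the manuscript):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic historical control cohort: covariates X, outcome y with
# irreducible noise of standard deviation 0.4.
n = 300
X = rng.normal(size=(n, 4))
beta = np.array([0.8, -0.3, 0.5, 0.0])
y = X @ beta + rng.normal(scale=0.4, size=n)

# 5-fold cross-validation: fit the stand-in twin model on the training
# folds and score predictions on the held-out fold.
k = 5
folds = np.array_split(rng.permutation(n), k)
sq_err = []
for held_out in folds:
    train = np.setdiff1d(np.arange(n), held_out)
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X[held_out] @ coef
    sq_err.append((y[held_out] - pred) ** 2)

# Held-out RMSE approaches the noise floor when the model transports well.
rmse = float(np.sqrt(np.concatenate(sq_err).mean()))
print(round(rmse, 2))
```

Reporting this held-out RMSE alongside the same metric computed on the trial cohort's baseline-predictable outcomes would directly address the transportability question.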
-
Referee: [Power and sample size formulas] The power and sample size formulas are presented without accompanying derivation, simulation studies, or empirical validation showing type-I error control and coverage under realistic ML model misspecification or distribution shift scenarios. This weakens the practical utility of the formulas for trial planning.
Authors: We agree that supporting material is needed for practical use. The revision will add an appendix containing the full derivation of the power and sample-size formulas from the asymptotic variance of the doubly robust estimator, plus simulation studies that evaluate type-I error control and coverage under ML misspecification and realistic distribution-shift scenarios between historical and trial data. revision: yes
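A generic sketch of the kind of formula involved, assuming only that the doubly robust estimator is asymptotically normal with variance $\sigma^2/n$; the manuscript's exact expressions may differ:

```latex
n \;\ge\; \frac{\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}}
```

Here $\delta$ is the minimal detectable treatment effect, $z_q$ the standard-normal quantile, and $\sigma^2$ the asymptotic variance of the estimator; the promised simulations would check whether plugging in an estimate of $\sigma^2$ preserves type-I error and coverage under misspecification.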
Circularity Check
No significant circularity; claims rest on external estimators and independent reanalyses
full rationale
The paper reviews established doubly robust estimators, derives power formulas from standard statistical principles, and demonstrates methods via reanalysis of external ALS and Huntington's datasets. No equations or central claims reduce by construction to fitted parameters renamed as predictions, nor do they depend on self-citation chains or author-specific uniqueness theorems. The core argument for digital twins as synthetic controls is supported by references to prior literature on doubly robust methods without self-referential loops, making the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Historical datasets can train models that generalize to predict outcomes in new single-arm trial populations
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relevance: unclear · matched text: "outcome-model-based synthetic control arms... digital twins... doubly robust estimators... AIPW"
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat ≃ Nat recovery · relevance: unclear · matched text: "power and sample size formulas for AIPW"
Reference graph
Works this paper leans on
- [1] Miguel A. Hernán and James M. Robins. Causal Inference: What If. Chapman & Hall/CRC, Boca Raton, FL, 2020.
- [2] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge, 2015.
- [3] Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.
- [4] Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, and Vasilis Syrgkanis. Long story short: Omitted variable bias in causal machine learning, 2024.
- [5] Fabrizio Benedetti. Placebo effects: from the neurobiological paradigm to translational implications. Neuron, 84(3):623–637, November 2014.
- [6] Guoqiao Wang, Scott Berry, Chengjie Xiong, Jason Hassenstab, Melanie Quintana, Eric M. McDade, Paul Delmar, Matteo Vestrucci, Gopalan Sethuraman, Randall J. Bateman, and Dominantly Inherited Alzheimer Network Trials Unit. A novel cognitive disease progression model for clinical trials in autosomal-dominant Alzheimer's disease. Stat. Med., 37(21):3047–3055, 2018.
- [7] Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
- [8] Alberto Abadie and Guido W. Imbens. Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1):235–267, 2006.
- [9] Elizabeth A. Stuart. Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1):1–21, 2010.
- [10] Keisuke Hirano, Guido W. Imbens, and Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4):1161–1189, 2003.
- [11] Anastasios A. Tsiatis. Semiparametric Theory and Missing Data. Springer Series in Statistics. Springer, New York, 2006.
- [12] Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.
- [13] Edward H. Kennedy. Semiparametric theory and empirical processes in causal inference. Statistical Science, 37(3):289–308, 2022.
- [14] Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, and Charles Fisher. Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score. The International Journal of Biostatistics, 18(2):329–356, 2022.
- [15] Mark J. van der Laan and Daniel Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1):1–40, 2006.
- [16] Mark J. van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, 2011.
- [17] U.S. Food and Drug Administration. Considerations for the use of artificial intelligence to support regulatory decision-making for drug and biological products. Draft guidance for industry and other interested parties, January 2025.
- [18] U.S. Food and Drug Administration. Considerations for the design and conduct of externally controlled trials for drug and biological products. Draft guidance for industry, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), and Oncology Center of Excellence (OCE), Silver Spring, MD, February 2023. Docket No...
- [19] U.S. Food and Drug Administration. Real-world data: Assessing electronic health records and medical claims data to support regulatory decision-making for drug and biological products. Guidance for industry, Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER), Silver Spring, MD, July 2024.
- [20] U.S. Food and Drug Administration. Real-world data: Assessing registries to support regulatory decision-making for drug and biological products. Guidance for industry, Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER), Silver Spring, MD, December 2023. Docket No. FDA-2021-D-1146.
- [21] Merit E. Cudkowicz, Jeremy M. Shefner, David A. Schoenfeld, Robert H. Brown, Heather Johnson, Mohsin Qureshi, Alan Pestronk, James Caress, Peter Donofrio, Erik Sorenson, Walter G. Bradley, William E. Antholine, Sherry Shrader, Tom Ferguson, and ALS CNTF Treatment Study Group. Trial of celecoxib in amyotrophic lateral sclerosis. Annals of Neurology, 60(1)..., 2006.
- [22] Aileen McGarry, Michael P. McDermott, Karl Kieburtz, Elizabeth A. de Blieck, M. Flint Beal, Rong Chen, Jody Corey-Bloom, Andrew Feigin, Tamara Pringsheim, Ira Shoulson, John Tetrud, Richard L. Watts, Hui Zhao, and Huntington Study Group. A randomized, double-blind, placebo-controlled trial of coenzyme Q10 in Huntington disease. Neurology, 88(2):152–159, 2017.
- [23] Peter C. Austin. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceutical Statistics, 10(2):150–161, 2011.
- [24] Dimitris N. Politis and Joseph P. Romano. Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, 22(4):2031–2050, 1994.
- [25] Alberto Abadie and Guido W. Imbens. On the failure of the bootstrap for matching estimators. Econometrica, 76(6):1537–1557, 2008.
- [26] Edward H. Kennedy. Semiparametric doubly robust targeted double machine learning: A review. In Handbook of Statistical Methods for Precision Medicine, pages 207–236. Chapman and Hall/CRC, 2024.
- [27] Oliver Hines, Oliver Dukes, Karla Diaz-Ordaz, and Stijn Vansteelandt. Demystifying statistical learning based on efficient influence functions. The American Statistician, 76(3):292–304, 2022.
- [28] Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky...
discussion (0)