arxiv: 2605.03282 · v1 · submitted 2026-05-05 · 📊 stat.ME

Externally Controlled Trials: A Review of Design and Borrowing Through a Causal Lens

Ke Zhu, Rima Izem, Peng Yang, Ying Yuan, Herbert Pang, Mark van der Laan, Lei Nie, Birol Emir, Pallavi Mishra-Kalyani, Hana Lee, Shu Yang

Pith reviewed 2026-05-07 14:41 UTC · model grok-4.3

classification 📊 stat.ME

keywords: externally controlled trials, causal inference, Bayesian borrowing, hybrid trial designs, real-world data, sensitivity analysis, single-arm trials, covariate shift

The pith

A six-step causal roadmap organizes methods for externally controlled trials to clarify estimands, assumptions, and borrowing strategies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews externally controlled trials where randomized controls are not feasible, such as in rare diseases or oncology. It organizes the scattered literature on causal inference, Bayesian borrowing, and hybrid designs into a single six-step scientific roadmap. This roadmap distinguishes single-arm trials that compare to external controls from hybrid trials that augment internal controls with external data. The approach shows how statistical methods arise from causal identification and how they balance efficiency against robustness when covariate shift or outcome drift occurs. A reader would care because the framework aims to support more reliable integration of real-world or historical data into clinical and regulatory decisions.

Core claim

By framing ECT methodology through a causal lens, the review establishes that a six-step scientific roadmap can organize the field for two designs: single-arm trials that evaluate efficacy against external controls, and hybrid controlled trials that augment internal controls with external data from real-world sources or historical studies. The roadmap clarifies causal estimands, identifiability assumptions, the derivation of statistical parameters, and the efficiency-robustness trade-offs of modeling and borrowing approaches under covariate shift and outcome drift.

What carries the argument

A six-step scientific roadmap structures ECT methods by first defining causal estimands and identifiability assumptions, then deriving statistical parameters and evaluating borrowing strategies.
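
To make the step from estimand to statistical parameter concrete, here is a minimal sketch for a single-arm trial. It assumes the target is the average effect among trial participants and that, within levels of one binary baseline covariate, external controls are exchangeable with trial patients. Under those assumptions the counterfactual control mean is the external stratum means averaged over the trial's covariate mix. The toy data, the covariate, and the function name `standardized_control_mean` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

def standardized_control_mean(trial_x, ext_x, ext_y):
    """Average the external controls' stratum means over the trial's covariate mix."""
    by_stratum = defaultdict(list)
    for x, y in zip(ext_x, ext_y):
        by_stratum[x].append(y)
    stratum_mean = {x: mean(ys) for x, ys in by_stratum.items()}
    # Positivity check: every trial stratum must appear among the external controls.
    missing = set(trial_x) - set(stratum_mean)
    if missing:
        raise ValueError(f"no external controls in strata {sorted(missing)}")
    return mean(stratum_mean[x] for x in trial_x)

# Hypothetical toy data: x = 1 marks a poor-prognosis subgroup.
trial_x = [1, 1, 1, 0]            # treated single-arm patients, 75% with x = 1
trial_y = [5.0, 6.0, 7.0, 10.0]   # outcomes under treatment
ext_x = [1, 0, 0, 0]              # external controls, 25% with x = 1
ext_y = [4.0, 8.0, 9.0, 10.0]     # outcomes under control

naive = mean(trial_y) - mean(ext_y)
adjusted = mean(trial_y) - standardized_control_mean(trial_x, ext_x, ext_y)
print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

In this toy example the naive contrast is -0.75 while the standardized contrast is 1.75, so covariate shift alone is enough to flip the sign of the estimate. The adjustment is only as good as the exchangeability assumption behind it, which cannot be verified from the data.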

If this is right

Regulatory and clinical decisions can integrate external data more systematically by following explicit causal estimands and sensitivity analyses.
Borrowing strategies in hybrid trials can be chosen based on explicit trade-offs between efficiency and robustness under covariate shift.
Both Bayesian dynamic borrowing and frequentist methods can be compared within the same roadmap for operating characteristics and software availability.
Sensitivity analysis becomes a required step to assess the impact of outcome drift in single-arm and hybrid settings.
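
The efficiency-robustness trade-off named in these points can be illustrated with a static power prior. The sketch below is a simplified, assumed setting rather than a method evaluated in the paper: normal outcomes with known standard deviation, a flat initial prior, a fixed discount `a0`, and external controls whose mean is shifted by `drift`. In that setting the posterior mean of the control mean is a weighted average of the internal and external sample means, so its frequentist bias and variance have closed forms. The function name and parameter values are hypothetical.

```python
def power_prior_summary(n_int, n_ext, a0, sigma, drift):
    """Frequentist bias, variance and MSE of the power-prior posterior mean.

    Assumes normal outcomes with known SD `sigma`, a flat initial prior, and
    external controls whose mean differs from the internal one by `drift`.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    eff_n = n_int + a0 * n_ext          # effective sample size after discounting
    w_ext = a0 * n_ext / eff_n          # weight given to the external sample mean
    w_int = 1.0 - w_ext                 # weight given to the internal sample mean
    bias = w_ext * drift
    variance = sigma**2 * (w_int**2 / n_int + w_ext**2 / n_ext)
    return {"bias": bias, "variance": variance, "mse": bias**2 + variance}

for a0 in (0.0, 0.5, 1.0):
    for drift in (0.0, 0.5):
        s = power_prior_summary(n_int=50, n_ext=200, a0=a0, sigma=1.0, drift=drift)
        print(f"a0={a0:.1f} drift={drift:.1f} bias={s['bias']:.3f} "
              f"var={s['variance']:.4f} mse={s['mse']:.4f}")
```

With no drift, full borrowing (`a0 = 1`) cuts the variance from 1/50 to 1/250. With a drift of 0.5 the same choice carries a bias of 0.4, and the mean squared error exceeds that of no borrowing. Dynamic borrowing methods aim to choose the discount from the data instead of fixing it in advance.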

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The roadmap could be extended to post-approval effectiveness studies by adding steps that handle time-varying external data sources.
Future work might test the roadmap against real trial applications to quantify how often identifiability assumptions hold in oncology or pediatric settings.
Connecting the causal steps to specific software tools could reduce the gap between methodological guidance and practical implementation.

Load-bearing premise

The rapidly expanding but fragmented literature on ECTs can be accurately synthesized into one coherent six-step causal roadmap without material omissions or selection bias in the methods reviewed.

What would settle it

A documented ECT method or recent development that cannot be placed within the six-step roadmap while preserving the stated causal estimands and identifiability assumptions would falsify the claimed coherence of the synthesis.

Figures

Figures reproduced from arXiv: 2605.03282 by Birol Emir, Hana Lee, Herbert Pang, Ke Zhu, Lei Nie, Mark van der Laan, Pallavi Mishra-Kalyani, Peng Yang, Rima Izem, Shu Yang, Ying Yuan.

**Figure 1.** Two ECT designs and scientific roadmap. Gray assignment ratios are illustrative.

**Figure 2.** Illustration of covariate shift and outcome drift.

Original abstract

Externally controlled trials (ECTs) are increasingly used when randomized controls are infeasible, unethical, or insufficient, including applications in rare diseases, oncology, pediatrics, and post-approval effectiveness research. Although methodological work has expanded rapidly across causal inference, Bayesian dynamic borrowing, and hybrid trial designs, the literature remains fragmented. We adopt a six-step scientific roadmap to organize modern ECT methodology in two primary settings: (i) single-arm trials that evaluate efficacy through comparison with external controls, and (ii) hybrid controlled trials that augment the internal control arm with external controls drawn from real-world data or historical studies. The roadmap clarifies causal estimands, identifiability assumptions, and how statistical parameters arise from identification, and shows how modeling and borrowing strategies trade off efficiency and robustness, especially under covariate shift and outcome drift. Within this framework, we synthesize and evaluate recent Bayesian and frequentist developments, compare their strengths, limitations, operating characteristics, and available software, and emphasize the role of sensitivity analysis. By re-framing ECT methodology through a causal lens, this work establishes a coherent foundation for integrating external data into regulatory and clinical decision-making and highlights core challenges and opportunities for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a review that organizes ECT methods under a causal roadmap but introduces no new results or derivations.

The main thing here is that the authors take the fragmented work on externally controlled trials and try to line it up with a six-step causal roadmap for single-arm and hybrid designs. They cover causal estimands, identifiability assumptions, and how Bayesian and frequentist borrowing strategies handle covariate shift and outcome drift, while pointing to software and the need for sensitivity checks. That organizing effort is the core contribution, and it can help people working on rare disease or oncology trials see the options in one place. The synthesis of recent developments and the direct comparison of strengths and limitations are practical and fairly even-handed. The abstract roadmap is clear on paper.

The soft spots are the usual ones for a review. The value of the claimed coherent foundation depends on how complete the literature selection actually is; if important identifiability conditions under drift or certain hybrid design papers are left out, the roadmap loses some of its force. There are no new simulations, formal proofs, or operating characteristic evaluations here, so the paper does not test or extend the methods themselves. Readers already deep in causal borrowing or regulatory trial design may find little they have not seen in the cited sources.

This paper is for statisticians and trialists who need a structured overview rather than a new tool or theorem. It is not aimed at readers chasing the technical frontier. It deserves peer review because the topic matters for practice and a careful synthesis can be useful if the coverage holds up. Reviewers should focus on checking the completeness of the methods reviewed and whether the six-step structure actually clarifies the trade-offs without material gaps.

Referee Report

2 major / 2 minor

Summary. The paper reviews externally controlled trials (ECTs) by adopting a causal inference lens to organize the fragmented literature. It proposes a six-step scientific roadmap for two settings: single-arm trials evaluating efficacy via external controls and hybrid trials augmenting internal controls with external real-world or historical data. The roadmap addresses causal estimands, identifiability assumptions, derivation of statistical parameters from identification, modeling and borrowing strategies (Bayesian dynamic borrowing and frequentist), efficiency-robustness trade-offs under covariate shift and outcome drift, comparisons of strengths/limitations/operating characteristics/software, and the role of sensitivity analysis. The central aim is to establish a coherent foundation for integrating external data into regulatory and clinical decision-making while highlighting future challenges.

Significance. If the synthesis holds, this work could meaningfully unify developments across causal inference, Bayesian borrowing, and hybrid designs, providing a structured framework that promotes transparency and rigor in ECT applications, especially in rare diseases, oncology, and pediatrics. The causal reframing, emphasis on identifiability, sensitivity analyses, and explicit comparison of Bayesian versus frequentist operating characteristics represent strengths that could guide practitioners and regulators. The roadmap format offers a practical tool for future research on robustness under realistic violations like outcome drift.

major comments (2)

[Six-step causal roadmap] Six-step causal roadmap (as described in the abstract and detailed in the main sections): the central claim of a comprehensive, coherent synthesis requires explicit documentation of literature search strategy, inclusion/exclusion criteria, and coverage of identifiability conditions under outcome drift and covariate shift. Without these, the roadmap risks material omissions that undermine the foundation for regulatory integration.
[Borrowing strategies and operating characteristics] Section on borrowing strategies and operating characteristics: the evaluation of Bayesian versus frequentist approaches should include specific quantitative comparisons (e.g., bias, coverage, or power under drift scenarios) tied to the roadmap steps, as these are load-bearing for the claimed trade-off insights and software recommendations.
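
The kind of quantitative comparison requested in the second major comment can be sketched with a small Monte Carlo study. The example below is an illustration under assumed settings and does not reproduce any result from the paper or its references: it estimates a control mean whose true value is 0, treats the outcome variance as known, and compares no borrowing, full pooling, and a simplified test-then-pool rule based on a z-test. Sample sizes, drift values, and the function name `simulate` are hypothetical.

```python
import random
from statistics import NormalDist, mean

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile

def simulate(drift, n_int=50, n_ext=200, sigma=1.0, reps=4000, seed=1):
    """Bias and 95% CI coverage for the control mean (true value 0)."""
    rng = random.Random(seed)
    se_int = sigma / n_int**0.5
    se_ext = sigma / n_ext**0.5
    se_pool = sigma / (n_int + n_ext)**0.5
    se_diff = (se_int**2 + se_ext**2)**0.5
    est = {"no_borrowing": [], "full_pooling": [], "test_then_pool": []}
    cov = {k: 0 for k in est}
    for _ in range(reps):
        # Sample means are drawn directly from their sampling distributions.
        m_int = rng.gauss(0.0, se_int)
        m_ext = rng.gauss(drift, se_ext)
        m_pool = (n_int * m_int + n_ext * m_ext) / (n_int + n_ext)
        # Test-then-pool: pool only if a z-test finds no significant difference.
        pool = abs(m_int - m_ext) / se_diff < Z
        results = {
            "no_borrowing": (m_int, se_int),
            "full_pooling": (m_pool, se_pool),
            "test_then_pool": (m_pool, se_pool) if pool else (m_int, se_int),
        }
        for name, (m, se) in results.items():
            est[name].append(m)
            cov[name] += abs(m) <= Z * se  # does the interval cover the truth?
    return {k: {"bias": mean(est[k]), "coverage": cov[k] / reps} for k in est}

for drift in (0.0, 0.3):
    for name, r in simulate(drift).items():
        print(f"drift={drift:.1f} {name:15s} "
              f"bias={r['bias']:+.3f} coverage={r['coverage']:.3f}")
```

Under drift, full pooling has an expected bias of 0.8 times the drift in this configuration, because external controls make up 80% of the pooled sample, and its interval coverage falls well below the nominal level. A table of this form, with one row per method and drift scenario, is one way such results could be tied to the roadmap steps.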

minor comments (2)

[Abstract] Abstract: while the roadmap is outlined, briefly enumerating the six steps would enhance immediate clarity for readers scanning the paper.
[Software and implementation] References and software: ensure all cited software tools and packages include direct links or DOIs to support reproducibility claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review, which highlights opportunities to strengthen the transparency and practical utility of our synthesis. We address each major comment below and outline the revisions we will make to the manuscript.

Referee: [Six-step causal roadmap] Six-step causal roadmap (as described in the abstract and detailed in the main sections): the central claim of a comprehensive, coherent synthesis requires explicit documentation of literature search strategy, inclusion/exclusion criteria, and coverage of identifiability conditions under outcome drift and covariate shift. Without these, the roadmap risks material omissions that undermine the foundation for regulatory integration.

Authors: We agree that greater transparency regarding literature selection will improve the manuscript. Although the paper is a targeted synthesis organized by the causal roadmap rather than a formal systematic review, we will add a new subsection (likely in the Introduction) that explicitly describes the scope of the literature considered, the primary search terms and databases used, and the inclusion criteria centered on methodological contributions to causal estimands, identifiability, borrowing, and sensitivity analysis in ECTs. On identifiability under outcome drift and covariate shift, Sections 3–4 and Step 6 already derive the relevant assumptions and discuss sensitivity to these violations; we will expand these passages with additional explicit statements and citations to recent work on drift-robust identification, ensuring the roadmap steps directly reference how such violations are diagnosed and mitigated.
Revision: yes
Referee: [Borrowing strategies and operating characteristics] Section on borrowing strategies and operating characteristics: the evaluation of Bayesian versus frequentist approaches should include specific quantitative comparisons (e.g., bias, coverage, or power under drift scenarios) tied to the roadmap steps, as these are load-bearing for the claimed trade-off insights and software recommendations.

Authors: We acknowledge that while the current text reviews the conceptual trade-offs and cites studies reporting operating characteristics, it does not extract or tabulate specific numerical results (bias, coverage, power) under drift. We will revise the borrowing-strategies section to include concise summaries of quantitative findings from key referenced simulation studies, explicitly mapping each result to the relevant roadmap step (e.g., Step 4 identifiability and Step 5 modeling). This will be presented in a new table or expanded prose that contrasts Bayesian dynamic borrowing and frequentist methods under covariate shift and outcome drift, thereby grounding the efficiency-robustness discussion and software recommendations in concrete evidence from the literature without requiring new simulations.
Revision: partial
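
The sensitivity analysis that both the referee and the authors refer to can take several forms. One simple form is a tipping-point analysis, sketched below under assumptions that are not drawn from the paper: the effect estimate is approximately normal, a larger value favors treatment, and `delta` is the amount by which the external control mean understates the control mean the trial population would have had. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def tipping_point(effect, se, alpha=0.05):
    """Smallest drift `delta` at which the bias-corrected two-sided
    (1 - alpha) interval for the treatment effect first includes zero.

    The corrected interval's lower limit is effect - delta - z * se,
    which reaches zero at delta = effect - z * se.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return effect - z * se

effect, se = 2.0, 0.5                  # hypothetical estimate and standard error
delta_star = tipping_point(effect, se)
print(f"tipping-point drift: {delta_star:.3f}")

# Corrected lower confidence limit over a grid of assumed drift values.
z = NormalDist().inv_cdf(0.975)
for delta in (0.0, 0.5, 1.0, 1.5):
    lower = effect - delta - z * se
    print(f"delta={delta:.1f} lower limit={lower:+.3f} significant={lower > 0}")
```

The tipping point is then judged against subject-matter knowledge: if a drift of that size is implausible for the data source, the conclusion is comparatively robust. A non-positive tipping point means the uncorrected interval already includes zero.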

Circularity Check

0 steps flagged

Review paper with no internal derivations or predictions

full rationale

This is a review paper that synthesizes and organizes existing literature on externally controlled trials via a causal lens and a six-step roadmap. It references prior work across causal inference, Bayesian borrowing, and hybrid designs without performing any new derivations, predictions, or parameter fittings that could reduce to the paper's own inputs by construction. The central claim of establishing a coherent foundation rests on accurate literature synthesis rather than self-referential steps, and no equations or claims exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper; no new free parameters, axioms, or invented entities are introduced by the authors. The work relies on standard causal inference concepts and assumptions drawn from the cited literature.

pith-pipeline@v0.9.0 · 5542 in / 1069 out tokens · 65547 ms · 2026-05-07T14:41:22.077881+00:00 · methodology
