Adaptive Influence-Based Borrowing Framework for Improving Treatment Effect Estimation in RCTs Using External Controls
Pith reviewed 2026-05-09 15:52 UTC · model grok-4.3
The pith
The adaptive influence-based borrowing framework selects external controls by their perturbation to the RCT outcome model to minimize error in treatment effect estimates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The adaptive influence-based borrowing framework measures the influence of each external control on the outcome regression model estimated from RCT controls, orders them to form nested subsets, and selects the subset that minimizes the mean squared error of the average treatment effect estimator, thereby improving estimation efficiency when external data are compatible.
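The ordering-and-selection step can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the InfluenceBorrowing implementation. The influence scores are taken as given. The treatment-effect estimator is a plain difference in means. The bias of each candidate subset is estimated by how far the pooled control mean drifts from the RCT-only control mean. The function name `select_nested_subset` is hypothetical.

```python
import numpy as np

def select_nested_subset(y_trt, y_ctl, y_ec, scores):
    """Hypothetical sketch of influence-ordered nested-subset selection.

    ECs are sorted by influence score (smallest = most compatible), giving
    nested subsets S_0 (no ECs) ⊂ S_1 ⊂ ... ⊂ S_m (all ECs). For each S_k an
    estimated MSE = bias_hat^2 + var_hat of a difference-in-means ATE is
    computed, and the k with the smallest estimated MSE is returned.
    """
    order = np.argsort(scores)                  # most compatible ECs first
    mu_ctl = y_ctl.mean()                       # RCT-only control mean (reference)
    var_trt = y_trt.var(ddof=1) / len(y_trt)    # variance of the treated-arm mean
    mse = []
    for k in range(len(order) + 1):
        pooled = np.concatenate([y_ctl, y_ec[order[:k]]])
        bias_hat = pooled.mean() - mu_ctl       # drift caused by borrowing k ECs
        var_hat = var_trt + pooled.var(ddof=1) / len(pooled)
        mse.append(bias_hat ** 2 + var_hat)
    k_best = int(np.argmin(mse))
    pooled_best = np.concatenate([y_ctl, y_ec[order[:k_best]]])
    ate = y_trt.mean() - pooled_best.mean()
    return k_best, ate, np.array(mse)
```

Plotting the returned `mse` array against `k` gives the kind of MSE curve the tutorial describes. Note that `bias_hat ** 2` is itself noisy, so the estimated minimum can differ from the true one.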
What carries the argument
The influence score, which quantifies the change induced in the outcome model parameters when an individual external control is added to the RCT control sample.
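For a linear outcome model fitted by ordinary least squares, the parameter change from appending one external control (x, y) has a closed form via the Sherman-Morrison identity: Δβ = (XᵀX)⁻¹ x (y − xᵀβ̂) / (1 + xᵀ(XᵀX)⁻¹x), where X and β̂ come from the RCT controls. The sketch below assumes a linear model and uses the Euclidean norm of Δβ as the score. Both choices are illustrative assumptions, and the paper's score may be defined, weighted, or normalized differently. `influence_scores` is a hypothetical name.

```python
import numpy as np

def influence_scores(X_rct, y_rct, X_ec, y_ec):
    """Hypothetical influence score for each external control (EC).

    Score = Euclidean norm of the change in the OLS coefficients fitted on
    RCT controls when that single EC is appended to the RCT control sample.
    """
    XtX_inv = np.linalg.inv(X_rct.T @ X_rct)
    beta = XtX_inv @ X_rct.T @ y_rct             # OLS fit on RCT controls only
    scores = np.empty(len(y_ec))
    for i, (x, y) in enumerate(zip(X_ec, y_ec)):
        h = x @ XtX_inv @ x                      # x' (X'X)^{-1} x
        resid = y - x @ beta                     # EC residual under the RCT-control fit
        delta = XtX_inv @ x * resid / (1.0 + h)  # Sherman-Morrison coefficient update
        scores[i] = np.linalg.norm(delta)
    return beta, scores
```

Because the update is closed-form, scoring m external controls costs one matrix inversion plus m vector products rather than m separate refits. The score grows both with the EC's residual and with how unusual its covariates are.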
If this is right
- RCTs with limited sample size can achieve lower mean squared error for treatment effect estimates by borrowing from a selected compatible subset of external controls rather than using all external data or none.
- The nested-subset construction gives a transparent, ordered way to decide how much external data to include without requiring the researcher to pre-specify the number of controls.
- When initial differences between RCT and external groups are large, the optional outcome calibration step extends the method to cases where direct borrowing would otherwise be invalid.
- The accompanying R package supplies a reproducible workflow that includes visualization of influence scores and the resulting MSE curve.
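The review does not specify how the optional outcome calibration works. One simple possibility, assumed here purely for illustration, is a location shift. Fit the outcome model on RCT controls, estimate the average residual of the external controls under that fit as a systematic offset, and subtract the offset before computing influence scores. The function `shift_calibrate` is hypothetical and is not taken from the paper or the package.

```python
import numpy as np

def shift_calibrate(X_rct, y_rct, X_ec, y_ec):
    """Hypothetical location-shift calibration of external-control outcomes.

    Fits OLS on RCT controls, takes the mean EC residual under that fit as a
    systematic offset, and subtracts it from every EC outcome.
    """
    beta, *_ = np.linalg.lstsq(X_rct, y_rct, rcond=None)
    offset = float(np.mean(y_ec - X_ec @ beta))  # average EC residual
    return y_ec - offset, offset
```

A pure shift removes only a constant offset. If the covariate-outcome relationship itself differs between sources, residual incompatibility remains for the influence scores to flag.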
Where Pith is reading between the lines
- The same influence-based ordering could be applied when multiple external data sources are available, ranking patients across sources before subset selection.
- The selected subset might be further examined for effects on secondary outcomes or subgroups that were not used in the influence calculation.
- Empirical checks could compare the method against alternatives that adjust for selection uncertainty rather than relying solely on the chosen subset.
Load-bearing premise
The influence score accurately identifies which external patients are compatible with the RCT controls, and minimizing the mean squared error selects a subset whose bias-variance tradeoff is optimal without extra bias introduced by the selection step.
What would settle it
A simulation study that generates external controls with known degrees of incompatibility and checks whether the procedure consistently selects the subset that achieves the lowest true mean squared error or instead includes incompatible data and produces biased estimates.
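A small version of this check can be sketched for the ranking half of the procedure. It simulates RCT controls and external controls from one linear model, shifts the outcomes of a known block of external controls, and measures how many of those incompatible patients land among the top-ranked (lowest-influence) candidates. The data-generating model, the brute-force refit score, and the name `simulate_ranking` are all illustrative assumptions.

```python
import numpy as np

def simulate_ranking(n_rct=50, n_good=30, n_bad=10, shift=3.0, seed=1):
    """Fraction of known-incompatible ECs among the n_good top-ranked ECs.

    Hypothetical check: ECs 0..n_good-1 follow the RCT-control outcome model;
    the last n_bad ECs have their outcomes shifted by `shift`.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, 2.0])

    def draw(n):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        return X, X @ beta_true + rng.normal(size=n)

    X_rct, y_rct = draw(n_rct)
    X_ec, y_ec = draw(n_good + n_bad)
    y_ec[n_good:] += shift                      # known incompatibility
    beta_hat = np.linalg.lstsq(X_rct, y_rct, rcond=None)[0]
    scores = []
    for x, y in zip(X_ec, y_ec):
        # brute-force refit with this single EC appended to the RCT controls
        b = np.linalg.lstsq(np.vstack([X_rct, x]), np.append(y_rct, y), rcond=None)[0]
        scores.append(np.linalg.norm(b - beta_hat))
    top = np.argsort(scores)[:n_good]           # the n_good most compatible-looking ECs
    return float(np.mean(top >= n_good))
```

Extending this to the full question would add a known treatment effect, run the MSE-based selection in each replication, and compare the selected subset with the oracle subset that has the lowest true MSE.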
Original abstract
Randomized controlled trials (RCTs) often suffer from limited sample sizes due to high costs and lengthy recruitment periods, compromising precision in treatment effect estimation. External real-world control data offer a valuable opportunity for augmentation, but naïve integration may introduce bias without careful compatibility assessment. This paper presents a practical tutorial on the adaptive influence-based borrowing framework (Yang et al., 2026), which addresses this challenge through a principled, individual-level borrowing strategy. The core intuition is straightforward: rather than indiscriminately pooling all external controls (ECs), the framework first asks how much each external patient would perturb the outcome model fitted using RCT controls. External patients whose inclusion barely changes this model are deemed comparable and prioritized for borrowing, whereas those who substantially shift it are flagged as potentially incompatible. This individual-level compatibility metric, based on the influence score, is then used to construct a sequence of nested candidate subsets of ECs, from which the optimal subset is selected by minimizing the mean squared error of the treatment effect estimator, balancing the competing risks of bias from over-borrowing and imprecision from under-borrowing. When systematic differences between ECs and RCT controls are substantial, an optional outcome calibration step can align the two groups before influence-based selection proceeds. We provide a clear, step-by-step workflow with emphasis on methodological intuition, practical considerations, and visualization, thereby offering a principled, transparent, and practical method for leveraging ECs when RCTs alone are underpowered. Implementation is supported by an accompanying R package, InfluenceBorrowing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a tutorial on the adaptive influence-based borrowing framework for augmenting RCTs with external controls (ECs). It computes influence scores measuring how much each EC perturbs the outcome model fitted on RCT controls, orders ECs by these scores to form nested candidate subsets, selects the subset minimizing the estimated MSE of the treatment-effect estimator, and optionally applies outcome calibration when systematic differences are large. An R package is provided for implementation.
Significance. If the post-selection inference issue is resolved, the framework could offer a transparent, individual-level approach to borrowing that balances bias and variance in underpowered RCTs, with practical emphasis on intuition, visualization, and software support.
major comments (2)
- [Workflow description and subset selection procedure] The subset selection step (described in the workflow and abstract) chooses the optimal nested subset by minimizing an estimated MSE computed on the same observed outcomes used for final estimation. This data-dependent selection induces post-selection bias and variance inflation that is not accounted for by standard model-based variance estimators or confidence intervals; the manuscript provides no selective-inference correction, bootstrap that re-runs the full selection procedure, or explicit bias adjustment.
- [Influence score definition and compatibility assessment] The influence-score compatibility metric is presented as flagging exchangeable ECs, yet the manuscript does not supply simulation results or theoretical bounds showing that the MSE-minimizing choice recovers valid coverage or MSE reduction when compatibility is marginal or when the outcome model is misspecified.
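The remedy named in the first comment, a bootstrap that re-runs the full selection procedure, can be sketched in a deliberately simple setting. The sketch assumes an intercept-only outcome model. In that model, adding an external control with outcome y changes the RCT control mean by (y − ȳ)/(n_c + 1), so its absolute value serves as the influence score. It also assumes a difference-in-means estimator and a percentile bootstrap that resamples each data source separately. `borrow_ate` and `bootstrap_ci` are hypothetical names. Even a full-procedure bootstrap is not guaranteed to reach nominal coverage when the selection rule is non-smooth.

```python
import numpy as np

def borrow_ate(y_trt, y_ctl, y_ec):
    """Hypothetical full pipeline: score ECs, build nested subsets, pick the
    subset minimizing an estimated MSE, and return the difference-in-means ATE."""
    # Influence of adding one EC on the RCT control mean: |y - ybar| / (n + 1)
    scores = np.abs(y_ec - y_ctl.mean()) / (len(y_ctl) + 1)
    order = np.argsort(scores)
    var_trt = y_trt.var(ddof=1) / len(y_trt)
    best_mse, best_ate = np.inf, None
    for k in range(len(order) + 1):
        pooled = np.concatenate([y_ctl, y_ec[order[:k]]])
        mse = ((pooled.mean() - y_ctl.mean()) ** 2
               + var_trt + pooled.var(ddof=1) / len(pooled))
        if mse < best_mse:
            best_mse, best_ate = mse, y_trt.mean() - pooled.mean()
    return best_ate

def bootstrap_ci(y_trt, y_ctl, y_ec, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap that re-runs scoring AND selection on every resample."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # resample each data source separately, preserving group sizes
        draws[b] = borrow_ate(rng.choice(y_trt, len(y_trt)),
                              rng.choice(y_ctl, len(y_ctl)),
                              rng.choice(y_ec, len(y_ec)))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return borrow_ate(y_trt, y_ctl, y_ec), (lo, hi)
```

The key design choice is that `borrow_ate` is called inside the resampling loop. The interval therefore reflects the variability of which ECs get selected, which a model-based standard error computed on the final subset would ignore.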
minor comments (2)
- [Abstract] The abstract contains the unrendered LaTeX fragment na\"ive; render it as “naïve” for readability.
- [Implementation section] Clarify in the text how the R package InfluenceBorrowing implements the MSE minimization and whether it returns selection-adjusted standard errors.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our tutorial manuscript. We have carefully considered each major comment and provide point-by-point responses below, indicating where revisions have been made to strengthen the presentation.
Point-by-point responses
- Referee: [Workflow description and subset selection procedure] The subset selection step (described in the workflow and abstract) chooses the optimal nested subset by minimizing an estimated MSE computed on the same observed outcomes used for final estimation. This data-dependent selection induces post-selection bias and variance inflation that is not accounted for by standard model-based variance estimators or confidence intervals; the manuscript provides no selective-inference correction, bootstrap that re-runs the full selection procedure, or explicit bias adjustment.
Authors: We agree that the data-dependent subset selection, which minimizes an estimated MSE using the same outcomes, introduces post-selection inference challenges not addressed by standard variance estimators. As the manuscript is a tutorial focused on methodological intuition, workflow, and software implementation rather than theoretical inference guarantees, it does not include selective-inference corrections. In the revised version, we have added a new subsection under practical considerations that explicitly discusses this limitation, recommends bootstrap procedures that re-run the full selection process for valid inference, and clarifies that the framework primarily targets improved point estimation of the treatment effect with appropriate caution for interval estimation.
Revision: yes.
- Referee: [Influence score definition and compatibility assessment] The influence-score compatibility metric is presented as flagging exchangeable ECs, yet the manuscript does not supply simulation results or theoretical bounds showing that the MSE-minimizing choice recovers valid coverage or MSE reduction when compatibility is marginal or when the outcome model is misspecified.
Authors: The influence score and its behavior under marginal compatibility or outcome model misspecification are analyzed in the foundational paper by Yang et al. (2026), which includes simulation studies demonstrating MSE reduction and coverage properties across scenarios. This tutorial emphasizes practical explanation, visualization, and implementation rather than re-deriving or repeating those results. We have revised the manuscript to include direct citations to the relevant simulation findings from Yang et al. (2026) in the sections on influence scores and subset selection, along with a brief discussion of expected performance under marginal compatibility and misspecification based on the original framework.
Revision: partial.
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper is a tutorial describing a workflow that computes influence scores from the RCT-control outcome model, orders external controls into nested candidate subsets, and selects the subset minimizing an estimated MSE for the treatment-effect estimator. This selection rule is a proposed algorithmic step for bias-variance trade-off and does not reduce any claimed result or prediction to its own inputs by construction. The single self-citation to Yang et al. (2026) refers to the authors' prior introduction of the framework; the present manuscript adds practical exposition, visualization, and R-package implementation rather than deriving a new theorem whose validity rests on that citation. No equations are shown that equate a first-principles output to a fitted parameter or that smuggle an ansatz via self-reference. The method is therefore self-contained as a statistical procedure whose correctness can be evaluated against external benchmarks or simulation studies independent of the paper's own fitted values.
Works this paper leans on
[1] G. W. Imbens and D. B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences.
[2] CASS Investigators. Circulation.
[3] Harsh Parikh, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky. arXiv preprint arXiv:2307.01449.
[4] Qinwei Yang, Jingyi Li, Peng Wu, and Shu Yang. Improving Treatment Effect Estimation in Trials through Adaptive Borrowing of External Controls. arXiv preprint arXiv:2604.13973.
[5] Mark van der Laan, Sky Qiu, Jens Magelund Tarp, and Lars van der Laan. arXiv preprint arXiv:2405.07186.
[6] Sky Qiu, Jens Tarp, Andrew Mertens, and Mark van der Laan. arXiv preprint arXiv:2501.17835.
[7] Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 1974.
[8] On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 1990.
[9] Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics.
[10] Joseph D. Y. Kang and Joseph L. Schafer. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science.
[11] Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics, 1980.
[12] Understanding black-box predictions via influence functions. Proceedings of the 34th International Conference on Machine Learning, 2017.
[13] Shu Yang, Siyi Liu, Donglin Zeng, and Xiaofei Wang. Bernoulli.
[14] Ke Zhu, Shu Yang, and Xiaofei Wang. Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing.
[15] Chenyin Gao, Shu Yang, Mingyang Shan, Wenyu Wendy Ye, Ilya Lipkovich, and Douglas Faries. Proceedings of the 42nd International Conference on Machine Learning, 2025.
[16] Improving randomized controlled trial analysis via data-adaptive borrowing. Biometrika, 2025.
[17] Guido W. Imbens and Donald B. Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences.
[18] M. A. Hernán. Causal Inference: What If.
[19] Qinwei Yang, Jingyi Li, and Peng Wu. Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls.
[20] Judea Pearl. Causality: Models, Reasoning, and Inference.
[21] Elizabeth A. Stuart. Matching methods for causal inference: A review and a look forward.
[22] Issa J. Dahabreh, Sarah E. Robertson, Jon A. Steingrimsson, Elizabeth A. Stuart, and Miguel A. Hernán. Extending inferences from a randomized trial to a target population. European Journal of Epidemiology.
[23] John Concato and Janet Corrigan-Curay. Observational studies in the era of real-world evidence: Strengths and limitations.
[24] Guideline on Registry-Based Studies.
[25] Points to Consider on Adjustment for Baseline Covariates.
[26] Gemma Carrigan, Shana Whipple, Gisela Capkun-Niggli, Claudio Villanueva, and Edward Cox. Using External Controls in Oncology Trials: Methodological Considerations and Regulatory Experience. Clinical Cancer Research, 2020.
[27] Michael Fralick, Marco Colacci, and Sebastian Schneeweiss. Use of External Controls for the Evaluation of Oncology Therapies. JAMA Oncology, 2020.
[28] Rare Diseases: Natural History Studies for Drug Development.
[29] Janet Woodcock and Lisa M. LaVange. Master Protocols to Study Multiple Therapies, Multiple Diseases, or Both. New England Journal of Medicine, 2017.
[30] Benjamin R. Saville and Scott M. Berry. Efficiencies of Platform Clinical Trials: A Vision of the Future. Clinical Trials, 2016.
[31] Rachel E. Sherman, Steven A. Anderson, Gerald J. Dal Pan, Gregory W. Gray, Thomas Gross, Nina L. Hunter, Lisa LaVange, Danica Marinac-Dabic, Peter W. Marks, Melissa A. Robb, Jeffrey Shuren, Robert Temple, Janet Woodcock, and Lisa Q. Yue. Real-World Evidence - What Is It and What Can It Tell Us? ... 2016.
[32] On the comparative analysis of average treatment effects estimation via data combination. Journal of the American Statistical Association.
[33] Design and analysis of a clinical trial using previous trials as historical control. Clinical Trials, 2019.
[34] A note on the power prior. Statistics in Medicine, 2009.
[35] Matching with multiple control groups with adjustment for group differences. Journal of Educational and Behavioral Statistics, 2008.
[36] Issa J. Dahabreh, Sarah E. Robertson, Jon A. Steingrimsson, Elizabeth A. Stuart, and Miguel A. Hernán. Extending inferences from a randomized trial to a new target population.
[37] Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics, 2019.
[38] B. Statistical Science.
[39] Improving efficiency of inference in clinical trials with external control data. Biometrics, 2023.
[40] Kert Viele, Scott Berry, Beat Neuenschwander, Billy Amzal, Fang Chen, Nathan Enas, Brian Hobbs, Joseph G. Ibrahim, Nelson Kinnersley, Stacy Lindborg, Sandrine Micallef, Satrajit Roychoudhury, and Laura Thompson. Use of Historical Control Data for Assessing Treatment Effects in Clinical Trials. 2014.
[41] Beat Schmidli, Sebastian Gsteiger, Shreya Roychoudhury, Anthony O'Hagan, David Spiegelhalter, and Beat Neuenschwander. Robust Meta-Analytic-Predictive Priors in Clinical Trials with Historical Control Information. Biometrics, 2014.
[42] Beat Neuenschwander, Gisela Capkun-Niggli, Michael Branson, and David J. Spiegelhalter. Summarizing Historical Information on Controls in Clinical Trials. Clinical Trials, 2010.
[43] Joseph G. Ibrahim and Ming-Hui Chen. Power Prior Distributions for Regression Models. Statistical Science, 2000.
[44] A simulation-based evaluation of statistical methods for hybrid real-world control arms in clinical trials. Statistics in Biosciences, 2022.
[45] Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics, 2011.
[46] Peter J. Bickel, Chris A. J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models.
[47] Anastasios A. Tsiatis. Semiparametric Theory and Missing Data.
[48] Aad W. van der Vaart. Asymptotic Statistics.
[49] James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 1994.
[50] James M. Robins and Andrea Rotnitzky. Semiparametric Efficiency in Multivariate Regression Models with Missing Data. Journal of the American Statistical Association, 1995.
[51] Paul R. Rosenbaum and Donald B. Rubin. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 1983.
[52] FDA. Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products.
[53] FDA. Real-World Data: Assessing Registries To Support Regulatory Decision-Making for Drug and Biological Products.
[54] Jeremy Ferwerda, Jens Hainmueller, and Chad J. Hazlett. Journal of Statistical Software.
[55] Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. Proceedings of the First Conference on Causal Learning and Reasoning, 2022.
[56] X. Nie and S. Wager. Biometrika.
[57] Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 1986.
[58] Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics, 2002.