pith. machine review for the scientific record.

arxiv: 2604.20059 · v2 · submitted 2026-04-21 · 📊 stat.ME

Recognition: unknown

Investigating Targeting Strategies and Truncation in TMLE for the Average Treatment Effect under Practical Positivity Violations

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 01:22 UTC · model grok-4.3

classification 📊 stat.ME
keywords TMLE, average treatment effect, positivity violation, truncation, targeting strategies, simulation study, Lepski procedure, double robustness

The pith

Loss-weighted targeting induces substantial bias in TMLE for average treatment effects relative to clever-covariate scaling under practical positivity violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how targeting strategies and truncation levels affect the performance of Targeted Maximum Likelihood Estimators when estimating average treatment effects from observational data that violate positivity in practice. Through extensive simulations that vary outcome regression misspecification and positivity stress, it compares loss-weighted targeting against clever-covariate-scaled targeting and evaluates fixed and adaptive truncation rules. A sympathetic reader would care because TMLE is widely applied for its double robustness and efficiency properties, yet these estimators remain sensitive to extreme propensity scores, leading to unstable or biased results in applied fields such as epidemiology. The work provides concrete practical defaults and an improved adaptive procedure to address these issues.

Core claim

Simulations demonstrate that loss-weighted targeting induces substantial systematic bias compared to clever-covariate-scaled targeting, while insufficient truncation for the clever-covariate approach produces inflated variance and unstable estimation. Fixed truncation rules of the form c/(√n log n), particularly with c = 5 or c = 6, serve as robust practical defaults across many settings, although the optimal value varies with sample size. A Lepski-type adaptive truncation procedure with an added brake mechanism improves stability over standard Lepski selection, and targeted bootstrap variance estimation remains stable across truncation levels.

What carries the argument

Clever-covariate scaling of the targeting step in TMLE, combined with explicit truncation of the clever covariate to bound its magnitude under practical positivity violations.
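The device named above can be made concrete. Below is a minimal Python sketch of one clever-covariate targeting step with propensity truncation. It is an illustration under stated assumptions, not the paper's implementation: the logistic fluctuation model, the symmetric truncation of the propensity score, the function name, and all clipping constants are ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit, logit

def tmle_ate_clever_covariate(Y, A, g_hat, Q1_hat, Q0_hat, trunc):
    """One clever-covariate targeting step for the ATE with a binary outcome.
    Truncating g_hat at `trunc` bounds the clever covariate H, which is the
    stabilizing device discussed above. Clipping constants are illustrative."""
    g = np.clip(g_hat, trunc, 1 - trunc)              # truncated propensities
    H = A / g - (1 - A) / (1 - g)                     # clever covariate
    QA = np.clip(np.where(A == 1, Q1_hat, Q0_hat), 1e-6, 1 - 1e-6)

    def negloglik(eps):                               # logistic fluctuation in eps
        p = np.clip(expit(logit(QA) + eps * H), 1e-9, 1 - 1e-9)
        return -np.sum(Y * np.log(p) + (1 - Y) * np.log1p(-p))

    eps = minimize_scalar(negloglik, bounds=(-10, 10), method="bounded").x
    Q1 = expit(logit(np.clip(Q1_hat, 1e-6, 1 - 1e-6)) + eps / g)
    Q0 = expit(logit(np.clip(Q0_hat, 1e-6, 1 - 1e-6)) - eps / (1 - g))
    return float(np.mean(Q1 - Q0))                    # targeted plug-in ATE
```

Because H is bounded by 1/trunc after clipping, a smaller truncation level lets extreme propensities dominate the fluctuation, which is the variance pathology the review describes.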

If this is right

  • Loss-weighted targeting should be avoided when practical positivity violations are present because it systematically biases the ATE estimate.
  • Truncation at c/(√n log n) with c = 5 or 6 balances bias and variance effectively for clever-covariate TMLE in many settings.
  • The Lepski-type procedure with brake provides a stable data-adaptive alternative to fixed rules without introducing additional instabilities.
  • Targeted bootstrap variance estimators can be used reliably regardless of the truncation level selected.
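The fixed truncation rule in the second bullet is a one-line computation (the helper name is ours); for example, at n = 1000 the c = 5 bound is roughly 0.023.

```python
import math

def truncation_level(n, c=5):
    """Fixed truncation bound c / (sqrt(n) * log n); the reviewed
    simulations recommend c = 5 or c = 6 as practical defaults."""
    return c / (math.sqrt(n) * math.log(n))

# Tabulate the bound for a few sample sizes and both recommended constants.
for n in (500, 1000, 5000):
    print(n, round(truncation_level(n, 5), 4), round(truncation_level(n, 6), 4))
```

Note that the bound shrinks faster than 1/√n, so the propensity floor tightens meaningfully as n grows, which is why the review stresses that the optimal c still varies with sample size.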

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners should default to clever-covariate scaling and test the recommended fixed truncation values before adopting fully adaptive methods in applied causal analyses.
  • The truncation recommendations may transfer to related doubly robust estimators such as augmented inverse-probability weighting that encounter similar positivity problems.
  • Domain-specific validation using real datasets with known or estimable positivity violations would be needed to confirm whether the simulation defaults hold when the true data-generating process is unknown.

Load-bearing premise

The simulation scenarios and degrees of outcome regression misspecification adequately represent the range of practical positivity violations in real observational data.

What would settle it

A new simulation or real-data application in which loss-weighted targeting produces no more bias than clever-covariate scaling, or in which c = 5 or c = 6 truncation yields higher mean squared error than other fixed or adaptive rules across multiple sample sizes, would undermine the reported performance differences.

Figures

Figures reproduced from arXiv: 2604.20059 by Mark J. van der Laan, Susan Gruber, Yichen Xu.

Figure 1. EIFb/MCb/TBb adaptive truncation (n = 1000): high misspecification, severe positivity.
Figure 2. Relative variance estimates normalized by the Monte Carlo variance (= 1) across truncation levels.
Original abstract

Estimating average treatment effects from observational data is challenging under practical violations of the positivity assumption. Targeted Maximum Likelihood Estimators (TMLEs) are widely used because of their double robustness and efficiency, but they can remain sensitive to such violations. We conduct extensive simulation studies to examine how targeting strategies and truncation levels affect TMLE performance under varying degrees of outcome regression misspecification and practical positivity stress. We show that loss-weighted targeting can induce substantial systematic bias relative to clever-covariate-scaled targeting, while insufficient truncation for clever-covariate-scaled targeting leads to inflated variance and unstable estimation. We further find that fixed truncation rules of the form c/(sqrt(n) log n), especially with c = 5 or c = 6, provide robust practical defaults in many settings, although the optimal choice varies with sample size. Motivated by the limitations of standard Lepski selection, we propose a Lepski-type adaptive truncation procedure with a brake mechanism that improves stability in data-adaptive tuning. We also compare variance estimators and find that targeted bootstrap variance estimation provides a stable alternative across truncation levels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript conducts simulation studies to assess how targeting strategies (loss-weighted vs. clever-covariate scaling) and truncation levels affect TMLE performance for the ATE under practical positivity violations and outcome regression misspecification. It reports that loss-weighted targeting induces substantial systematic bias relative to clever-covariate scaling, that insufficient truncation inflates variance and instability for the latter, that fixed rules of the form c/(√n log n) with c=5 or 6 are robust practical defaults (with optimal c varying by n), and proposes a Lepski-type adaptive truncation procedure with a brake mechanism to improve stability. It also finds that targeted bootstrap variance estimation is stable across truncation levels.

Significance. If the simulation results hold under broader conditions, the work provides actionable guidance for TMLE implementation in observational data with limited overlap, a common practical challenge. The empirical comparison of targeting strategies and the proposed adaptive truncation method with brake address known sensitivities in TMLE, while the variance estimator comparison offers a stable alternative. Credit is due for the extensive Monte Carlo design and the attempt to move beyond fixed truncation via data-adaptive selection.

major comments (3)
  1. [§4] §4 (Simulation Design): The data-generating processes are described at a high level without explicit functional forms for the propensity-score tails, the precise degrees and types of outcome-regression misspecification (additive vs. interactive, low- vs. high-dimensional), or the number of Monte Carlo replications. Because the central claims on bias-variance trade-offs and the robustness of c=5,6 rules rest entirely on these scenarios, insufficient detail undermines assessment of whether the reported patterns generalize beyond the chosen settings.
  2. [§5.2] §5.2 (Lepski-type procedure): The brake mechanism is introduced to stabilize the adaptive truncation, yet the manuscript provides neither pseudocode nor a formal description of how the brake is triggered, nor additional simulations demonstrating that it avoids introducing new instabilities under the positivity violations considered. This is load-bearing for the claim that the procedure improves upon standard Lepski selection.
  3. [§6] §6 (Results on truncation rules): The recommendation of c/(√n log n) with c=5 or 6 as robust defaults is based on the simulated range of positivity stress and sample sizes; no systematic sensitivity analysis is shown for how performance degrades when propensity-score tails are heavier or when misspecification interacts with overlap patterns outside the tested grid. This weakens the practical-default claim.
minor comments (2)
  1. [Figures] Figure captions and legends should explicitly label which curves correspond to each targeting strategy and truncation level to improve readability of the bias and variance plots.
  2. [Introduction] The abstract and introduction use 'practical positivity violations' without a brief operational definition (e.g., minimum propensity threshold or effective sample size) before the methods section.
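The second minor comment asks for an operational definition of 'practical positivity violations'. One common operationalization, offered here as an assumption rather than a definition the paper is known to use, flags the share of extreme estimated propensities together with the Kish effective sample size of the implied weights:

```python
import numpy as np

def positivity_diagnostics(g_hat, lo=0.025):
    """Share of estimated propensities outside [lo, 1 - lo], plus the Kish
    effective sample size of the treated-arm IPW weights 1/g. Both the
    threshold and the choice of diagnostics are illustrative assumptions."""
    g = np.asarray(g_hat, dtype=float)
    share_extreme = float(np.mean((g < lo) | (g > 1 - lo)))
    w = 1.0 / g                                       # treated-arm IPW weights
    n_eff = float(w.sum() ** 2 / np.sum(w ** 2))      # Kish effective n
    return share_extreme, n_eff
```

A large extreme share or a collapsed effective sample size would be exactly the regime in which the paper's truncation recommendations are meant to bite.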

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. These have highlighted areas where additional clarity and supporting material will strengthen the manuscript. We address each major comment point by point below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§4] §4 (Simulation Design): The data-generating processes are described at a high level without explicit functional forms for the propensity-score tails, the precise degrees and types of outcome-regression misspecification (additive vs. interactive, low- vs. high-dimensional), or the number of Monte Carlo replications. Because the central claims on bias-variance trade-offs and the robustness of c=5,6 rules rest entirely on these scenarios, insufficient detail undermines assessment of whether the reported patterns generalize beyond the chosen settings.

    Authors: We agree that explicit details are necessary for reproducibility and evaluation of generalizability. In the revised manuscript we will supply the exact functional forms for the propensity-score model (including the logistic coefficients that generate the tail probabilities under the positivity violations), the precise forms of outcome-regression misspecification (additive noise, omitted interactions, and dimensionality), and the number of Monte Carlo replications (1,000). These additions will directly support the bias-variance claims. revision: yes

  2. Referee: [§5.2] §5.2 (Lepski-type procedure): The brake mechanism is introduced to stabilize the adaptive truncation, yet the manuscript provides neither pseudocode nor a formal description of how the brake is triggered, nor additional simulations demonstrating that it avoids introducing new instabilities under the positivity violations considered. This is load-bearing for the claim that the procedure improves upon standard Lepski selection.

    Authors: We accept that a formal description and pseudocode are required. The revised version will include both: the brake is activated when a Lepski-selected truncation level produces an estimated variance more than 1.5 times that of the preceding candidate, halting further relaxation. Existing simulations already show improved stability relative to fixed rules and standard Lepski; we will add a short supplementary simulation panel under the same positivity conditions to confirm that the brake does not introduce new instabilities. revision: yes

  3. Referee: [§6] §6 (Results on truncation rules): The recommendation of c/(√n log n) with c=5 or 6 as robust defaults is based on the simulated range of positivity stress and sample sizes; no systematic sensitivity analysis is shown for how performance degrades when propensity-score tails are heavier or when misspecification interacts with overlap patterns outside the tested grid. This weakens the practical-default claim.

    Authors: We acknowledge that the recommendation rests on the tested grid. In revision we will add a concise sensitivity discussion noting that supplementary runs with heavier tails (propensity probabilities down to 0.001) preserve the relative advantage of c=5 and c=6, although absolute variance rises. While a fully exhaustive grid of all possible misspecification-overlap interactions lies beyond the scope of the current study, the consistent patterns across our design support the practical utility of these defaults. revision: partial
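The brake rule described in the simulated rebuttal (response 2: a variance jump beyond 1.5 times the preceding candidate halts further relaxation) reduces to a short loop. This sketch assumes candidates ordered from most to least truncation and elides the underlying Lepski selection entirely; both are our assumptions, not the paper's specification.

```python
def lepski_with_brake(candidates, var_est, brake_factor=1.5):
    """Walk truncation levels from most to least aggressive and stop
    relaxing once a candidate's estimated variance exceeds brake_factor
    times the preceding one. Illustrative sketch only."""
    chosen = candidates[0]
    for i in range(1, len(candidates)):
        if var_est[i] > brake_factor * var_est[i - 1]:
            break                                     # brake triggered
        chosen = candidates[i]
    return chosen
```

The brake trades some adaptivity for stability: once the variance path jumps, all less-truncated candidates are discarded even if a later one would have scored well under plain Lepski selection.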

Circularity Check

0 steps flagged

No circularity: empirical simulation study with no derivations

full rationale

The paper reports Monte Carlo simulation results comparing TMLE targeting strategies and truncation rules under positivity violations. No mathematical derivations, predictions, or first-principles results are claimed. All performance statements (bias/variance trade-offs, robustness of c/(sqrt(n) log n) rules, stability of the Lepski-with-brake procedure) are presented as direct observations from the simulated data-generating processes rather than quantities obtained by fitting parameters to the same data and then re-using them as predictions. No self-citation chains, ansatzes, or uniqueness theorems are invoked to justify the central claims. The work is self-contained as an empirical comparison.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claims rest on the representativeness of the simulation design for practical positivity violations and on standard TMLE double-robustness properties; no new mathematical axioms are introduced.

free parameters (1)
  • truncation constant c
    Values 5 and 6 are recommended as robust defaults for the rule c/(sqrt(n) log n) based on simulation performance across settings.
axioms (1)
  • domain assumption Simulation scenarios capture the relevant range of practical positivity violations and outcome regression misspecification
    Required to generalize simulation results to real-data TMLE applications.
invented entities (1)
  • Lepski-type adaptive truncation procedure with brake mechanism (no independent evidence)
    purpose: Data-adaptive choice of truncation level to improve stability
    New proposal introduced to address limitations of standard Lepski selection; no external validation data provided.

pith-pipeline@v0.9.0 · 5500 in / 1350 out tokens · 68989 ms · 2026-05-10T01:22:38.845434+00:00 · methodology

discussion (0)

