Synergy Area with FDR-controlled Evaluation (SAFE) to robustly assess safety profile in clinical trials
Pith reviewed 2026-05-08 01:55 UTC · model grok-4.3
The pith
The SAFE framework assesses drug safety in clinical trials in two layers: the first evaluates each predefined synergy area against clinical evidence, and the second controls the false discovery rate across all synergy areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework can robustly assess safety profiles in clinical trials. The first layer investigates each clinically meaningful synergy area based on compelling evidence. The second layer controls the false discovery rate for potential findings across all synergy areas. Simulation studies show that SAFE controls error rates within and across synergy areas at the nominal level, and two case studies on historical trial data show that it screens out extreme data and supports solid safety conclusions.
What carries the argument
The two-layer Synergy Area with FDR-controlled Evaluation (SAFE) framework, where the first layer assesses each synergy area individually using clinical evidence and the second layer applies false discovery rate control across all such areas.
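The second layer can be sketched as a step-up FDR procedure applied to one p-value per synergy area. This is a minimal sketch that assumes the Benjamini-Hochberg (BH) rule, which the summary above does not confirm as the paper's exact procedure. The names `benjamini_hochberg`, `second_layer`, and the example p-values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one rejection flag per p-value using the BH step-up rule."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def second_layer(sa_p_values: Dict[str, float], q: float = 0.05) -> Dict[str, bool]:
    """Apply BH across synergy areas (assumed procedure, not confirmed by the paper)."""
    names = list(sa_p_values)
    flags = benjamini_hochberg([sa_p_values[n] for n in names], q)
    return dict(zip(names, flags))


# Hypothetical SA-level p-values produced by the first layer.
example = {"hepatic": 0.001, "cardiac": 0.012, "renal": 0.028, "skin": 0.200, "gi": 0.750}
print(second_layer(example, q=0.05))
# {'hepatic': True, 'cardiac': True, 'renal': True, 'skin': False, 'gi': False}
```

With five areas and q = 0.05, the BH thresholds are 0.01, 0.02, 0.03, 0.04, and 0.05, so the three smallest p-values are flagged and the remaining two are not.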
If this is right
- Error rates remain controlled at the nominal level both within each synergy area and across multiple areas.
- The framework screens out extreme data in two case studies that use real data from the Historical Trial Data (HTD) Sharing Initiative.
- It supports reaching solid safety conclusions by incorporating clinical knowledge more systematically than direct methods.
- SAFE can serve as a building block within larger frameworks or allow incorporation of additional statistical components.
Where Pith is reading between the lines
- If synergy areas prove easy to define consistently, this approach could reduce dependence on purely manual review for safety data in large trials.
- The structure might extend naturally to post-approval safety monitoring where similar predefined areas could apply.
- Testing how conclusions change with different ways of choosing synergy areas would reveal sensitivity in practice.
- Integration into existing clinical data platforms could standardize quantitative safety checks across multiple studies.
Load-bearing premise
Clinically meaningful synergy areas can be reliably predefined in advance using existing clinical knowledge and evidence.
What would settle it
A simulation study or real trial dataset where the observed false discovery rate across synergy areas exceeds the nominal level, or where extreme data points are not screened out as expected.
Original abstract
Safety assessment plays a fundamental role in developing a new drug via clinical trials for ethical considerations. Due to complexity, manual review is typically conducted on the totality of data to draw safety conclusions. There are some existing quantitative methods to facilitate or tailor further medical review, with a controlled error rate and integration of clinical knowledge. In addition to those two key aspects, we emphasize the importance of relying on substantial evidence to draw robust conclusions on safety. Motivated by these three important properties, we propose a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework to robustly assess the safety profile in clinical trials. In the first layer of SAFE, we investigate each clinically meaningful Synergy Area (SA) based on compelling evidence. In the next layer, the false discovery rate (FDR) is controlled for potential findings across all SAs. Simulation studies show that SAFE properly controls error rates within and across SAs at the nominal level. We further apply the proposed approach to two case studies based on real data from the Historical Trial Data (HTD) Sharing Initiative of the DataCelerate platform. As compared to some direct methods, SAFE demonstrates an appealing feature of screening out extreme data and reaching solid safety conclusions. It can act as either a building block in another framework, or a platform to incorporate additional components.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) framework for assessing safety profiles in clinical trials. The first layer evaluates each clinically predefined Synergy Area (SA) individually using substantial evidence; the second layer applies FDR control across all SAs to identify safety signals while controlling error rates. Simulation studies are reported to demonstrate that SAFE maintains nominal error-rate control both within and across SAs. The method is illustrated on two real-data case studies from the Historical Trial Data Sharing Initiative, where it is claimed to screen extreme data more effectively than direct methods and to support robust safety conclusions.
Significance. If the FDR control across SAs holds under realistic dependence, SAFE would provide a practical bridge between clinical knowledge (via predefined SAs) and statistical multiplicity control, potentially improving the reliability of safety assessments in drug development where manual review of complex endpoints is standard. The two-layer structure and emphasis on external validation via simulations and real data are strengths that could make the framework reusable as a building block in larger safety-analysis pipelines.
major comments (2)
- [Abstract and simulation studies] Abstract and simulation section: The claim that simulations show SAFE 'properly controls error rates within and across SAs at the nominal level' is load-bearing for the central contribution, yet the simulation design is not described in sufficient detail to confirm that between-SA dependence (arising from shared patients, overlapping adverse-event categories, or correlated endpoints) was incorporated. Standard BH or similar FDR procedures do not automatically guarantee control under arbitrary positive dependence; without explicit modeling of such structures, the reported cross-SA control may be an artifact of an independence assumption unlikely to hold in clinical data.
- [Case studies] Case-study section: The statement that SAFE demonstrates an 'appealing feature of screening out extreme data and reaching solid safety conclusions' as compared to 'some direct methods' is central to the practical claim, but the direct methods are not named, the quantitative metrics of advantage (e.g., number of signals retained, false-positive rates on known safety signals) are not reported, and the data-exclusion rules or SA definitions used in the HTD case studies are not specified. This prevents verification that the observed advantage stems from the two-layer FDR mechanism rather than from ad-hoc choices.
minor comments (2)
- [Methods] Notation for Synergy Areas (SAs) and the two-layer structure should be introduced with a clear diagram or pseudocode early in the methods section to avoid ambiguity when the FDR procedure is applied across SAs.
- [Methods] The manuscript should state the exact FDR procedure (BH, BY, or other) and the nominal level used in both layers, as well as any adjustments for dependence.
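The difference between the two named procedures matters in practice, and a short sketch makes it concrete. Benjamini-Yekutieli (BY) inflates the BH adjustment by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, which buys FDR control under arbitrary dependence at the cost of power. The function name `adjusted_p_values` and the example p-values below are hypothetical, and nothing here is taken from the manuscript.

```python
from typing import List


def adjusted_p_values(p_values: List[float], method: str = "bh") -> List[float]:
    """Return BH- or BY-adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    if method == "bh":
        c_m = 1.0
    elif method == "by":
        # Harmonic correction factor used by Benjamini-Yekutieli.
        c_m = sum(1.0 / i for i in range(1, m + 1))
    else:
        raise ValueError("method must be 'bh' or 'by'")
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = min(1.0, p_values[idx] * m * c_m / rank)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical SA-level p-values.
p = [0.001, 0.012, 0.028, 0.200, 0.750]
bh = adjusted_p_values(p, "bh")
by = adjusted_p_values(p, "by")
print(sum(a <= 0.05 for a in bh), sum(a <= 0.05 for a in by))
# 3 1
```

On this example BH flags three areas at the 0.05 level while BY flags one, which is why the choice of procedure and nominal level needs to be stated explicitly.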
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions we will make.
Point-by-point responses
- Referee: [Abstract and simulation studies] Abstract and simulation section: The claim that simulations show SAFE 'properly controls error rates within and across SAs at the nominal level' is load-bearing for the central contribution, yet the simulation design is not described in sufficient detail to confirm that between-SA dependence (arising from shared patients, overlapping adverse-event categories, or correlated endpoints) was incorporated. Standard BH or similar FDR procedures do not automatically guarantee control under arbitrary positive dependence; without explicit modeling of such structures, the reported cross-SA control may be an artifact of an independence assumption unlikely to hold in clinical data.
  Authors: We appreciate the referee's emphasis on this key point. The simulations in the manuscript were designed to include between-SA dependence via shared patient structures and correlated adverse-event indicators, consistent with the positive dependence settings under which the BH procedure is known to control FDR (PRDS condition). However, we acknowledge that the current description of the simulation design is high-level and does not provide sufficient explicit detail on the dependence structures. We will revise the simulation section (and update the abstract if needed) to include a fuller account of how dependence was generated, including the specific correlation mechanisms and parameter values used. This will allow readers to confirm that the reported control is not an artifact of independence.
  Revision: yes
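One way to check the dependence point empirically is a small Monte Carlo study. The sketch below is not the manuscript's simulation design: it assumes one-sided z-statistics per SA that share a single latent factor (equicorrelation `rho`), a setting known to satisfy PRDS, and it estimates the FDR of BH across SAs. All function names, parameter values, and the seed are hypothetical.

```python
import math
import random
from typing import List


def bh_reject(p_values: List[float], q: float) -> List[bool]:
    """BH step-up rule: reject the k smallest p-values, k = max rank with p_(k) <= k*q/m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = max((r for r, i in enumerate(order, 1) if p_values[i] <= r * q / m), default=0)
    flags = [False] * m
    for i in order[:k_max]:
        flags[i] = True
    return flags


def one_sided_p(z: float) -> float:
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def empirical_fdr(m: int = 10, m_true: int = 3, rho: float = 0.5, effect: float = 3.0,
                  q: float = 0.05, n_rep: int = 2000, seed: int = 1) -> float:
    """Average false discovery proportion of BH over n_rep simulated trials."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)  # latent factor shared by all SAs (assumed mechanism)
        p_values = []
        for j in range(m):
            noise = rng.gauss(0.0, 1.0)
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
            if j < m_true:
                z += effect  # the first m_true SAs carry a true safety signal
            p_values.append(one_sided_p(z))
        rejected = bh_reject(p_values, q)
        n_rejected = sum(rejected)
        n_false = sum(rejected[j] for j in range(m_true, m))
        total_fdp += n_false / n_rejected if n_rejected else 0.0
    return total_fdp / n_rep


for rho in (0.0, 0.5):
    print(f"rho={rho}: empirical FDR = {empirical_fdr(rho=rho):.3f}")
```

Under PRDS the theoretical bound for BH is q times the fraction of true nulls, here 0.05 x 7/10 = 0.035, so estimates well above the nominal 0.05 in a setting like this would point to a coding or design problem rather than to the procedure. Dependence structures outside PRDS (for example, negatively correlated endpoints) would need a separate check.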
- Referee: [Case studies] Case-study section: The statement that SAFE demonstrates an 'appealing feature of screening out extreme data and reaching solid safety conclusions' as compared to 'some direct methods' is central to the practical claim, but the direct methods are not named, the quantitative metrics of advantage (e.g., number of signals retained, false-positive rates on known safety signals) are not reported, and the data-exclusion rules or SA definitions used in the HTD case studies are not specified. This prevents verification that the observed advantage stems from the two-layer FDR mechanism rather than from ad-hoc choices.
  Authors: We agree that additional specificity is required to substantiate the practical advantages claimed for the case studies. In the revised manuscript we will explicitly name the direct methods used for comparison (direct application of BH-FDR to all individual adverse events without SA grouping, and unadjusted p-value screening), report quantitative metrics such as the number of signals retained by each approach and their concordance with known safety signals, and provide the SA definitions together with any data-exclusion rules applied in the HTD analyses (via a new supplementary table). These additions will clarify that the observed screening behavior arises from the two-layer structure rather than ad-hoc choices.
  Revision: yes
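The contrast between direct BH on all adverse events and an SA-grouped analysis can be illustrated on toy numbers. This sketch rests on a loud assumption: the within-SA rule used here (take the maximum p-value in the SA, so every endpoint must show evidence) is a stand-in chosen for illustration, since the summary above does not specify how the first layer combines evidence. The SA names, endpoint names, and p-values are all hypothetical.

```python
from typing import Dict, List


def bh_reject(p_values: List[float], q: float) -> List[bool]:
    """BH step-up rule across a list of p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = max((r for r, i in enumerate(order, 1) if p_values[i] <= r * q / m), default=0)
    flags = [False] * m
    for i in order[:k_max]:
        flags[i] = True
    return flags


def direct_bh(sa_data: Dict[str, Dict[str, float]], q: float) -> List[str]:
    """Direct method: BH over every endpoint, ignoring SA grouping."""
    names = [e for endpoints in sa_data.values() for e in endpoints]
    p_values = [p for endpoints in sa_data.values() for p in endpoints.values()]
    return [n for n, flag in zip(names, bh_reject(p_values, q)) if flag]


def grouped_bh(sa_data: Dict[str, Dict[str, float]], q: float) -> List[str]:
    """Two-layer sketch: max p-value within each SA (ASSUMED rule), then BH across SAs."""
    sa_names = list(sa_data)
    sa_p = [max(sa_data[s].values()) for s in sa_names]
    return [s for s, flag in zip(sa_names, bh_reject(sa_p, q)) if flag]


# Hypothetical endpoint-level p-values; 'cardiac' holds one isolated extreme value.
data = {
    "hepatic": {"ALT": 0.004, "AST": 0.006, "bilirubin": 0.009},
    "cardiac": {"troponin": 0.0001, "QT": 0.60, "palpitations": 0.45},
    "renal": {"creatinine": 0.30, "proteinuria": 0.55},
}
print(direct_bh(data, q=0.05))   # ['ALT', 'AST', 'bilirubin', 'troponin']
print(grouped_bh(data, q=0.05))  # ['hepatic']
```

In this toy case the direct method flags the isolated troponin value, while the grouped analysis flags only the hepatic area, where all endpoints agree. Whether the real case studies behave this way depends on the SA definitions and first-layer rule that the authors have promised to report.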
Circularity Check
SAFE framework proposal contains no circular derivation
The paper defines a two-layer method: pre-defined clinically meaningful Synergy Areas (SAs) are analyzed individually in layer one, followed by standard FDR control across SAs in layer two. Error-rate control is asserted via external simulation studies and real-data case studies rather than by algebraic identity or self-referential fitting. No equations, parameter estimation steps, or uniqueness claims reduce the reported properties to the method's own inputs by construction. The framework is explicitly positioned as a modular building block, confirming it does not rely on self-definition or load-bearing self-citation chains.
Axiom & Free-Parameter Ledger
free parameters (1)
- nominal FDR level
axioms (1)
- domain assumption: Clinically meaningful synergy areas can be identified a priori based on clinical knowledge.
invented entities (1)
- Synergy Area (SA): no independent evidence