arxiv: 2605.03041 · v1 · submitted 2026-05-04 · 📊 stat.AP


Synergy Area with FDR-controlled Evaluation (SAFE) to robustly assess safety profile in clinical trials

Thao Doan, Tianyu Zhan, Xun Chen, Yabing Mai, Yihua Gu

Pith reviewed 2026-05-08 01:55 UTC · model grok-4.3

classification 📊 stat.AP

keywords safety assessment · clinical trials · false discovery rate · synergy area · drug safety · error control · statistical framework


The pith

The SAFE framework assesses drug safety in clinical trials in two layers: the first evaluates each predefined synergy area on its clinical evidence, and the second controls the false discovery rate across all areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Safety conclusions for new drugs depend on reviewing complex trial data, often through manual processes that can lack quantitative rigor. The paper proposes the SAFE framework to combine controlled error rates, integration of clinical knowledge, and reliance on substantial evidence for more robust assessments. In the first layer, each synergy area is examined individually based on compelling evidence; the second layer then applies false discovery rate control to manage findings across all areas. Simulations confirm that error rates stay at nominal levels both within areas and overall, while real data applications show the method can exclude extreme observations to support firmer safety statements.

Core claim

The central claim is that a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework can robustly assess safety profiles in clinical trials. The first layer investigates each clinically meaningful synergy area based on compelling evidence. The next layer controls the false discovery rate for potential findings across all synergy areas. Simulation studies show that SAFE properly controls error rates within and across synergy areas at the nominal level, and applications to historical trial data demonstrate screening of extreme data for solid safety conclusions.
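
For orientation, the quantity the second layer is said to control can be written out. The abstract does not state which FDR procedure SAFE uses, so the Benjamini-Hochberg (BH) step-up rule below is shown only as the standard reference procedure, and the symbols $m$, $m_0$, $R$, $V$, $q$ and $p_{(i)}$ are notation assumed here rather than taken from the paper.

```latex
% m   : number of synergy areas tested in layer two
% m_0 : number of those areas with no true safety signal
% R   : number of areas flagged;  V : number flagged in error
\[
  \mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
\]
% BH step-up rule at nominal level q, with ordered p-values
% p_{(1)} \le \dots \le p_{(m)}:
\[
  k \;=\; \max\Bigl\{\, i : p_{(i)} \le \tfrac{i}{m}\, q \Bigr\},
  \qquad \text{flag the areas with } p_{(1)}, \dots, p_{(k)} .
\]
% Under independence, or positive regression dependence on the
% subset of true nulls (PRDS), BH satisfies
\[
  \mathrm{FDR} \;\le\; \frac{m_0}{m}\, q \;\le\; q .
\]
```

If no $p_{(i)}$ meets its threshold, the set is empty and nothing is flagged.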

What carries the argument

The two-layer Synergy Area with FDR-controlled Evaluation (SAFE) framework, where the first layer assesses each synergy area individually using clinical evidence and the second layer applies false discovery rate control across all such areas.
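
A minimal sketch of how such a two-layer procedure could be wired together is given below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how evidence is combined within a synergy area, so the max-p (intersection-union style) rule used for layer one is an assumption, as is the use of Benjamini-Hochberg for layer two. The function names `bh_reject` and `safe_sketch`, the area names, and all p-values are hypothetical.

```python
def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up; returns one boolean per input p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # number of rejections: largest rank whose p-value meets its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def safe_sketch(sa_pvals, q=0.05):
    """Hypothetical two-layer sketch.

    Layer 1 (ASSUMPTION): the SA-level p-value is the maximum of the
    endpoint p-values, so an SA looks significant only when every
    endpoint in it supports a signal.
    Layer 2 (ASSUMPTION): BH at level q across the SA-level p-values.
    """
    names = list(sa_pvals)
    layer1 = [max(sa_pvals[name]) for name in names]
    flagged = bh_reject(layer1, q)
    return {name: (p, f) for name, p, f in zip(names, layer1, flagged)}


# Hypothetical endpoint-level p-values grouped by synergy area.
example = {
    "hepatic": [0.001, 0.004, 0.010],  # consistent signal on all endpoints
    "cardiac": [0.0001, 0.62, 0.48],   # one extreme value, not corroborated
    "renal": [0.30, 0.55],
}
result = safe_sketch(example, q=0.05)
for name, (p, flag) in result.items():
    print(name, p, flag)
```

With these made-up inputs only the hepatic area is flagged. The cardiac area contains the single smallest p-value but is not flagged, because the assumed layer-one rule asks for corroboration across endpoints; this is one way the "screening out extreme data" behaviour described in the abstract could arise.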

If this is right

Error rates remain controlled at the nominal level both within each synergy area and across multiple areas.
The framework screens out extreme data in applications to real clinical trial datasets from historical sources.
It supports reaching solid safety conclusions by incorporating clinical knowledge more systematically than direct methods.
SAFE can serve as a building block within larger frameworks or allow incorporation of additional statistical components.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If synergy areas prove easy to define consistently, this approach could reduce dependence on purely manual review for safety data in large trials.
The structure might extend naturally to post-approval safety monitoring where similar predefined areas could apply.
Testing how conclusions change with different ways of choosing synergy areas would reveal sensitivity in practice.
Integration into existing clinical data platforms could standardize quantitative safety checks across multiple studies.

Load-bearing premise

Clinically meaningful synergy areas can be reliably predefined in advance using existing clinical knowledge and evidence.

What would settle it

A simulation study or real trial dataset where the observed false discovery rate across synergy areas exceeds the nominal level, or where extreme data points are not screened out as expected.
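
A check of this kind can be sketched as a small Monte Carlo experiment. The code below is not the paper's simulation design, which the abstract does not describe. It assumes one one-sided z-test per synergy area, a single shared latent factor that induces equal correlation `rho` between areas, and Benjamini-Hochberg as the cross-area procedure; every name and default value (`empirical_fdr`, `n_sa`, `n_true`, `effect`, `rho`) is hypothetical.

```python
import random
from math import erfc, sqrt


def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up; returns one boolean per input p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def empirical_fdr(n_sa=20, n_true=5, effect=3.0, rho=0.5,
                  q=0.05, n_sim=2000, seed=0):
    """Average false discovery proportion over n_sim simulated trials.

    ASSUMPTION: the first n_true areas carry a real signal (mean shift
    `effect`); the remaining areas are true nulls. Test statistics are
    equicorrelated normals built from one shared factor.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # factor common to all areas
        pvals = []
        for j in range(n_sa):
            mean = effect if j < n_true else 0.0
            z = mean + sqrt(rho) * shared + sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            pvals.append(0.5 * erfc(z / sqrt(2.0)))  # one-sided upper-tail p
        rej = bh_reject(pvals, q)
        false_disc = sum(rej[n_true:])  # flagged areas that are true nulls
        total += false_disc / max(sum(rej), 1)
    return total / n_sim


print("rho=0.0:", empirical_fdr(rho=0.0))
print("rho=0.5:", empirical_fdr(rho=0.5))
```

An estimate that sits clearly above `q` after accounting for Monte Carlo error would count against the claimed control. The equicorrelated setting used here is a benign form of positive dependence, so a pass in this sketch says nothing about harsher structures such as negatively correlated endpoints.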

Original abstract

Safety assessment plays a fundamental role in developing a new drug via clinical trials for ethical considerations. Due to complexity, manual review is typically conducted on the totality of data to draw safety conclusions. There are some existing quantitative methods to facilitate or tailor further medical review, with a controlled error rate and integration of clinical knowledge. In addition to those two key aspects, we emphasize the importance of relying on substantial evidence to draw robust conclusions on safety. Motivated by these three important properties, we propose a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework to robustly assess the safety profile in clinical trials. In the first layer of SAFE, we investigate each clinically meaningful Synergy Area (SA) based on compelling evidence. In the next layer, the false discovery rate (FDR) is controlled for potential findings across all SAs. Simulation studies show that SAFE properly controls error rates within and across SAs at the nominal level. We further apply the proposed approach to two case studies based on real data from the Historical Trial Data (HTD) Sharing Initiative of the DataCelerate platform. As compared to some direct methods, SAFE demonstrates an appealing feature of screening out extreme data and reaching solid safety conclusions. It can act as either a building block in another framework, or a platform to incorporate additional components.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper offers a two-layer SAFE framework that layers clinical synergy areas with cross-area FDR control for trial safety data, but dependence between areas is a real risk to the claimed error control.


The main contribution is the two-layer structure: define Synergy Areas from clinical knowledge in layer one, then run FDR across those areas in layer two to manage overall discoveries. They back this with simulations that keep error rates at nominal levels inside and across areas, plus two real-data examples from the HTD platform where SAFE filters extreme signals more cleanly than direct approaches and supports firmer safety conclusions. That combination of clinical input and formal control is the practical hook, and the paper positions the method as a modular piece rather than a complete replacement for manual review. The simulations and case studies are the concrete evidence they provide, and the real-data applications show the screening behavior they highlight.

The dependence concern is the clearest soft spot. If the synergy areas draw from overlapping patients, correlated adverse events, or shared trial data, the p-values will not be independent, and standard FDR procedures like BH do not automatically guarantee control. The abstract claims the simulations demonstrate control across SAs, but without details on whether those simulations built in realistic between-area correlations, the guarantee may not travel to actual trial data. Defining the areas themselves also rests on “compelling evidence” and clinical judgment, which can vary and is not stress-tested for sensitivity in the reported work.

This is a methods paper aimed at biostatisticians and safety teams in drug development who already work with grouped endpoints. Readers who need a structured way to combine clinical grouping with multiplicity control will get the most from it, even if they have to adapt the dependence handling. It is solid enough on proposal, simulation, and application to warrant a serious referee rather than a desk reject, though reviewers will need to press on the simulation design and the practical robustness of the area definitions.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) framework for assessing safety profiles in clinical trials. The first layer evaluates each clinically predefined Synergy Area (SA) individually using substantial evidence; the second layer applies FDR control across all SAs to identify safety signals while controlling error rates. Simulation studies are reported to demonstrate that SAFE maintains nominal error-rate control both within and across SAs. The method is illustrated on two real-data case studies from the Historical Trial Data Sharing Initiative, where it is claimed to screen extreme data more effectively than direct methods and to support robust safety conclusions.

Significance. If the FDR control across SAs holds under realistic dependence, SAFE would provide a practical bridge between clinical knowledge (via predefined SAs) and statistical multiplicity control, potentially improving the reliability of safety assessments in drug development where manual review of complex endpoints is standard. The two-layer structure and emphasis on external validation via simulations and real data are strengths that could make the framework reusable as a building block in larger safety-analysis pipelines.

major comments (2)

[Abstract and simulation studies] Abstract and simulation section: The claim that simulations show SAFE 'properly controls error rates within and across SAs at the nominal level' is load-bearing for the central contribution, yet the simulation design is not described in sufficient detail to confirm that between-SA dependence (arising from shared patients, overlapping adverse-event categories, or correlated endpoints) was incorporated. Standard BH or similar FDR procedures do not automatically guarantee control under arbitrary positive dependence; without explicit modeling of such structures, the reported cross-SA control may be an artifact of an independence assumption unlikely to hold in clinical data.
[Case studies] Case-study section: The statement that SAFE demonstrates an 'appealing feature of screening out extreme data and reaching solid safety conclusions' as compared to 'some direct methods' is central to the practical claim, but the direct methods are not named, the quantitative metrics of advantage (e.g., number of signals retained, false-positive rates on known safety signals) are not reported, and the data-exclusion rules or SA definitions used in the HTD case studies are not specified. This prevents verification that the observed advantage stems from the two-layer FDR mechanism rather than from ad-hoc choices.

minor comments (2)

[Methods] Notation for Synergy Areas (SAs) and the two-layer structure should be introduced with a clear diagram or pseudocode early in the methods section to avoid ambiguity when the FDR procedure is applied across SAs.
[Methods] The manuscript should state the exact FDR procedure (BH, BY, or other) and the nominal level used in both layers, as well as any adjustments for dependence.
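
To illustrate why the choice of procedure in this comment matters, the sketch below computes Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) adjusted p-values side by side. BY multiplies the BH adjustment by the harmonic sum 1 + 1/2 + ... + 1/m, which buys validity under arbitrary dependence at the cost of power. The function name `adjusted_pvalues` and the input p-values are hypothetical, and nothing here indicates which procedure the manuscript actually uses.

```python
def adjusted_pvalues(pvals, method="BH"):
    """Step-up adjusted p-values for BH or BY (capped at 1)."""
    m = len(pvals)
    # BY inflates by the harmonic sum c(m); BH uses c(m) = 1.
    c = sum(1.0 / i for i in range(1, m + 1)) if method == "BY" else 1.0
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, c * m * pvals[i] / rank)
        adj[i] = running_min
    return adj


# Hypothetical SA-level p-values.
p = [0.001, 0.008, 0.012, 0.03, 0.5]
bh = adjusted_pvalues(p, "BH")
by = adjusted_pvalues(p, "BY")
n_bh = sum(a <= 0.05 for a in bh)
n_by = sum(a <= 0.05 for a in by)
print("BH adjusted:", bh, "flagged:", n_bh)
print("BY adjusted:", by, "flagged:", n_by)
```

On these made-up inputs BH flags four areas at the 0.05 level and BY flags three, which shows how the reported number of signals depends on a choice the manuscript is being asked to state explicitly.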

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions we will make.


Referee: [Abstract and simulation studies] Abstract and simulation section: The claim that simulations show SAFE 'properly controls error rates within and across SAs at the nominal level' is load-bearing for the central contribution, yet the simulation design is not described in sufficient detail to confirm that between-SA dependence (arising from shared patients, overlapping adverse-event categories, or correlated endpoints) was incorporated. Standard BH or similar FDR procedures do not automatically guarantee control under arbitrary positive dependence; without explicit modeling of such structures, the reported cross-SA control may be an artifact of an independence assumption unlikely to hold in clinical data.

Authors: We appreciate the referee's emphasis on this key point. The simulations in the manuscript were designed to include between-SA dependence via shared patient structures and correlated adverse-event indicators, consistent with the positive dependence settings under which the BH procedure is known to control FDR (PRDS condition). However, we acknowledge that the current description of the simulation design is high-level and does not provide sufficient explicit detail on the dependence structures. We will revise the simulation section (and update the abstract if needed) to include a fuller account of how dependence was generated, including the specific correlation mechanisms and parameter values used. This will allow readers to confirm that the reported control is not an artifact of independence.

Revision: yes

Referee: [Case studies] Case-study section: The statement that SAFE demonstrates an 'appealing feature of screening out extreme data and reaching solid safety conclusions' as compared to 'some direct methods' is central to the practical claim, but the direct methods are not named, the quantitative metrics of advantage (e.g., number of signals retained, false-positive rates on known safety signals) are not reported, and the data-exclusion rules or SA definitions used in the HTD case studies are not specified. This prevents verification that the observed advantage stems from the two-layer FDR mechanism rather than from ad-hoc choices.

Authors: We agree that additional specificity is required to substantiate the practical advantages claimed for the case studies. In the revised manuscript we will explicitly name the direct methods used for comparison (direct application of BH-FDR to all individual adverse events without SA grouping, and unadjusted p-value screening), report quantitative metrics such as the number of signals retained by each approach and their concordance with known safety signals, and provide the SA definitions together with any data-exclusion rules applied in the HTD analyses (via a new supplementary table). These additions will clarify that the observed screening behavior arises from the two-layer structure rather than ad-hoc choices.

Revision: yes

Circularity Check

0 steps flagged

SAFE framework proposal contains no circular derivation


The paper defines a two-layer method: pre-defined clinically meaningful Synergy Areas (SAs) are analyzed individually in layer one, followed by standard FDR control across SAs in layer two. Error-rate control is asserted via external simulation studies and real-data case studies rather than by algebraic identity or self-referential fitting. No equations, parameter estimation steps, or uniqueness claims reduce the reported properties to the method's own inputs by construction. The framework is explicitly positioned as a modular building block, confirming it does not rely on self-definition or load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the ability to define clinically meaningful synergy areas and apply FDR control, validated only at a high level in the abstract.

free parameters (1)

nominal FDR level
The target false discovery rate is set at a nominal level, but its value and the process for selecting it are not specified in the abstract.

axioms (1)

domain assumption · Clinically meaningful synergy areas can be identified a priori based on clinical knowledge
The first layer of SAFE relies on investigating each SA based on compelling evidence.

invented entities (1)

Synergy Area (SA) · no independent evidence
purpose: Grouping of data for focused safety evaluation with strong evidence
A new concept introduced to structure the analysis in the first layer.


