The FORSS Framework for Sample Size and Power Calculations With Win Statistics for Hierarchical Endpoints

arXiv: 2605.17240 · v1 · pith:FAIZ7R4G · submitted 2026-05-17 · 📊 stat.ME

Baoshan Zhang, Huiman X. Barnhart, Yuan Wu, Roland A. Matsouaka

Pith reviewed 2026-05-19 23:32 UTC · model grok-4.3

keywords: sample size calculation, power analysis, win statistics, hierarchical endpoints, clinical trial design, super-sample approach, formula-based methods

The pith

The FORSS framework delivers accurate formula-based sample size and power calculations for win statistics on hierarchical endpoints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FORSS, a method that combines analytical formulas with super-samples drawn from a user-specified joint distribution of hierarchical endpoints. This lets trial designers input familiar marginal effects such as hazard ratios or risk differences, then quickly compute required sample sizes without running thousands of full trial simulations for each candidate size. Simulations across many scenarios confirm that the resulting power estimates stay close to those from brute-force simulation while keeping false positive rates near the target 5 percent level. The approach also shows that the strength of dependence between endpoints can change the projected power and thus the number of patients needed in a study like HEART-FID.

Core claim

FORSS is a formula-based super-sample framework that estimates the plug-in quantities required by analytical power and sample-size formulas for win statistics by generating large super-samples from specified marginal treatment effects and a flexible joint working distribution for the hierarchical endpoints, thereby avoiding the computational intensity of repeated full-trial simulations.

What carries the argument

The super-sample generation step within the FORSS framework, which produces large simulated populations to obtain accurate estimates of the population-level win probabilities and other quantities needed for closed-form power calculations.
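The super-sample step can be illustrated with a minimal sketch. It assumes a two-level hierarchy (survival time capped at a common follow-up, then a continuous score), a Gaussian copula for within-patient dependence, an exponential survival margin, and a normal score margin. All function names, parameter names, and default values here are hypothetical and are not taken from the paper. As a further simplification, the sketch draws independent treatment/control pairs rather than forming all pairwise comparisons within two large arms.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_patient(rng, hazard, mean_score, rho, follow_up):
    """Hypothetical patient: (survival time capped at follow_up, score)."""
    z1 = rng.gauss(0.0, 1.0)
    # Gaussian copula: z2 is correlated with z1 through rho.
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    u1 = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)   # guard against log(0)
    time = -math.log(1.0 - u1) / hazard         # exponential margin
    return min(time, follow_up), mean_score + z2

def compare(trt, ctl, margin=0.0):
    """Return +1 if the treatment patient wins, -1 if it loses, 0 if tied."""
    # Level 1: longer survival wins; equal times mean both reached follow_up.
    if trt[0] != ctl[0]:
        return 1 if trt[0] > ctl[0] else -1
    # Level 2: higher score wins when the difference exceeds the margin.
    diff = trt[1] - ctl[1]
    if diff > margin:
        return 1
    if diff < -margin:
        return -1
    return 0

def super_sample_win_probs(n_pairs, hazard_ratio, mean_diff, rho,
                           margin=0.0, base_hazard=0.2, follow_up=3.0, seed=1):
    """Monte Carlo estimates of (win, loss, tie) probabilities."""
    rng = random.Random(seed)
    wins = losses = ties = 0
    for _ in range(n_pairs):
        trt = draw_patient(rng, base_hazard * hazard_ratio, mean_diff, rho, follow_up)
        ctl = draw_patient(rng, base_hazard, 0.0, rho, follow_up)
        result = compare(trt, ctl, margin)
        if result == 1:
            wins += 1
        elif result == -1:
            losses += 1
        else:
            ties += 1
    return wins / n_pairs, losses / n_pairs, ties / n_pairs

if __name__ == "__main__":
    pw, pl, pt = super_sample_win_probs(50_000, hazard_ratio=0.7, mean_diff=0.3, rho=0.5)
    print(f"win={pw:.3f} loss={pl:.3f} tie={pt:.3f}")
```

Calling `super_sample_win_probs` over a grid of `rho` values gives a quick view of how the assumed dependence shifts the plug-in quantities, which is the kind of sensitivity the page attributes to the HEART-FID illustration.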

If this is right

Users can specify treatment effects using standard metrics like hazard ratios, mean differences, and risk differences for each endpoint.
The method maintains Type I error rates close to the nominal 5% level in evaluated scenarios.
Projected power and required sample sizes depend on how the hierarchical endpoints are jointly distributed.
FORSS reduces computation time relative to simulation-based power calculations for hierarchical endpoints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Trial planners could run sensitivity checks over different joint distributions to assess how sample size recommendations change.
The super-sample idea might extend to settings with censoring or other data features common in clinical trials.
Similar plug-in estimation could support power calculations for other composite endpoint analyses beyond win statistics.

Load-bearing premise

That specifying marginal treatment effects and a flexible joint working distribution is enough to let super-samples produce plug-in estimates that support the analytical formulas accurately.

What would settle it

Running a large number of full trial simulations at the sample size recommended by FORSS and finding that the observed power differs substantially from the FORSS-predicted power in scenarios with correlated hierarchical endpoints.
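A brute-force check of this kind could look like the sketch below. It is an assumption-laden illustration, not the paper's simulation design: the data-generating model (Gaussian copula, exponential survival, normal score), the function names, and the defaults are hypothetical, and the test used is the Finkelstein-Schoenfeld permutation-variance test rather than whichever win statistic a given trial would prespecify.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_patient(rng, hazard, mean_score, rho, follow_up):
    """Hypothetical patient: (survival time capped at follow_up, score)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    u1 = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)   # guard against log(0)
    return min(-math.log(1.0 - u1) / hazard, follow_up), mean_score + z2

def compare(a, b):
    """+1 if a beats b, -1 if b beats a, 0 if tied on both levels."""
    if a[0] != b[0]:
        return 1 if a[0] > b[0] else -1
    if a[1] != b[1]:
        return 1 if a[1] > b[1] else -1
    return 0

def fs_test_rejects(trt, ctl, alpha=0.05):
    """Two-sided Finkelstein-Schoenfeld test on one simulated trial."""
    pooled = trt + ctl
    n = len(pooled)
    scores = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = compare(pooled[i], pooled[j])
            scores[i] += r
            scores[j] -= r
    t_stat = sum(scores[:len(trt)])
    # Permutation variance of the treatment-arm score sum under the null.
    var = len(trt) * len(ctl) / (n * (n - 1)) * sum(s * s for s in scores)
    if var == 0:
        return False
    return abs(t_stat / math.sqrt(var)) > STD_NORMAL.inv_cdf(1 - alpha / 2)

def empirical_power(n_per_arm, hazard_ratio, mean_diff, rho, reps=200,
                    base_hazard=0.2, follow_up=3.0, seed=7):
    """Fraction of simulated trials in which the test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        trt = [draw_patient(rng, base_hazard * hazard_ratio, mean_diff, rho, follow_up)
               for _ in range(n_per_arm)]
        ctl = [draw_patient(rng, base_hazard, 0.0, rho, follow_up)
               for _ in range(n_per_arm)]
        hits += fs_test_rejects(trt, ctl)
    return hits / reps

if __name__ == "__main__":
    print("rejection rate under the null:", empirical_power(50, 1.0, 0.0, 0.5))
    print("empirical power under benefit:", empirical_power(50, 0.5, 0.8, 0.5))
```

The comparison that matters is between `empirical_power` at the recommended sample size and the power the formula predicted, repeated with a data-generating `rho` that differs from the one assumed at the planning stage.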

Original abstract

Win statistics have gained increasing popularity as primary analysis methods for clinical trials with hierarchical endpoints (HEs) as primary endpoints. However, existing sample size and power calculation approaches in trial design still face several limitations and challenges: simulation-based approaches are computationally intensive, while existing formula-based methods often rely on simplifying assumptions such as independence among HEs, or require specification of overall win statistics and tie probability that are difficult to elicit a priori in practice. To address these challenges, we propose the FORSS framework, a FORmula-based Super-Sample approach that allows investigators to specify marginal treatment effects using familiar metrics (e.g., hazard ratios, mean differences, and risk differences) together with a flexible joint working distribution for the HEs. Rather than repeatedly simulating full trials at each candidate sample size, FORSS uses super-samples to estimate the population-level plug-in quantities required by analytical formulas for both power and sample size calculation. We evaluated the performance of the proposed FORSS through extensive simulation studies. The results show that the formula-based FORSS closely matches empirical power across a wide range of scenarios while maintaining Type~I error rates near the nominal 5\% level. An illustration based on the HEART-FID trial further shows that endpoint-dependence specifications can materially affect projected power and required sample size when planning trials with HEs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FORSS offers a practical middle path for sample size calculations with win statistics on hierarchical endpoints by using marginal effects plus super-sampling from a user-specified joint distribution, but its simulations likely do not test robustness to dependence misspecification.

The main thing to know about this paper is that the FORSS framework provides a formula-based approach to sample size and power calculations for win statistics with hierarchical endpoints. Users specify marginal treatment effects using standard metrics and a flexible joint working distribution, then super-sampling estimates the plug-in values needed for the analytical formulas. This combination is new relative to prior work that either relies on full simulations or assumes independence among endpoints.

The paper does well by reporting simulation studies that show the formula-based results closely match empirical power across scenarios and maintain Type I error control near the nominal level. The HEART-FID trial illustration also demonstrates that different dependence specifications can lead to different power and sample size projections.

A soft spot is the testing of robustness. The simulations evaluate performance across a wide range of scenarios, but these may be generated under the same joint distribution used as the working model in FORSS. If so, the good agreement mainly verifies the method when the dependence structure is correctly specified. It leaves the impact of realistic misspecification unexamined, which matters for practical trial planning where the true correlations are unknown. This is a moderate limitation rather than a fatal one. The approach keeps circularity low by treating the working distribution as an external modeling choice. The overall math and implementation look consistent.

This paper is for applied statisticians designing clinical trials that use hierarchical endpoints, particularly in fields like cardiovascular or oncology research. Readers seeking a balance between computational efficiency and flexibility in modeling dependence will get value from it. It has sufficient substance and addresses a real need, so it deserves a serious referee. I would recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes the FORSS (FORmula-based Super-Sample) framework for sample size and power calculations with win statistics for hierarchical endpoints in clinical trials. Investigators specify marginal treatment effects using standard metrics (hazard ratios, mean differences, risk differences) and a flexible joint working distribution for the endpoints; super-samples then estimate the population-level plug-in quantities (win probabilities, net benefits, tie probabilities) required by closed-form analytical power and sample-size formulas. Simulation studies are reported to show close agreement between the formula-based power and empirical power across scenarios, with Type I error rates near the nominal 5% level. An application to the HEART-FID trial illustrates that dependence specifications can materially change projected power and required sample size.
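For orientation, the plug-in quantities named in the summary relate to the usual win statistics as follows. These are the standard definitions from the win-statistics literature, written in notation chosen here for illustration; the paper's own symbols may differ.

```latex
\[
\begin{aligned}
\pi_w &= \Pr(\text{treatment patient wins}), \qquad
\pi_l = \Pr(\text{treatment patient loses}), \qquad
\pi_t = 1 - \pi_w - \pi_l, \\[4pt]
\mathrm{WR} &= \frac{\pi_w}{\pi_l}, \qquad
\mathrm{NB} = \pi_w - \pi_l, \qquad
\mathrm{WO} = \frac{\pi_w + \tfrac{1}{2}\pi_t}{\pi_l + \tfrac{1}{2}\pi_t}.
\end{aligned}
\]
```

Here WR is the win ratio, NB the net benefit, and WO the win odds. With no ties ($\pi_t = 0$) the win odds equal the win ratio.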

Significance. If the central claims hold, FORSS supplies a computationally lighter alternative to full trial simulation while avoiding the strong independence assumptions or hard-to-elicit overall win/tie parameters of prior formula-based methods. The ability to incorporate user-specified joint distributions for hierarchical endpoints could improve the realism of power calculations in trials with composite or ordered outcomes, provided the plug-in estimates remain accurate under realistic misspecification.

major comments (2)

§5 (Simulation Studies): The reported close agreement between FORSS power and empirical power is demonstrated 'across a wide range of scenarios,' yet the description does not indicate whether the data-generating joint distributions used to produce the empirical results differ from the working distributions supplied to FORSS. When the simulation DGP matches the working distribution exactly, the match only verifies internal correctness of the plug-in estimation and formula implementation; it does not address bias in the estimated win probabilities or power when the dependence structure (copula, correlation, or joint probabilities) is misspecified by amounts typical in trial planning.
§3 (FORSS Framework) and §4 (Analytical Formulas): The method relies on super-samples drawn from the user-specified joint working distribution to obtain the plug-in quantities that enter the analytical power formula. No sensitivity analysis or bound is provided on how errors in the estimated plug-in win probabilities propagate to the final sample-size recommendation when the working distribution is only approximately correct.
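The propagation question in the second comment can be made concrete with a small sketch. It assumes a generic Wald-type formula, N = sigma^2 (z_{1-alpha/2} + z_{power})^2 / (log WR)^2, which is not claimed to be the paper's formula. The variance constant `sigma2 = 16/3` is the value implied by a 1:1 allocation with no ties under the null and is held fixed here as a simplifying assumption; in practice it would also move with the plug-in estimates. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def total_sample_size(log_wr, sigma2=16 / 3, alpha=0.05, power=0.9):
    """Generic Wald-type total sample size for a test on log(WR)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return sigma2 * z ** 2 / log_wr ** 2

def relative_change_in_n(log_wr, rel_error, sigma2=16 / 3):
    """Relative change in N when log(WR) is off by the factor (1 + rel_error)."""
    base = total_sample_size(log_wr, sigma2)
    perturbed = total_sample_size(log_wr * (1 + rel_error), sigma2)
    return perturbed / base - 1

if __name__ == "__main__":
    log_wr = math.log(1.25)
    for err in (-0.10, 0.10):
        print(f"log WR error {err:+.0%}: N changes by {relative_change_in_n(log_wr, err):+.1%}")
```

Because N scales with the inverse square of log WR in this form, a 10% overestimate of log WR lowers N by about 17%, while a 10% underestimate raises it by about 23%, so optimistic plug-in errors are the more costly direction.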

minor comments (2)

Abstract: The notation 'Type~I error' contains a typographic artifact; it should read 'Type I error'.
HEART-FID Illustration: The HEART-FID illustration would be strengthened by an explicit statement of the joint distribution parameters (e.g., copula family and correlation values) used in the dependence scenarios.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify the scope of our simulation studies and the need for explicit robustness checks. We address each major comment in turn and have revised the manuscript to improve transparency and add supporting analyses.

Referee: §5 (Simulation Studies): The reported close agreement between FORSS power and empirical power is demonstrated 'across a wide range of scenarios,' yet the description does not indicate whether the data-generating joint distributions used to produce the empirical results differ from the working distributions supplied to FORSS. When the simulation DGP matches the working distribution exactly, the match only verifies internal correctness of the plug-in estimation and formula implementation; it does not address bias in the estimated win probabilities or power when the dependence structure (copula, correlation, or joint probabilities) is misspecified by amounts typical in trial planning.

Authors: We appreciate this clarification. Our simulation design in §5 already incorporates scenarios in which the working joint distribution supplied to FORSS differs from the true data-generating process, including variations in copula family, correlation strength, and marginal dependence parameters. These cases were chosen to reflect realistic planning uncertainty. Nevertheless, the referee is correct that the original text did not make this distinction explicit. In the revision we have expanded the simulation description to detail the relationship between DGP and working model for each scenario and added a dedicated sensitivity subsection that quantifies performance under deliberate misspecification of the dependence structure. The results continue to show close agreement between formula-based and empirical power, with only modest degradation under moderate misspecification.
Revision: yes
Referee: §3 (FORSS Framework) and §4 (Analytical Formulas): The method relies on super-samples drawn from the user-specified joint working distribution to obtain the plug-in quantities that enter the analytical power formula. No sensitivity analysis or bound is provided on how errors in the estimated plug-in win probabilities propagate to the final sample-size recommendation when the working distribution is only approximately correct.

Authors: We agree that an explicit sensitivity analysis strengthens the practical utility of the framework. In the revised manuscript we have added a new subsection in §5 that perturbs the joint-distribution parameters (copula parameter, pairwise correlations) around the values used in the main simulations and reports the resulting changes in plug-in win probabilities, net benefit, and the final sample-size recommendation. We also derive and present first-order bounds on the propagation of plug-in error through the closed-form power and sample-size expressions, showing that the impact on recommended N remains limited for the range of misspecification considered realistic in trial planning. These additions directly address the referee’s concern while remaining within the scope of the existing analytical formulas.
Revision: yes

Circularity Check

0 steps flagged

No circularity: analytical formulas fed by independent super-sampling from user-specified working distribution

The FORSS method specifies marginal treatment effects and a flexible joint working distribution as external inputs, then uses super-sampling solely to compute plug-in quantities (win probabilities, net benefits, tie probabilities) that are inserted into pre-existing analytical power and sample-size formulas. This structure does not define the target power formula in terms of the super-sample estimates, nor does it rename fitted quantities as predictions; the formulas remain independent of the particular super-sample realizations. Simulation studies evaluate performance under the same working distribution, but this is a validation check rather than a load-bearing derivation step that reduces to the inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are required for the central claim. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the ability of investigators to supply a joint working distribution whose dependence structure is close enough to reality that super-sample estimates of win probabilities and related quantities remain useful for power calculations.

free parameters (1)

Parameters of the joint working distribution for hierarchical endpoints
User must choose or estimate the dependence parameters that define how the ranked endpoints covary; these directly affect the super-sample estimates used in the formulas.
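One common way to elicit such dependence parameters is through Kendall's tau, which maps to the parameter of several one-parameter copula families in closed form. The sketch below uses the textbook relations for the Gaussian, Clayton, and Gumbel families; which families the paper actually supports is not stated on this page, and the function name is hypothetical. The relations hold for continuous margins and are only approximate when an endpoint is discrete.

```python
import math

def tau_to_copula_parameter(family, tau):
    """Convert Kendall's tau to a copula parameter (continuous margins assumed)."""
    if family == "gaussian":
        if not -1.0 <= tau <= 1.0:
            raise ValueError("Gaussian copula needs -1 <= tau <= 1")
        return math.sin(math.pi * tau / 2.0)      # correlation rho
    if family == "clayton":
        if not 0.0 < tau < 1.0:
            raise ValueError("this sketch covers Clayton only for 0 < tau < 1")
        return 2.0 * tau / (1.0 - tau)            # theta > 0
    if family == "gumbel":
        if not 0.0 <= tau < 1.0:
            raise ValueError("Gumbel copula needs 0 <= tau < 1")
        return 1.0 / (1.0 - tau)                  # theta >= 1
    raise ValueError(f"unknown copula family: {family}")

if __name__ == "__main__":
    for family in ("gaussian", "clayton", "gumbel"):
        print(family, round(tau_to_copula_parameter(family, 0.3), 4))
```

Working on the tau scale lets a planner hold the strength of dependence fixed while switching copula family, which separates the effect of family choice from the effect of dependence strength.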

axioms (1)

Domain assumption: A flexible joint working distribution for the hierarchical endpoints can be specified by the user and used to generate super-samples that accurately estimate the population-level plug-in quantities required by the analytical formulas.
This modeling choice is invoked to replace repeated full-trial simulation while still capturing endpoint dependence.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "FORSS uses super-samples to estimate the population-level plug-in quantities required by analytical formulas for both power and sample size calculation... copula C_θ is used as the joint distribution"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We adopt the U-statistics framework of Bebu and Lachin and Dong et al. for HEs with mixed data types"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

[1] Montori V, Permanyer-Miralda G, Ferreira-González I, et al. Validity of composite end points in clinical trials. BMJ 2005; 330(7491): 594–596. doi: 10.1136/bmj.330.7491.594

[2] Walker HG, Brown AJ, Vaz IP, et al. Composite outcome measures in high-impact critical care randomised controlled trials: a systematic review. Critical Care 2024; 28(1): 184.

[3] Neaton J, Gray G, Zuckerman B, Konstam M. Key Issues in End Point Selection for Heart Failure Trials: Composite End Points. Journal of Cardiac Failure 2005; 11(8): 567–575. doi: 10.1016/j.cardfail.2005.08.350

[4] Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ 2007; 334(7597): 786.

[5] Finkelstein D, Schoenfeld D. Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine 1999; 18(11): 1341–1354. doi: 10.1002/(SICI)1097-0258(19990615)18:11<1341::AID-SIM129>3.0.CO;2-7

[6] Fandino W, Dodd M, Kunst G, Clayton T. Efficient statistical analysis of trial designs: win ratio and related approaches for composite outcomes. Perioperative Medicine 2025; 14(1): 70.

[7] Pocock S, Ariti C, Collier T, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal 2012; 33(2): 176–182. doi: 10.1093/eurheartj/ehr352

[8] Mao L. Defining estimand for the win ratio: Separate the true effect from censoring. Clinical Trials 2024: 17407745241259356. doi: 10.1177/17407745241259356

[9] Barnhart H, Lokhnygina Y, Matsouaka R, et al. Sample Size and Power Calculations with Win Measures Based on Hierarchical Endpoints. Statistics in Medicine 2025; 44(10-12). doi: 10.1002/sim.70096

[10] Redfors B, Gregson J, Crowley A, et al. The win ratio approach for composite endpoints: practical guidance based on previous experience. European Heart Journal 2020; 41(46): 4391–4399. doi: 10.1093/eurheartj/ehaa665

[11] Dong G, Huang B, Verbeeck J, et al. Win statistics (win ratio, win odds, and net benefit) can complement one another to show the strength of the treatment effect on time-to-event outcomes. Pharmaceutical Statistics 2023; 22(1): 20–33. doi: 10.1002/pst.2251

[12] Gregson J, Taylor D, Owen R, Collier T, Cohen DJ, Pocock S. Hierarchical composite outcomes and win ratio methods in cardiovascular trials: a review and consequent guidance. Circulation 2025; 151(22): 1606–1619.

[13] Pocock SJ, Gregson J, Collier TJ, Ferreira JP, Stone GW. The win ratio in cardiology trials: lessons learnt, new developments, and wise future use. European Heart Journal 2024; 45(44): 4684–4699.

[14] Maurer M, Schwartz J, Gundapaneni B, et al. Tafamidis Treatment for Patients with Transthyretin Amyloid Cardiomyopathy. New England Journal of Medicine 2018; 379(11): 1007–1016. doi: 10.1056/NEJMoa1805689

[15] Stone G, Lindenfeld J, Abraham W, et al. Transcatheter Mitral-Valve Repair in Patients with Heart Failure. New England Journal of Medicine 2018; 379(24): 2307–2318. doi: 10.1056/NEJMoa1806640

[16] Mentz R, Ambrosy A, Ezekowitz J, et al. Randomized Placebo-Controlled Trial of Ferric Carboxymaltose in Heart Failure With Iron Deficiency: Rationale and Design. Circulation: Heart Failure 2021; 14(5): e008100. doi: 10.1161/CIRCHEARTFAILURE.120.008100

[17] Yu R, Ganju J. Sample size formula for a win ratio endpoint. Statistics in Medicine 2022; 41(6): 950–963. doi: 10.1002/sim.9297

[18] U.S. Food and Drug Administration. Multiple Endpoints in Clinical Trials: Guidance for Industry. U.S. Food and Drug Administration; 2022. Available at: https://www.fda.gov/media/162416/download

[19] James S, Erlinge D, Storey R, et al. Dapagliflozin in Myocardial Infarction without Diabetes or Heart Failure. NEJM Evidence 2024; 3(2). doi: 10.1056/EVIDoa2300286

[20] Kondo T, Jhund P, Gasparyan S, et al. A hierarchical kidney outcome using win statistics in patients with heart failure from the DAPA-HF and DELIVER trials. Nature Medicine 2024; 30(5): 1432–1439. doi: 10.1038/s41591-024-02941-8

[21] Zhou T, LaValley M, Nelson K, Cabral H, Massaro J. Calculating power for the Finkelstein and Schoenfeld test statistic for a composite endpoint with two components. Statistics in Medicine 2022; 41(17): 3321–3335. doi: 10.1002/sim.9419

[22] Gasparyan S, Kowalewski E, Folkvaljon F, et al. Power and sample size calculation for the win odds test: application to an ordinal endpoint in COVID-19 trials. Journal of Biopharmaceutical Statistics 2021; 31(6): 765–787. doi: 10.1080/10543406.2021.1968893

[23] Mao L, Kim K, Miao X. Sample size formula for general win ratio analysis. Biometrics 2022; 78(3): 1257–1268. doi: 10.1111/biom.13501

[24] Verbeeck J, Spitzer E, De Vries T, et al. Generalized pairwise comparison methods to analyze (non)prioritized composite endpoints. Statistics in Medicine 2019; 38(30): 5641–5656. doi: 10.1002/sim.8388

[25] Bebu I, Lachin J. Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics 2016; 17(1): 178–187. doi: 10.1093/biostatistics/kxv032

[26] Lehmann E. Elements of Large-Sample Theory. Corrected 3rd printing ed. New York, NY: Springer; 2004.

[27] Lee A. U-Statistics: Theory and Practice. New York: Marcel Dekker; 1990.

[28] Zhang B, Wu Y. Sequential design for paired ordinal categorical outcome. Statistical Methods in Medical Research 2025; 34(6): 1144–1161.

[29] Wu Y, Simmons RA, Zhang B, Troy JD. Group Sequential Test for Two-Sample Ordinal Outcome Measures. Statistics in Medicine 2025; 44(6): e70053.

[30] Zhang B, Wu Y. Sequential Design with Derived Win Statistics. arXiv preprint arXiv:2410.06281, 2024.

[31] U.S. Food and Drug Administration. Patient-Focused Drug Development: Incorporating Clinical Outcome Assessments Into Endpoints for Regulatory Decision-Making. U.S. Food and Drug Administration; 2023. Available at: https://www.fda.gov/media/166830/download

[32] Nelsen R. An Introduction to Copulas. 2nd ed. Springer Series in Statistics. New York: Springer; 2006.

[33] Luo X, Qiu J, Bai S, Tian H. Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine 2017; 36(15): 2452–2465.

[34] Genest C, Nešlehová J. A primer on copulas for count data. ASTIN Bulletin: The Journal of the IAA 2007; 37(2): 475–515.

[35] de Leon AR, Wu B. Copula-based regression models for a bivariate mixed discrete and continuous outcome. Statistics in Medicine 2011; 30(2): 175–185.

[36] Kendall MG. A New Measure of Rank Correlation. Biometrika 1938; 30(1–2): 81–93. doi: 10.1093/biomet/30.1-2.81

[37] Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the Yield of Medical Tests. JAMA 1982; 247(18): 2543–

[38] doi: 10.1001/jama.1982.03320430047030

[39] Mentz R, Garg J, Rockhold F, et al. Ferric Carboxymaltose in Heart Failure with Iron Deficiency. New England Journal of Medicine 2023; 389(11): 975–986. doi: 10.1056/NEJMoa2304968

[1] [1]

Validity of composite end points in clinical trials.BMJ 2005; 330(7491): 594–596

Montori V, Permanyer-Miralda G, Ferreira-González I, others . Validity of composite end points in clinical trials.BMJ 2005; 330(7491): 594–596. doi: 10.1136/bmj.330.7491.594

work page doi:10.1136/bmj.330.7491.594 2005

[2] [2]

WalkerHG,BrownAJ,VazIP,etal.Compositeoutcomemeasuresinhigh-impactcriticalcarerandomisedcontrolledtrials: a systematic review.Critical Care2024; 28(1): 184

work page

[3] [3]

Key Issues in End Point Selection for Heart Failure Trials: Composite End Points.Journal of Cardiac Failure2005; 11(8): 567–575

Neaton J, Gray G, Zuckerman B, Konstam M. Key Issues in End Point Selection for Heart Failure Trials: Composite End Points.Journal of Cardiac Failure2005; 11(8): 567–575. doi: 10.1016/j.cardfail.2005.08.350

work page doi:10.1016/j.cardfail.2005.08.350 2005

[4] [4]

Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials.Bmj2007; 334(7597): 786

Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials.Bmj2007; 334(7597): 786

work page

[5] [5]

doi: 10.1002/(SICI)1097-0258(19990615)18:11<1341::AID-SIM129>3.0.CO;2-7

FinkelsteinD,SchoenfeldD.Combiningmortalityandlongitudinalmeasuresinclinicaltrials.Statistics in Medicine1999; 18(11): 1341–1354. doi: 10.1002/(SICI)1097-0258(19990615)18:11<1341::AID-SIM129>3.0.CO;2-7

work page doi:10.1002/(sici)1097-0258(19990615)18:11

[6] [6]

Efficient statistical analysis of trial designs: win ratio and related approaches for composite outcomes.Perioperative Medicine2025; 14(1): 70

Fandino W, Dodd M, Kunst G, Clayton T. Efficient statistical analysis of trial designs: win ratio and related approaches for composite outcomes.Perioperative Medicine2025; 14(1): 70

work page

[7] [7]

doi: 10.1093/eurheartj/ehr352

PocockS,AritiC,CollierT,WangD.Thewinratio:anewapproachtotheanalysisofcompositeendpointsinclinicaltrials based on clinical priorities.European Heart Journal2012; 33(2): 176–182. doi: 10.1093/eurheartj/ehr352

work page doi:10.1093/eurheartj/ehr352

[8] [8]

Defining estimand for the win ratio: Separate the true effect from censoring.Clinical Trials2024

Mao L. Defining estimand for the win ratio: Separate the true effect from censoring.Clinical Trials2024. 17407745241259356doi: 10.1177/17407745241259356

work page doi:10.1177/17407745241259356

[9] Barnhart H, Lokhnygina Y, Matsouaka R, et al. Sample Size and Power Calculations with Win Measures Based on Hierarchical Endpoints. Statistics in Medicine 2025; 44(10-12). doi: 10.1002/sim.70096

[10] Redfors B, Gregson J, Crowley A, et al. The win ratio approach for composite endpoints: practical guidance based on previous experience. European Heart Journal 2020; 41(46): 4391–4399. doi: 10.1093/eurheartj/ehaa665

[11] Dong G, Huang B, Verbeeck J, et al. Win statistics (win ratio, win odds, and net benefit) can complement one another to show the strength of the treatment effect on time-to-event outcomes. Pharmaceutical Statistics 2023; 22(1): 20–33. doi: 10.1002/pst.2251

[12] Gregson J, Taylor D, Owen R, Collier T, Cohen DJ, Pocock S. Hierarchical composite outcomes and win ratio methods in cardiovascular trials: a review and consequent guidance. Circulation 2025; 151(22): 1606–1619

[13] Pocock SJ, Gregson J, Collier TJ, Ferreira JP, Stone GW. The win ratio in cardiology trials: lessons learnt, new developments, and wise future use. European Heart Journal 2024; 45(44): 4684–4699

[14] Maurer M, Schwartz J, Gundapaneni B, et al. Tafamidis Treatment for Patients with Transthyretin Amyloid Cardiomyopathy. New England Journal of Medicine 2018; 379(11): 1007–1016. doi: 10.1056/NEJMoa1805689

[15] Stone G, Lindenfeld J, Abraham W, et al. Transcatheter Mitral-Valve Repair in Patients with Heart Failure. New England Journal of Medicine 2018; 379(24): 2307–2318. doi: 10.1056/NEJMoa1806640

[16] Mentz R, Ambrosy A, Ezekowitz J, et al. Randomized Placebo-Controlled Trial of Ferric Carboxymaltose in Heart Failure With Iron Deficiency: Rationale and Design. Circulation: Heart Failure 2021; 14(5): e008100. doi: 10.1161/CIRCHEARTFAILURE.120.008100

[17] Yu R, Ganju J. Sample size formula for a win ratio endpoint. Statistics in Medicine 2022; 41(6): 950–963. doi: 10.1002/sim.9297

[18] U.S. Food and Drug Administration. Multiple Endpoints in Clinical Trials: Guidance for Industry. U.S. Food and Drug Administration; 2022. Available at: https://www.fda.gov/media/162416/download

[19] James S, Erlinge D, Storey R, et al. Dapagliflozin in Myocardial Infarction without Diabetes or Heart Failure. NEJM Evidence 2024; 3(2). doi: 10.1056/EVIDoa2300286

[20] Kondo T, Jhund P, Gasparyan S, et al. A hierarchical kidney outcome using win statistics in patients with heart failure from the DAPA-HF and DELIVER trials. Nature Medicine 2024; 30(5): 1432–1439. doi: 10.1038/s41591-024-02941-8

[21] Zhou T, LaValley M, Nelson K, Cabral H, Massaro J. Calculating power for the Finkelstein and Schoenfeld test statistic for a composite endpoint with two components. Statistics in Medicine 2022; 41(17): 3321–3335. doi: 10.1002/sim.9419

[22] Gasparyan S, Kowalewski E, Folkvaljon F, et al. Power and sample size calculation for the win odds test: application to an ordinal endpoint in COVID-19 trials. Journal of Biopharmaceutical Statistics 2021; 31(6): 765–787. doi: 10.1080/10543406.2021.1968893

[23] Mao L, Kim K, Miao X. Sample size formula for general win ratio analysis. Biometrics 2022; 78(3): 1257–1268. doi: 10.1111/biom.13501

[24] Verbeeck J, Spitzer E, De Vries T, et al. Generalized pairwise comparison methods to analyze (non)prioritized composite endpoints. Statistics in Medicine 2019; 38(30): 5641–5656. doi: 10.1002/sim.8388

[25] Bebu I, Lachin J. Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics 2016; 17(1): 178–187. doi: 10.1093/biostatistics/kxv032

[26] Lehmann E. Elements of large-sample theory. New York, NY: Springer. Corrected 3rd printing ed. 2004

[27] Lee A. U-Statistics: Theory and Practice. New York: Marcel Dekker. 1990

[28] Zhang B, Wu Y. Sequential design for paired ordinal categorical outcome. Statistical Methods in Medical Research 2025; 34(6): 1144–1161

[29] Wu Y, Simmons RA, Zhang B, Troy JD. Group Sequential Test for Two-Sample Ordinal Outcome Measures. Statistics in Medicine 2025; 44(6): e70053

[30] Zhang B, Wu Y. Sequential Design with Derived Win Statistics. arXiv preprint arXiv:2410.06281, 2024

[31] U.S. Food and Drug Administration. Patient-Focused Drug Development: Incorporating Clinical Outcome Assessments Into Endpoints for Regulatory Decision-Making. U.S. Food and Drug Administration; 2023. Available at: https://www.fda.gov/media/166830/download

[32] Nelsen R. An introduction to copulas. Springer Series in Statistics. New York: Springer. 2nd ed. 2006

[33] Luo X, Qiu J, Bai S, Tian H. Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine 2017; 36(15): 2452–2465

[34] Genest C, Nešlehová J. A primer on copulas for count data. ASTIN Bulletin: The Journal of the IAA 2007; 37(2): 475–515

[35] de Leon AR, Wu B. Copula-based regression models for a bivariate mixed discrete and continuous outcome. Statistics in Medicine 2011; 30(2): 175–185

[36] Kendall MG. A New Measure of Rank Correlation. Biometrika 1938; 30(1–2): 81–93. doi: 10.1093/biomet/30.1-2.81

[37] Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the Yield of Medical Tests. JAMA 1982; 247(18): 2543–. doi: 10.1001/jama.1982.03320430047030

[39] Mentz R, Garg J, Rockhold F, et al. Ferric Carboxymaltose in Heart Failure with Iron Deficiency. New England Journal of Medicine 2023; 389(11): 975–986. doi: 10.1056/NEJMoa2304968

8 APPENDIX

8.1 Derivation of the variance of win and loss statistics

In this subsection, we show the details of the derivation of the variance of win and...