Sample Size Determination Under Selection Bias: Robust Tolerance Limits for Prevalent Cohort Data
Pith reviewed 2026-05-20 01:52 UTC · model grok-4.3
The pith
Tolerance limits can be kept valid for samples drawn under selection bias or censoring by modifying the order-statistics coverage relation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive extensions of the Scheffé-Tukey tolerance limit formula which accommodate a large class of biased sampling schemes including weight bias and censoring. The modified formulae are validated through a simulation study and compared to their unmodified counterpart. We illustrate the use of the modified formulae on partially observed failure times for individuals with dementia, using data collected from the Canadian Study of Health and Aging.
What carries the argument
The modified order-statistics coverage relation that directly adjusts the original Scheffé-Tukey probability statement for the effects of weight bias and censoring while preserving distribution-freeness.
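For orientation, the classical relation that the paper modifies can be written out as follows. This is the standard unbiased-sample result, not the paper's adjusted version, which this summary does not reproduce. The symbols $C$ (coverage), $\beta$ (target coverage), $\gamma$ (confidence level), $m$ (number of excluded blocks) and $I_{\beta}$ (regularized incomplete beta function) are notation chosen here for illustration.

```latex
% Classical coverage relation: i.i.d. sample of size n from a continuous F
C = F\bigl(X_{(s)}\bigr) - F\bigl(X_{(r)}\bigr) \sim \mathrm{Beta}(s - r,\; n - s + r + 1),
\qquad 1 \le r < s \le n .

% Sample-size requirement: coverage at least \beta with confidence at least \gamma
\Pr(C \ge \beta) = 1 - I_{\beta}(n - m + 1,\; m) \ge \gamma,
\qquad m = r + (n - s + 1).

% Scheffe-Tukey approximation, with \chi^2_{\gamma; 2m} the \gamma-quantile
% of a chi-square distribution on 2m degrees of freedom
n \approx \tfrac{1}{4}\,\chi^{2}_{\gamma;\,2m}\,\frac{1 + \beta}{1 - \beta} + \tfrac{1}{2}(m - 1).
```

None of these expressions involves $F$, which is the distribution-free property the paper aims to preserve under biased sampling.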
If this is right
- Required sample sizes can now be calculated for tolerance limits in studies that use prevalent cohort sampling with right censoring.
- The same coverage guarantee applies across a broad family of weight functions that distort the original sampling probabilities.
- Simulation checks confirm that the adjusted formulae maintain nominal coverage rates under the stated bias mechanisms.
- Applied to real partially observed dementia times, the formulae produce sample-size recommendations that differ from the unbiased case.
Where Pith is reading between the lines
- Researchers planning observational studies with non-random entry can use the adjusted formulae to avoid under-powered tolerance-limit analyses.
- The approach may extend to other common biases such as length bias if the coverage adjustment can be derived in closed form.
- Software implementations of the modified formulae would let applied statisticians plug in their observed censoring rate and bias weight directly.
Load-bearing premise
The weight bias and censoring must take forms that permit a direct algebraic adjustment to the coverage probability of the order statistics.
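The two mechanisms named in this premise are usually formalized as below. This is the textbook setup for weighted distributions and right censoring, given only as a sketch; the precise conditions the paper places on the weight function and the censoring law are not stated in this summary, and the symbols $w$, $\mu_w$, $T$, $V$ and $\delta$ are illustrative notation.

```latex
% Weight bias: observations are drawn from f_w rather than the target density f
f_w(x) = \frac{w(x)\, f(x)}{\mu_w},
\qquad \mu_w = \int w(u)\, f(u)\, du < \infty,
\qquad w(x) \ge 0 .

% Right censoring: the failure time T is seen only up to a censoring time V
(Y, \delta) = \bigl(\min(T, V),\; \mathbf{1}\{T \le V\}\bigr).
```

The familiar special case $w(x) = x$ gives length-biased sampling.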
What would settle it
A Monte Carlo experiment in which the actual coverage probability for the adjusted sample-size formula falls systematically below the nominal level when the bias matches the assumed class would show the extension fails.
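A minimal sketch of such a Monte Carlo check is given below. Because the adjusted formula is not reproduced in this summary, the sketch exercises only the unmodified min/max tolerance interval: once on an unbiased sample and once on a length-biased sample. Everything in it is an assumption made for illustration: the Exp(1) target population, the sample size 93 (the classical value for 95% coverage at 95% confidence), the 2000 replications, the seed, and the function names.

```python
import math
import random


def confidence_estimate(n, beta, draw, cdf, reps=2000, seed=0):
    """Estimate P(F(X_(n)) - F(X_(1)) >= beta) from `reps` simulated samples.

    `draw(rng)` generates one observation under the sampling scheme being
    tested; `cdf` is the CDF of the target (unbiased) population.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        coverage = cdf(max(sample)) - cdf(min(sample))
        hits += coverage >= beta
    return hits / reps


def exp_cdf(x):
    # CDF of the hypothetical Exp(1) target population
    return 1.0 - math.exp(-x)


def draw_unbiased(rng):
    return rng.expovariate(1.0)


def draw_length_biased(rng):
    # Length-biased Exp(1) has density x * exp(-x), i.e. Gamma(shape=2, scale=1)
    return rng.gammavariate(2.0, 1.0)


unbiased = confidence_estimate(93, 0.95, draw_unbiased, exp_cdf)
biased = confidence_estimate(93, 0.95, draw_length_biased, exp_cdf)
print(f"unbiased sample: {unbiased:.3f}, length-biased sample: {biased:.3f}")
```

In this toy setup the first estimate should sit near the nominal 0.95 and the second below it. A test of the paper's claim would swap in the adjusted sample size and order-statistic indices and look for a systematic shortfall relative to the nominal level.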
Original abstract
Tolerance limits have received considerable attention in the statistical literature, with applications reaching far beyond their initial role in quality control. The well-known formula of Scheffé and Tukey (1944) establishes a simple, distribution-free relation between sample size and population coverage by two given order statistics and a given confidence level. A key requirement in applying this formula is the availability of an unbiased, representative sample from the population of interest. However, as often happens in biological and medical applications, various logistical constraints may preclude the possibility of obtaining an unbiased sample. We derive extensions of this formula which accommodate a large class of biased sampling schemes including weight bias and censoring. The modified formulae are validated through a simulation study and compared to their unmodified counterpart. We illustrate the use of the modified formulae on partially observed failure times for individuals with dementia, using data collected from the Canadian Study of Health and Aging.
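The distribution-free relation between sample size, coverage and confidence that the abstract refers to can be evaluated exactly in the unbiased case. The sketch below uses the standard fact that, for an i.i.d. sample from a continuous distribution, the confidence equals an upper binomial tail. It covers only the classical unmodified setting; the function names and the parameter `m` (number of excluded blocks, with `m = 2` for the sample minimum and maximum) are illustrative choices rather than the paper's notation.

```python
from math import comb


def confidence(n, beta, m=2):
    """P(coverage >= beta) when m of the n + 1 blocks are excluded.

    For an unbiased i.i.d. sample from a continuous distribution this equals
    P(Binomial(n, 1 - beta) >= m), whatever the underlying distribution.
    """
    q = 1.0 - beta
    below = sum(comb(n, j) * q**j * beta ** (n - j) for j in range(m))
    return 1.0 - below


def min_sample_size(beta, gamma, m=2):
    """Smallest n whose tolerance interval covers beta with confidence gamma."""
    n = m
    while confidence(n, beta, m) < gamma:
        n += 1
    return n


print(min_sample_size(0.95, 0.95, m=2))  # two-sided, min and max
print(min_sample_size(0.95, 0.95, m=1))  # one-sided, max only
```

For 95% coverage at 95% confidence this gives 93 observations for the two-sided min/max interval and 59 for the one-sided limit, the usual textbook values.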
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the Scheffé-Tukey (1944) distribution-free formula for tolerance limits based on order statistics to accommodate biased sampling in prevalent cohort data, specifically weight bias and right-censoring. It derives modified coverage relations claimed to preserve the original distribution-free property, validates the extensions via simulation, and applies them to partially observed failure times from the Canadian Study of Health and Aging dementia cohort.
Significance. If the extensions rigorously preserve distribution-freeness under the stated biases, the work would offer a practical advance for sample-size planning in medical and epidemiological studies where unbiased sampling is infeasible. The real-data illustration and simulation comparison to the unmodified formula are positive features that ground the contribution in applied settings.
major comments (2)
- [§3] §3 (derivation of censored extension): The modified order-statistics coverage relation for right-censoring must exhibit an explicit identity showing that the probability equals the original Scheffé-Tukey value independently of both F and the censoring distribution; the joint distribution of observed order statistics depends on the censoring law, so any adjustment that restores the numerical relation must cancel this dependence uniformly rather than under a specific parametric assumption.
- [§5] §5 (simulation study): The reported coverage probabilities are close to nominal levels, but the section lacks explicit statements of the Monte Carlo replication count, the precise mechanisms used to generate weight-biased and censored samples, and the data-exclusion rules applied before computing empirical coverage; these omissions prevent confirmation that the results are free of post-hoc tuning that could affect the central robustness claim.
minor comments (2)
- [Abstract] The abstract states that extensions accommodate 'a large class' of schemes but does not delineate the precise conditions on the weight function or censoring mechanism that allow the distribution-free property to be retained; a short clarifying sentence would help readers assess applicability.
- [§3] Notation for the modified tolerance limits (e.g., the adjusted r and s indices) should be introduced with a clear contrast to the original Scheffé-Tukey indices to avoid confusion when the formulae are applied.
Simulated Author's Rebuttal
We are grateful to the referee for their careful reading and valuable feedback on our paper. We respond to each major comment in turn and indicate the changes we will make to the manuscript.
- Referee: [§3] §3 (derivation of censored extension): The modified order-statistics coverage relation for right-censoring must exhibit an explicit identity showing that the probability equals the original Scheffé-Tukey value independently of both F and the censoring distribution; the joint distribution of observed order statistics depends on the censoring law, so any adjustment that restores the numerical relation must cancel this dependence uniformly rather than under a specific parametric assumption.
  Authors: We thank the referee for this important observation. Our derivation aims to preserve the distribution-free nature by modifying the coverage relation to account for the effects of censoring in a way that the resulting probability matches the original Scheffé-Tukey formula. To address this, we will revise §3 to include an explicit mathematical identity demonstrating that the adjusted probability is independent of both the underlying distribution F and the censoring distribution. This will involve showing how the adjustment terms cancel the dependence introduced by the censoring law.
  Revision: yes
- Referee: [§5] §5 (simulation study): The reported coverage probabilities are close to nominal levels, but the section lacks explicit statements of the Monte Carlo replication count, the precise mechanisms used to generate weight-biased and censored samples, and the data-exclusion rules applied before computing empirical coverage; these omissions prevent confirmation that the results are free of post-hoc tuning that could affect the central robustness claim.
  Authors: We agree that these details are necessary for full transparency and reproducibility. In the revised manuscript, we will add explicit information on the Monte Carlo replication count, provide precise descriptions of the simulation mechanisms for weight bias and censoring, and specify the data-exclusion rules used. These additions will allow independent verification of the simulation results supporting the robustness of our extensions.
  Revision: yes
Circularity Check
Derivation extends external 1944 Scheffé-Tukey result via modeling assumptions on bias; no reduction to fitted inputs or self-citation chains.
full rationale
The paper starts from the external Scheffé-Tukey (1944) coverage relation for unbiased samples and states that specific forms of weight bias and censoring permit a direct modification that preserves the distribution-free property. This extension is presented as a mathematical derivation under the paper's modeling assumptions, followed by simulation validation on synthetic data and an application to the Canadian Study of Health and Aging. No equations are shown that equate a derived tolerance limit or sample-size formula back to a parameter fitted from the same observed data, nor does any load-bearing step rest on a self-citation whose content is itself unverified. The central claim therefore remains independent of the target result and receives external support from the cited 1944 identity plus the stated bias models.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Weight bias and censoring mechanisms permit a direct adjustment to the coverage probability relation of the original distribution-free tolerance limit formula.
Reference graph
Works this paper leans on
- [1] Shewhart W. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, 1931.
- [2] Wilks S. Determining Sample Sizes for Setting Tolerance Limits. Ann. Math. Statist. 1941;12:91–96.
- [3] Scheffé H, Tukey J. A Formula for Sample Sizes for Population Tolerance Limits. Ann. Math. Statist. 1944;15:217.
- [4] Wicksell S. The Corpuscle Problem: A Mathematical Study of a Biometric Problem. Biometrika. 1925;17:84–99.
- [5] Fisher R. The Effects of Methods of Ascertainment upon the Estimation of Frequencies. The Annals of Eugenics. 1934;6:13–25.
- [6] Neyman J. Statistics-Servant of All Sciences. Science. 1955;122:401–406.
- [7] Cox D. Some Sampling Problems in Technology. In: New Developments in Survey Sampling. 1969; New York. 506–527.
- [8] Wolfson C, Wolfson D, Asgharian M, et al. A Reevaluation of the Duration of Survival after the Onset of Dementia. N. Engl. J. Med. 2001;344:1111–1116.
- [9] Lachin J, Foulkes M. Evaluation of sample size and power for analyses of survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance, and stratification. Biometrics. 1986;42:507–519.
- [10] Liu H, Shen Y, Ning J, Qin J. Sample size calculations for prevalent cohort designs. Statistical Methods in Medical Research. 2017;26:280–291.
- [11] McVittie J. Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up. Statistical Methods & Applications. 2024;33:303–323.
- [12] Sakamoto J, Mori Y, Sekioka T. Probability Analysis Method Using Fast Fourier Transform and Its Application. Structural Safety. 1997;19(1):21–36. doi: 10.1016/S0167-4730(96)00032-X
- [13] Ruckdeschel P, Kohl M. General Purpose Convolution Algorithm in S4 Classes by Means of FFT. Journal of Statistical Software. 2014;59(4). doi: 10.18637/jss.v059.i04
- [14] Patil G, Rao C. Weighted Distributions and Size-Biased Sampling with Applications to Wildlife Populations and Human Families. Biometrics. 1978;34:179–189.
- [15] Asgharian M, M'Lan C, Wolfson D. Length-Biased Sampling with Right Censoring: An Unconditional Approach. JASA. 2002;97:201–209.
- [16] Patil G, Rao C. The Weighted Distributions. A Survey of their Applications. Applications in Statistics. 1977:383–405.
- [17] Patil G, Rao C, Zelen M. Weighted Distributions. In: Encyclopedia of Statistical Sciences. 1988; New York:565–571.
- [18] Vardi Y. Nonparametric Estimation in the Presence of Length Bias. Ann. Statist. 1982;10:616–620.
- [19] Vardi Y. Empirical Distributions in Selection Bias Models. Ann. Statist. 1985;13:178–203.
- [20] Vardi Y. Multiplicative Censoring, Renewal Processes, Deconvolution and Decreasing Density: Nonparametric Estimation. Biometrika. 1989;76:751–761.
- [21] Gill R, Vardi Y, Wellner J. Large sample theory of empirical distributions in biased sampling models. Ann. Statist. 1988;16:1069–1112.
- [22] Vardi Y, Zhang C. Large Sample Study of Empirical Distributions in a Random Multiplicative Model. Ann. Math. Stat. 1992;20:1022–1040.
- [23] Asgharian M, Wolfson D. Asymptotic Behavior of the Unconditional NPMLE of the Length-Biased Survivor Function from Right Censored Prevalent Cohort Data. Ann. Statist. 2005;33:2109–2131.
- [24] Scheaffer RL. Size-Biased Sampling. Technometrics. 1972;14(3):635–644.
- [25] Correa J, Wolfson D. Length-bias: characterizations and applications. J. Stat. Comput. Simul. 1999;64:209–219.
- [26] Patil G, Ord J. On size-biased sampling and related form-invariant weighted distributions. Sankhya Ser. B. 1976;38:48–61.
Appendix A: Closure of Size-Biased Generalized Gamma Distribution
Suppose X is Generalized Gamma distributed with shape parameters α, δ and rate parameter β having dens...