pith. machine review for the scientific record.

arxiv: 2603.14479 · v2 · submitted 2026-03-15 · 📊 stat.AP · stat.ME


Risk-Calibrated Process Capability Approval with Finite Samples


Pith reviewed 2026-05-15 11:19 UTC · model grok-4.3

keywords: process capability · C_pk · risk calibration · finite samples · decision rules · manufacturing approval · statistical decision theory · asymmetric loss

The pith

Process capability approval decisions can be risk-calibrated to account for finite-sample estimation uncertainty and asymmetric operational losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a decision framework for process capability approval that incorporates the uncertainty from estimating C_pk with finite samples. It treats the approval as a binary decision problem with explicit consideration of operational losses that may be asymmetric. The resulting rule adjusts the threshold by a multiple of the standard error of the estimate, where the multiplier comes from either a maximum acceptable failure rate or the ratio of false-accept to false-reject costs. This unifies several standard approval procedures under one formulation and extends them to cost-sensitive settings. Evidence from simulations and an industrial case shows benefits mainly for decisions near the threshold.

Core claim

Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form estimated C_pk greater than or equal to C0 plus k times the standard error of the estimate, where the calibration constant k is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss.
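
A hedged sketch of how the two calibrations of k might be written out. The normal approximation, the symbols α (tolerable failure probability), λ (cost ratio), and z (standard normal quantile), and the mapping between α and λ are assumptions made here for illustration; this page does not reproduce the paper's own derivation. The mapping is chosen because it reproduces the equivalence stated in the Figure 3 caption (α = 0.05 and λ = 19), and because k = 0 (α = 0.5, λ = 1) recovers the deterministic threshold rule.

```latex
\[
\text{Approve} \iff \widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk}),
\qquad
k = z_{1-\alpha} = \Phi^{-1}(1-\alpha).
\]
\[
\text{Cost-sensitive choice (assumed): }\quad
\alpha = \frac{c_{FR}}{c_{FA} + c_{FR}} = \frac{1}{1+\lambda},
\qquad
\lambda = \frac{c_{FA}}{c_{FR}}
\;\Longrightarrow\;
k = \Phi^{-1}\!\left(\frac{\lambda}{1+\lambda}\right).
\]
```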

What carries the argument

The risk-calibrated threshold rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with k chosen from a tolerable failure probability or a false-accept/false-reject cost ratio.
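
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, k is taken as a standard normal quantile (a normal approximation assumed here), and the cost-ratio mapping α = 1/(1 + λ) is an assumption chosen to match the equivalence of α = 0.05 and λ = 19 stated in the Figure 3 caption.

```python
from statistics import NormalDist


def k_from_alpha(alpha: float) -> float:
    """Multiplier for a tolerable failure probability alpha (assumed normal approximation)."""
    return NormalDist().inv_cdf(1.0 - alpha)


def k_from_cost_ratio(lam: float) -> float:
    """Multiplier for a cost ratio lam = c_FA / c_FR, assuming alpha = 1 / (1 + lam)."""
    return k_from_alpha(1.0 / (1.0 + lam))


def approve(cpk_hat: float, se: float, c0: float = 1.33, k: float = 0.0) -> bool:
    """Risk-calibrated rule: approve when cpk_hat >= c0 + k * se (k = 0 is the plain threshold)."""
    return cpk_hat >= c0 + k * se
```

With hypothetical inputs Ĉ_pk = 1.40 and SE = 0.10, `approve(1.40, 0.10)` accepts under the deterministic threshold, while `approve(1.40, 0.10, k=k_from_alpha(0.05))` rejects, because the calibrated threshold rises to roughly 1.33 + 1.645 × 0.10.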

Load-bearing premise

The standard error of the C_pk estimator can be reliably computed from finite samples under the assumption that the process distribution permits standard capability index estimation.
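
To make the premise concrete, here is a minimal sketch of one way to compute Ĉ_pk and an analytic standard error. It assumes approximately normal data and uses a widely cited normal-theory approximation, usually attributed to Bissell (1990); the paper's own SE estimator is not given on this page, so the formula and the names `cpk_hat`, `se_cpk_normal_approx`, `lsl`, and `usl` are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def cpk_hat(x, lsl, usl):
    """Plug-in estimate: min(USL - mean, mean - LSL) / (3 * s)."""
    m, s = mean(x), stdev(x)
    return min(usl - m, m - lsl) / (3.0 * s)


def se_cpk_normal_approx(cpk, n):
    """Assumed normal-theory approximation: sqrt(1/(9n) + cpk^2 / (2(n - 1)))."""
    return math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * (n - 1)))
```

For n = 32 and Ĉ_pk = 1.33 this approximation gives a standard error of about 0.18, which indicates how wide the uncertainty band around the threshold can be at sample sizes like the one used in Figure 2.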

What would settle it

A simulation study comparing expected operational loss under the calibrated rule versus the uncalibrated threshold rule in scenarios where true capability is near the approval threshold and false acceptance costs are higher.
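
A study of this kind could be sketched as follows. This is a hedged illustration and not the paper's simulation design: it assumes a centered normal process with unit standard deviation, the normal-theory SE approximation sqrt(1/(9n) + Ĉ_pk²/(2(n − 1))), hypothetical costs c_FA = 19 and c_FR = 1, and fewer replications than the B = 10,000 quoted in the Figure 2 caption. The function name `simulate_loss` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_loss(c_true, k, c0=1.33, n=32, reps=2000, c_fa=19.0, c_fr=1.0, seed=0):
    """Monte Carlo estimate of (acceptance rate, expected loss) for one true C_pk."""
    rng = random.Random(seed)
    usl, lsl = 3.0 * c_true, -3.0 * c_true  # centered process, sigma = 1 (assumption)
    accepts = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        cpk = min(usl - m, m - lsl) / (3.0 * s)
        se = math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * (n - 1)))
        if cpk >= c0 + k * se:
            accepts += 1
    p_accept = accepts / reps
    # Only the wrong decision for this true capability incurs a loss.
    loss = c_fa * p_accept if c_true < c0 else c_fr * (1.0 - p_accept)
    return p_accept, loss


k_cal = NormalDist().inv_cdf(0.95)  # alpha = 0.05 under the assumed normal calibration
for c_true in (1.20, 1.30, 1.40, 1.60):
    p_det, loss_det = simulate_loss(c_true, k=0.0)
    p_cal, loss_cal = simulate_loss(c_true, k=k_cal)
    print(f"C_pk={c_true:.2f}  P(accept) det={p_det:.3f} cal={p_cal:.3f}  "
          f"loss det={loss_det:.3f} cal={loss_cal:.3f}")
```

Because both rules are evaluated on the same seeded samples, the calibrated rule can only accept a subset of what the deterministic rule accepts; the comparison of interest is how much false-accept loss that removes below C0 versus how much false-reject loss it adds above C0.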

Figures

Figures reproduced from arXiv: 2603.14479 by Fei Jiang, Lei Yang.

Figure 1: Acceptance probability surface under deterministic threshold approval (background heatmap). The deterministic threshold rule accepts when Ĉ_pk ≥ C0. The color scale represents the probability of approval under repeated sampling. Overlaid contours show the effective approval boundaries defined by P(Accept) = 0.5 for three rules: the deterministic threshold rule (solid), the 95% lower confidence bound rule (…
Figure 2: Error probabilities under the cost-sensitive approval rule as the cost ratio λ = c_FA/c_FR varies. The horizontal axis is shown on a logarithmic scale. Results are based on Monte Carlo simulation under normal sampling with threshold C0 = 1.33, sample size n = 32, and replication size B = 10,000. (a) False-accept probability for selected sub-threshold capability levels (C_pk^true < C0). (b) False-reject pro…
Figure 3: Expected operational loss under deterministic threshold and risk-calibrated approval rules. The deterministic threshold rule accepts when Ĉ_pk ≥ C0, whereas the risk-calibrated rule corresponds to the probability rule with α = 0.05. Under this calibration, the probability rule, the 95% lower confidence bound rule, and the cost-sensitive rule with λ = 19 are equivalent. Results are based on Monte Carlo simu…
Figure 4: Empirical characteristics of capability decisions in the industrial dataset. Panel (a) shows the distribution of estimated capability indices across all dimensions, separately for approximately normal and non-normal subsets. The vertical line indicates the approval threshold C0 = 1.33. Panel (b) compares the aggregate empirical expected loss of deterministic threshold and risk-calibrated approval rules acr…
Original abstract

Process capability indices such as $C_{pk}$ are widely used in manufacturing to support supplier qualification, pilot-build release, and production approval. In practice, approval decisions are often based on deterministic threshold rules of the form $\widehat{C}_{pk} \ge C_0$. Because $\widehat{C}_{pk}$ is estimated from finite samples, however, such decisions are inherently stochastic, especially when the true capability lies near the approval threshold. This paper develops a risk-calibrated decision framework for process capability approval that explicitly accounts for estimation uncertainty and asymmetric operational loss. Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$, where the calibration constant $k$ is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss. Simulation experiments and an industrial case study show that risk calibration primarily affects near-threshold decisions, improves approval stability, and can substantially reduce expected operational loss when false acceptance is more costly than false rejection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a risk-calibrated decision framework for process capability approval using estimated indices such as Ĉ_pk. It formulates approval as a binary statistical decision problem leading to the explicit rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with the calibration constant k chosen from a target failure probability or an asymmetric false-accept/false-reject cost ratio. The formulation is shown to unify deterministic thresholding, lower confidence bound rules, and probability-based approval procedures, and is validated through simulation experiments and an industrial case study demonstrating effects on near-threshold decisions, approval stability, and expected operational loss.

Significance. If the standard error of the Ĉ_pk estimator proves reliable, the work supplies a principled, extensible unification of existing approval heuristics that directly incorporates finite-sample uncertainty and operational loss asymmetry. This could improve decision stability and reduce expected losses in manufacturing qualification settings, particularly when true capability lies near the approval threshold; the simulations and case study provide concrete evidence of practical impact.

major comments (2)
  1. [formulation of the decision rule] The central decision rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk) (abstract) rests on the claim that SE(Ĉ_pk) is a stable, approximately unbiased estimator whose sampling distribution supports the stated calibration. Standard analytic expressions for Var(Ĉ_pk) rely on normality and large-n asymptotics; the manuscript must explicitly define the finite-sample SE estimator employed and provide evidence (analytic or bootstrap) that it remains accurate for the sample sizes and distributional conditions typical in manufacturing data.
  2. [simulation experiments and case study] The simulation experiments and industrial case study are cited as showing reduced expected loss, but they must include controlled departures from normality (skew, heavy tails, multimodality) to test whether the risk calibration remains valid when the SE estimator itself is misspecified; without such checks the unification claim cannot be fully substantiated.
minor comments (2)
  1. [methods] Clarify the exact procedure used to obtain SE(Ĉ_pk) in the main text (e.g., delta-method, bootstrap, or analytic formula) and ensure all equations are numbered consistently.
  2. [unification discussion] Add a short table comparing the proposed rule against the three unified procedures (deterministic, LCB, probability-based) for a common numerical example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have prompted us to strengthen the presentation of the estimator and the robustness analysis. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [formulation of the decision rule] The central decision rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk) (abstract) rests on the claim that SE(Ĉ_pk) is a stable, approximately unbiased estimator whose sampling distribution supports the stated calibration. Standard analytic expressions for Var(Ĉ_pk) rely on normality and large-n asymptotics; the manuscript must explicitly define the finite-sample SE estimator employed and provide evidence (analytic or bootstrap) that it remains accurate for the sample sizes and distributional conditions typical in manufacturing data.

    Authors: We agree that an explicit definition and supporting evidence for the finite-sample SE estimator are necessary. In the revised manuscript we have added a new subsection (Section 3.2) that defines SE(Ĉ_pk) as the nonparametric bootstrap standard error obtained from 2000 resamples of the original sample; this choice avoids reliance on large-n asymptotics or normality. We have also inserted Monte Carlo results (new Figure 3 and Table 2) showing that the bootstrap SE remains approximately unbiased (bias < 5 %) for n = 20–100 under normal data and under moderate skewness (up to 0.8) and kurtosis typical of manufacturing measurements. Analytic variance formulas are retained only as an optional large-n reference and are now clearly labeled with their assumptions. revision: yes

  2. Referee: [simulation experiments and case study] The simulation experiments and industrial case study are cited as showing reduced expected loss, but they must include controlled departures from normality (skew, heavy tails, multimodality) to test whether the risk calibration remains valid when the SE estimator itself is misspecified; without such checks the unification claim cannot be fully substantiated.

    Authors: We accept the need for explicit robustness checks. The original simulation design used normal data to isolate the effect of the k-calibration; we have now extended the study with three additional scenarios: log-normal (skewness 1.2), Student-t (df = 5, heavy tails), and two-component Gaussian mixtures (multimodality). These results appear in new Section 5.3 and Figures 6–7. The risk-calibrated rule continues to reduce expected loss relative to deterministic thresholding for moderate departures from normality; under severe misspecification the advantage narrows and the rule becomes more conservative. We have updated the discussion and the industrial-case-study section to note this limitation and to recommend a quick normality diagnostic before applying the procedure. revision: yes
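
The nonparametric bootstrap standard error described in the first response (2000 resamples of the original sample) can be sketched as below. This is an illustrative sketch only: the function names, the specification limits `lsl` and `usl`, the seed handling, and the redraw of degenerate resamples are assumptions, not details taken from the manuscript.

```python
import math
import random


def cpk_hat(x, lsl, usl):
    """Plug-in estimate: min(USL - mean, mean - LSL) / (3 * s)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return min(usl - m, m - lsl) / (3.0 * s)


def bootstrap_se_cpk(x, lsl, usl, n_boot=2000, seed=0):
    """Standard deviation of cpk_hat over n_boot resamples drawn with replacement."""
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct observations")
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(x, k=n)
        while len(set(resample)) < 2:  # redraw resamples with zero spread
            resample = rng.choices(x, k=n)
        stats.append(cpk_hat(resample, lsl, usl))
    mean_b = sum(stats) / n_boot
    return math.sqrt(sum((b - mean_b) ** 2 for b in stats) / (n_boot - 1))


# Hypothetical data: 32 measurements around a nominal value of 10 with limits 6 and 14.
data_rng = random.Random(1)
sample = [data_rng.gauss(10.0, 1.0) for _ in range(32)]
print(cpk_hat(sample, 6.0, 14.0), bootstrap_se_cpk(sample, 6.0, 14.0))
```

The resulting value can be passed as the SE term in the approval rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk); fixing the seed keeps the approval decision reproducible for a given sample.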

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's central derivation formulates capability approval as a binary decision problem yielding the rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with k chosen externally from a target failure probability or asymmetric loss ratio. This choice is independent of the observed data and does not reduce to a tautology or fitted input by the paper's own equations. No self-definitional steps, fitted predictions, or load-bearing self-citations appear; the unification of thresholding, LCB, and cost-sensitive rules follows directly from the external calibration without circular reduction. The framework remains self-contained once the standard SE estimator and distributional assumptions are granted.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard statistical assumptions for capability indices and decision theory; the main added element is the external calibration of k.

free parameters (1)
  • k
    Calibration constant chosen from tolerable failure probability or false-accept/false-reject cost ratio; not fitted to the capability data itself.
axioms (2)
  • domain assumption: The estimator of C_pk has a computable standard error from finite samples.
    Invoked to form the adjusted threshold rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk).
  • domain assumption: Process data follow a distribution allowing standard C_pk estimation (typically normal).
    Standard background assumption for process capability indices.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Nonlinear Amplification of Finite-Sample Uncertainty in Capability-Based Decisions

    stat.AP · 2026-05 · unverdicted · novelty 5.0

    Finite-sample uncertainty in capability indices is nonlinearly amplified into defect-risk metrics via tail curvature, producing decision instability near thresholds.

  2. A Machine Learning Framework for Uncertainty-Calibrated Capability Decision under Finite Samples

    stat.AP · 2026-04 · unverdicted · novelty 4.0

    A hybrid statistical baseline plus data-driven residual learner framework is proposed to calibrate decision risk for process capability indices under finite-sample uncertainty, showing better stability than convention...
