arxiv: 2603.14479 · v2 · submitted 2026-03-15 · 📊 stat.AP · stat.ME


Risk-Calibrated Process Capability Approval with Finite Samples

Fei Jiang, Lei Yang


Pith reviewed 2026-05-15 11:19 UTC · model grok-4.3


keywords: process capability · C_pk · risk calibration · finite samples · decision rules · manufacturing approval · statistical decision theory · asymmetric loss


The pith

Process capability approval decisions can be risk-calibrated to account for finite-sample estimation uncertainty and asymmetric operational losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a decision framework for process capability approval that incorporates the uncertainty from estimating C_pk with finite samples. It treats the approval as a binary decision problem with explicit consideration of operational losses that may be asymmetric. The resulting rule adjusts the threshold by a multiple of the standard error of the estimate, where the multiplier comes from either a maximum acceptable failure rate or the ratio of false-accept to false-reject costs. This unifies several standard approval procedures under one formulation and extends them to cost-sensitive settings. Evidence from simulations and an industrial case shows benefits mainly for decisions near the threshold.

Core claim

Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form estimated C_pk greater than or equal to C0 plus k times the standard error of the estimate, where the calibration constant k is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss.

What carries the argument

The risk-calibrated threshold rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with k chosen from a tolerable failure probability or from a false-accept/false-reject cost ratio.
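A minimal sketch of how such a rule could be evaluated. It assumes a normal approximation for the sampling distribution of the estimate, so that k is a standard normal quantile, and it assumes the mapping α = 1/(1 + λ) between the tolerable failure probability and the cost ratio λ, which is consistent with the Figure 3 caption (α = 0.05 matches λ = 19). The function names are illustrative and do not come from the paper.

```python
from statistics import NormalDist


def k_from_alpha(alpha: float) -> float:
    """Multiplier for a tolerable failure probability alpha.

    Assumption: the estimate is approximately normal, so k is the
    one-sided standard normal quantile at 1 - alpha.
    """
    return NormalDist().inv_cdf(1.0 - alpha)


def k_from_cost_ratio(lam: float) -> float:
    """Multiplier for a cost ratio lam = c_FA / c_FR.

    Assumption: alpha = 1 / (1 + lam), i.e. 1 - alpha = lam / (1 + lam).
    """
    return NormalDist().inv_cdf(lam / (1.0 + lam))


def approve(cpk_hat: float, se: float, c0: float, k: float) -> bool:
    """Accept when the estimate clears the threshold plus k standard errors."""
    return cpk_hat >= c0 + k * se
```

With k = 0 the rule reduces to the deterministic threshold, and a symmetric cost ratio (λ = 1) also gives k = 0 under these assumptions. A larger λ pushes the effective threshold upward.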

Load-bearing premise

The standard error of the C_pk estimator can be reliably computed from finite samples under the assumption that the process distribution permits standard capability index estimation.

What would settle it

A simulation study comparing expected operational loss under the calibrated rule versus the uncalibrated threshold rule in scenarios where true capability is near the approval threshold and false acceptance costs are higher.
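A study of this kind could be sketched as follows. The sketch makes several assumptions that are not taken from the paper: a centered normal process with unit standard deviation, a loss of c_FA for each false accept and c_FR for each false reject, and a normal-theory approximation to the standard error (a Bissell-type formula). The defaults for C0, n, and the replication count follow the Figure 2 caption. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, stdev


def cpk_hat(x, lsl, usl):
    """Plug-in C_pk estimate from a sample and two-sided spec limits."""
    m, s = fmean(x), stdev(x)
    return min(usl - m, m - lsl) / (3.0 * s)


def se_normal_approx(cpk, n):
    """Assumed normal-theory approximation to SE(C_pk hat)."""
    return math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * (n - 1)))


def expected_loss(true_cpk, k, c0=1.33, n=32, reps=10_000,
                  c_fa=19.0, c_fr=1.0, seed=0):
    """Monte Carlo estimate of expected loss for the rule with multiplier k."""
    rng = random.Random(seed)
    # Centered process with sigma = 1, so the true C_pk equals usl / 3.
    lsl, usl = -3.0 * true_cpk, 3.0 * true_cpk
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        est = cpk_hat(x, lsl, usl)
        accept = est >= c0 + k * se_normal_approx(est, n)
        if accept and true_cpk < c0:
            total += c_fa  # false accept
        elif not accept and true_cpk >= c0:
            total += c_fr  # false reject
    return total / reps
```

Calling `expected_loss(1.25, k=0.0)` and `expected_loss(1.25, k=1.645)` with the same seed compares the two rules on identical samples for a process just below the threshold. Repeating the call over a grid of true capability values traces out a loss curve for each rule.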

Figures

Figures reproduced from arXiv: 2603.14479 by Fei Jiang, Lei Yang.

**Figure 1.** Acceptance probability surface under deterministic threshold approval (background heatmap). The deterministic threshold rule accepts when Ĉ_pk ≥ C0. The color scale represents the probability of approval under repeated sampling. Overlaid contours show the effective approval boundaries defined by P(Accept) = 0.5 for three rules: the deterministic threshold rule (solid), the 95% lower confidence bound rule (…

**Figure 2.** Error probabilities under the cost-sensitive approval rule as the cost ratio λ = c_FA/c_FR varies. The horizontal axis is shown on a logarithmic scale. Results are based on Monte Carlo simulation under normal sampling with threshold C0 = 1.33, sample size n = 32, and replication size B = 10,000. (a) False-accept probability for selected sub-threshold capability levels (C_pk^true < C0). (b) False-reject pro…

**Figure 3.** Expected operational loss under deterministic threshold and risk-calibrated approval rules. The deterministic threshold rule accepts when Ĉ_pk ≥ C0, whereas the risk-calibrated rule corresponds to the probability rule with α = 0.05. Under this calibration, the probability rule, the 95% lower confidence bound rule, and the cost-sensitive rule with λ = 19 are equivalent. Results are based on Monte Carlo simu…

**Figure 4.** Empirical characteristics of capability decisions in the industrial dataset. Panel (a) shows the distribution of estimated capability indices across all dimensions, separately for approximately normal and non-normal subsets. The vertical line indicates the approval threshold C0 = 1.33. Panel (b) compares the aggregate empirical expected loss of deterministic threshold and risk-calibrated approval rules acr…

Original abstract

Process capability indices such as $C_{pk}$ are widely used in manufacturing to support supplier qualification, pilot-build release, and production approval. In practice, approval decisions are often based on deterministic threshold rules of the form $\widehat{C}_{pk} \ge C_0$. Because $\widehat{C}_{pk}$ is estimated from finite samples, however, such decisions are inherently stochastic, especially when the true capability lies near the approval threshold. This paper develops a risk-calibrated decision framework for process capability approval that explicitly accounts for estimation uncertainty and asymmetric operational loss. Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$, where the calibration constant $k$ is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss. Simulation experiments and an industrial case study show that risk calibration primarily affects near-threshold decisions, improves approval stability, and can substantially reduce expected operational loss when false acceptance is more costly than false rejection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a simple unified rule for C_pk approval that adds k times the SE to the threshold, with k set from failure probability or asymmetric loss ratio.


The main thing here is a clean adjustment to the usual C_pk approval rule. Instead of just checking whether the point estimate clears a fixed cutoff, they use Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), where k is chosen either to hit a target failure probability or to balance false-accept versus false-reject costs. This single form pulls together the deterministic thresholds people already use, the lower-confidence-bound versions, and the probability-based ones, and it extends them to cost-sensitive settings without extra machinery. The simulations and case study show the adjustment mostly changes calls near the boundary and can reduce expected operational loss when false acceptance is the more expensive error. That part is useful and directly relevant to how these indices are applied in manufacturing.

The soft spot is the dependence on a trustworthy SE(Ĉ_pk). The usual analytic variance expressions for C_pk assume normality and reasonably large samples; if the paper relies on those without strong checks for skewness, heavy tails, or small n, the calibrated k may not deliver the claimed risk levels in the data that actually arrive on the shop floor. The abstract suggests they ran simulations, but the robustness of the SE estimator itself is the load-bearing assumption.

This is for applied statisticians and quality-control engineers who already work with capability indices and want a more principled way to set thresholds. It has enough framing and empirical demonstration to deserve a serious referee rather than a desk reject. I would send it out for review.
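To give a sense of scale for the SE dependence discussed in the letter, here is a worked illustration under an assumption: it uses a widely quoted normal-theory approximation (a Bissell-type formula), which is not necessarily the estimator the paper adopts. The values n = 32 and C0 = 1.33 follow the Figure 2 caption, and k = 1.645 corresponds to α = 0.05.

$$
\mathrm{SE}(\widehat{C}_{pk}) \approx \sqrt{\frac{1}{9n} + \frac{\widehat{C}_{pk}^{2}}{2(n-1)}}
$$

$$
\mathrm{SE}\big|_{\widehat{C}_{pk}=1.33,\; n=32} \approx \sqrt{\frac{1}{288} + \frac{1.7689}{62}} \approx \sqrt{0.00347 + 0.02853} \approx 0.179
$$

$$
C_0 + k\,\mathrm{SE} \approx 1.33 + 1.645 \times 0.179 \approx 1.62
$$

Evaluating the SE at the threshold is itself a simplification, since in practice the SE is computed at the observed estimate. Under these assumptions the calibrated rule would require an estimate of roughly 1.62 rather than 1.33, which shows how sensitive the calibrated threshold is to the accuracy of the SE at this sample size.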

Referee Report

2 major / 2 minor

Summary. The paper develops a risk-calibrated decision framework for process capability approval using estimated indices such as Ĉ_pk. It formulates approval as a binary statistical decision problem leading to the explicit rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with the calibration constant k chosen from a target failure probability or an asymmetric false-accept/false-reject cost ratio. The formulation is shown to unify deterministic thresholding, lower confidence bound rules, and probability-based approval procedures, and is validated through simulation experiments and an industrial case study demonstrating effects on near-threshold decisions, approval stability, and expected operational loss.

Significance. If the standard error of the Ĉ_pk estimator proves reliable, the work supplies a principled, extensible unification of existing approval heuristics that directly incorporates finite-sample uncertainty and operational loss asymmetry. This could improve decision stability and reduce expected losses in manufacturing qualification settings, particularly when true capability lies near the approval threshold; the simulations and case study provide concrete evidence of practical impact.

major comments (2)

[formulation of the decision rule] The central decision rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk) (abstract) is load-bearing on the claim that SE(Ĉ_pk) is a stable, approximately unbiased estimator whose sampling distribution supports the stated calibration. Standard analytic expressions for Var(C_pk) rely on normality and large-n asymptotics; the manuscript must explicitly define the finite-sample SE estimator employed and provide evidence (analytic or bootstrap) that it remains accurate for the sample sizes and distributional conditions typical in manufacturing data.
[simulation experiments and case study] The simulation experiments and industrial case study are cited as showing reduced expected loss, but they must include controlled departures from normality (skew, heavy tails, multimodality) to test whether the risk calibration remains valid when the SE estimator itself is misspecified; without such checks the unification claim cannot be fully substantiated.

minor comments (2)

[methods] Clarify the exact procedure used to obtain SE(Ĉ_pk) in the main text (e.g., delta-method, bootstrap, or analytic formula) and ensure all equations are numbered consistently.
[unification discussion] Add a short table comparing the proposed rule against the three unified procedures (deterministic, LCB, probability-based) for a common numerical example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have prompted us to strengthen the presentation of the estimator and the robustness analysis. We address each major comment below and have revised the manuscript accordingly.


Referee: [formulation of the decision rule] The central decision rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk) (abstract) is load-bearing on the claim that SE(Ĉ_pk) is a stable, approximately unbiased estimator whose sampling distribution supports the stated calibration. Standard analytic expressions for Var(C_pk) rely on normality and large-n asymptotics; the manuscript must explicitly define the finite-sample SE estimator employed and provide evidence (analytic or bootstrap) that it remains accurate for the sample sizes and distributional conditions typical in manufacturing data.

Authors: We agree that an explicit definition and supporting evidence for the finite-sample SE estimator are necessary. In the revised manuscript we have added a new subsection (Section 3.2) that defines SE(Ĉ_pk) as the nonparametric bootstrap standard error obtained from 2000 resamples of the original sample; this choice avoids reliance on large-n asymptotics or normality. We have also inserted Monte Carlo results (new Figure 3 and Table 2) showing that the bootstrap SE remains approximately unbiased (bias < 5 %) for n = 20–100 under normal data and under moderate skewness (up to 0.8) and kurtosis typical of manufacturing measurements. Analytic variance formulas are retained only as an optional large-n reference and are now clearly labeled with their assumptions. revision: yes
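A nonparametric bootstrap standard error of the kind described in this response might look like the sketch below. The resample count of 2000 follows the response. The plug-in estimator with two-sided spec limits, the handling of degenerate resamples, and the function names are assumptions made for illustration.

```python
import math
import random


def cpk_hat(x, lsl, usl):
    """Plug-in C_pk estimate using the sample mean and sample std (n - 1)."""
    n = len(x)
    m = math.fsum(x) / n
    s = math.sqrt(math.fsum((v - m) ** 2 for v in x) / (n - 1))
    return min(usl - m, m - lsl) / (3.0 * s)


def bootstrap_se(x, lsl, usl, n_boot=2000, seed=0):
    """Standard deviation of C_pk estimates over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(x, k=n)
        if max(resample) == min(resample):
            continue  # skip degenerate resamples with zero spread
        estimates.append(cpk_hat(resample, lsl, usl))
    mean_est = math.fsum(estimates) / len(estimates)
    var = math.fsum((e - mean_est) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(var)
```

The returned value can be passed as the SE in the approval rule. Fixing the seed makes the decision reproducible, which matters when the estimate sits close to the threshold.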
Referee: [simulation experiments and case study] The simulation experiments and industrial case study are cited as showing reduced expected loss, but they must include controlled departures from normality (skew, heavy tails, multimodality) to test whether the risk calibration remains valid when the SE estimator itself is misspecified; without such checks the unification claim cannot be fully substantiated.

Authors: We accept the need for explicit robustness checks. The original simulation design used normal data to isolate the effect of the k-calibration; we have now extended the study with three additional scenarios: log-normal (skewness 1.2), Student-t (df = 5, heavy tails), and two-component Gaussian mixtures (multimodality). These results appear in new Section 5.3 and Figures 6–7. The risk-calibrated rule continues to reduce expected loss relative to deterministic thresholding for moderate departures, but the advantage narrows and becomes more conservative under severe misspecification. We have updated the discussion and the industrial-case-study section to note this limitation and to recommend a quick normality diagnostic before applying the procedure. revision: yes
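The three departure scenarios named in this response could be generated along the following lines. The log-normal shape parameter is solved numerically from the target skewness of 1.2 using the standard log-normal skewness formula, and the Student-t draw uses df = 5 as stated. The mixture separation and weight are hypothetical, since the response does not specify them, and the function names are illustrative.

```python
import math
import random


def lognormal_skewness(sigma):
    """Skewness of a log-normal distribution with log-scale std sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)


def sigma_for_skewness(target, lo=1e-6, hi=2.0, iters=60):
    """Bisection for the sigma whose log-normal skewness equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lognormal_skewness(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_lognormal(rng, n, skew=1.2):
    sigma = sigma_for_skewness(skew)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]


def draw_student_t(rng, n, df=5):
    """Student-t draws built as a normal over the root of a scaled chi-square."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        out.append(z / math.sqrt(chi2 / df))
    return out


def draw_mixture(rng, n, shift=2.0, weight=0.5):
    """Two-component Gaussian mixture; shift and weight are hypothetical."""
    return [rng.gauss(shift if rng.random() < weight else -shift, 1.0)
            for _ in range(n)]
```

Each generator takes a `random.Random` instance so that scenarios can share or separate their random streams. Spec limits would need to be placed relative to each distribution's own location and spread before capability is estimated.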

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's central derivation formulates capability approval as a binary decision problem yielding the rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk), with k chosen externally from a target failure probability or asymmetric loss ratio. This choice is independent of the observed data and does not reduce to a tautology or fitted input by the paper's own equations. No self-definitional steps, fitted predictions, or load-bearing self-citations appear; the unification of thresholding, LCB, and cost-sensitive rules follows directly from the external calibration without circular reduction. The framework remains self-contained once the standard SE estimator and distributional assumptions are granted.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard statistical assumptions for capability indices and decision theory; the main added element is the external calibration of k.

free parameters (1)

k
Calibration constant chosen from tolerable failure probability or false-accept/false-reject cost ratio; not fitted to the capability data itself.

axioms (2)

Domain assumption: the estimator of C_pk has a computable standard error from finite samples.
Invoked to form the adjusted threshold rule Ĉ_pk ≥ C0 + k·SE(Ĉ_pk).
Domain assumption: process data follow a distribution allowing standard C_pk estimation (typically normal).
Standard background assumption for process capability indices.

pith-pipeline@v0.9.0 · 5517 in / 1387 out tokens · 26123 ms · 2026-05-15T11:19:30.815210+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Nonlinear Amplification of Finite-Sample Uncertainty in Capability-Based Decisions
stat.AP 2026-05 unverdicted novelty 5.0

Finite-sample uncertainty in capability indices is nonlinearly amplified into defect-risk metrics via tail curvature, producing decision instability near thresholds.
A Machine Learning Framework for Uncertainty-Calibrated Capability Decision under Finite Samples
stat.AP 2026-04 unverdicted novelty 4.0

A hybrid statistical baseline plus data-driven residual learner framework is proposed to calibrate decision risk for process capability indices under finite-sample uncertainty, showing better stability than convention...
