pith. machine review for the scientific record.

arxiv: 2605.03233 · v1 · submitted 2026-05-04 · 📊 stat.ML · cs.LG

Recognition: unknown

Conformalized Percentile Interval: Finite Sample Validity and Improved Conditional Performance

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:05 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords conformal prediction · predictive intervals · conditional coverage · probability integral transform · neural networks · finite sample validity · percentile calibration

The pith

A calibration step applied to probability integral transforms of a neural network's conditional CDF estimate produces predictive intervals with exact finite-sample marginal coverage and improved conditional performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a conformal-style procedure that first fits a neural network to estimate the conditional cumulative distribution function, converts observed responses into probability integral transform values, and then calibrates a percentile interval directly from the empirical distribution of those transformed values. This construction guarantees finite-sample marginal coverage without distributional assumptions while aiming for shorter intervals and better conditional coverage than standard conformal prediction, especially when the response distribution varies with features or contains skewness. A sympathetic reader would care because many real-world prediction tasks need uncertainty estimates that remain reliable even when the underlying model is imperfect and the data exhibit heteroskedasticity.
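The pipeline described above can be sketched in a few lines. This is Pith's reading of the construction, not the authors' code: the helper names (`cdf_hat`, `quantile_hat`), the symmetric default z = α/2, and the exact order-statistic index conventions are assumptions.

```python
import numpy as np
from statistics import NormalDist

def cpi_interval(cdf_hat, quantile_hat, X_cal, y_cal, X_test, alpha=0.1, z=None):
    """Sketch of CPI: PIT-transform the calibration responses, pick PIT
    cutoffs as order statistics, map them back through estimated quantiles."""
    # (i) PIT values of the calibration responses under the fitted CDF
    u = np.sort([cdf_hat(yi, xi) for xi, yi in zip(X_cal, y_cal)])
    n = len(u)
    if z is None:
        z = alpha / 2.0  # symmetric starting point, as in Figure 1
    # (ii) lower/upper PIT cutoffs as order statistics (index choice assumed)
    k_lo = int(np.floor((n + 1) * z))
    k_hi = int(np.ceil((n + 1) * (1 - alpha + z)))
    u_lo = u[k_lo - 1] if k_lo >= 1 else 0.0
    u_hi = u[k_hi - 1] if k_hi <= n else 1.0
    # (iii) invert the estimated CDF at the cutoffs for each test point
    return [(quantile_hat(u_lo, x), quantile_hat(u_hi, x)) for x in X_test]

# Toy usage with a correctly specified Gaussian model (illustration only)
rng = np.random.default_rng(0)
X_cal = rng.normal(size=200)
y_cal = X_cal + rng.normal(size=200)
lo, hi = cpi_interval(lambda yv, xv: NormalDist(xv, 1).cdf(yv),
                      lambda uv, xv: NormalDist(xv, 1).inv_cdf(uv),
                      X_cal, y_cal, X_test=[0.0])[0]
```

With a well-specified model the PIT values are near-uniform, so the cutoffs sit near α/2 and 1 − α/2 and the interval at x = 0 roughly matches the oracle ±1.64 band.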

Core claim

The conformalized percentile interval is formed by applying percentile calibration to the PIT values obtained from a neural-network estimate of the conditional CDF. The result is the shortest interval, as determined by the estimated conditional CDF, whose endpoints satisfy the empirical quantile constraint in PIT space; it delivers exact finite-sample marginal coverage for any underlying distribution and asymptotic conditional coverage whenever the CDF estimator satisfies mild consistency conditions.
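In symbols, and in our notation rather than the paper's: writing U_{n+1} = F̂(Y_{n+1} | X_{n+1}) for the test PIT value, exchangeability of the calibration and test PIT values gives the marginal guarantee in PIT space, and monotonicity of F̂(· | x) transports it back to response space:

```latex
\Pr\big\{\, U_{n+1} \in [\,u_{\mathrm{lo}},\, u_{\mathrm{hi}}\,] \,\big\} \;\ge\; 1-\alpha
\quad\Longrightarrow\quad
\Pr\Big\{\, Y_{n+1} \in \big[\hat{Q}(u_{\mathrm{lo}} \mid X_{n+1}),\; \hat{Q}(u_{\mathrm{hi}} \mid X_{n+1})\big] \Big\} \;\ge\; 1-\alpha.
```

The implication needs only that F̂(· | x) is monotone with inverse Q̂(· | x); it does not need F̂ to be correct, which is why the marginal guarantee survives misspecification.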

What carries the argument

Probability integral transform applied to the neural-network conditional CDF estimate, followed by empirical percentile calibration on the resulting PIT values to determine interval endpoints.

If this is right

  • Predictive intervals retain exact marginal coverage even if the neural network CDF model is misspecified.
  • Under consistency of the CDF estimator, the intervals achieve conditional coverage without requiring perfect knowledge of the true distribution.
  • Interval lengths adapt to the empirical PIT distribution, producing shorter intervals than methods that ignore the estimated conditional shape.
  • Feature-dependent miscoverage is reduced because calibration occurs after the PIT step removes most feature dependence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same PIT-calibration idea could be paired with other consistent conditional distribution estimators such as quantile forests or normalizing flows.
  • In sequential or non-stationary settings the method may need an additional forgetting mechanism to maintain the asymptotic conditional property.
  • Practical gains would be largest in applications where baseline conformal intervals are overly wide due to strong heteroskedasticity.

Load-bearing premise

The neural network conditional CDF estimate must be accurate enough for the PIT values to become asymptotically independent of the input features.
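One crude way to probe this premise: bin the calibration PIT values by a feature and compare their dispersion across bins. Under an accurate CDF estimate the PIT values are near-uniform in every bin; under a misspecified one their spread varies with the feature. The binned mean-absolute-deviation statistic below is our diagnostic, not the paper's.

```python
import numpy as np
from statistics import NormalDist

def pit_feature_dependence(cdf_hat, X, y, n_bins=4):
    """Spread of per-bin PIT dispersion; near 0 when PIT values are
    (approximately) independent of the feature."""
    u = np.array([cdf_hat(yi, xi) for xi, yi in zip(X, y)])
    bins = np.array_split(np.argsort(X), n_bins)
    disp = [np.abs(u[b] - 0.5).mean() for b in bins]  # per-bin dispersion
    return max(disp) - min(disp)

# Heteroskedastic truth: Y | X = x is N(0, x^2)
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 2.0, size=2000)
y = rng.normal(0.0, x)
good = pit_feature_dependence(lambda yi, xi: NormalDist(0, xi).cdf(yi), x, y)
bad = pit_feature_dependence(lambda yi, xi: NormalDist(0, 1.0).cdf(yi), x, y)
```

The well-specified model yields a spread near zero; the model that ignores the heteroskedasticity leaves a clear feature dependence, which is exactly the regime where global percentile calibration cannot rescue conditional coverage.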

What would settle it

Apply the method to a dataset with deliberately inaccurate conditional CDF estimates and measure whether marginal coverage still equals the nominal level while conditional coverage deviates from the asymptotic claim.
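That experiment is straightforward to run on synthetic data. A sketch under our own choices (the cutoff convention and the misspecification below, a unit-variance model calibrated on σ = 3 noise, are assumptions): even with a badly wrong CDF, exchangeability between calibration and test PIT values pins marginal coverage near the nominal 90%.

```python
import numpy as np
from statistics import NormalDist

def pit_cutoffs(u_cal, alpha=0.1):
    """Symmetric PIT cutoffs as order statistics (index convention assumed)."""
    u = np.sort(np.asarray(u_cal))
    n = len(u)
    k_lo = int(np.floor((n + 1) * alpha / 2))
    k_hi = int(np.ceil((n + 1) * (1 - alpha / 2)))
    return (u[k_lo - 1] if k_lo >= 1 else 0.0,
            u[k_hi - 1] if k_hi <= n else 1.0)

rng = np.random.default_rng(2)
x = rng.normal(size=4000)
y = x + rng.normal(0.0, 3.0, size=4000)                 # true noise sd = 3
wrong_cdf = lambda yi, xi: NormalDist(xi, 1.0).cdf(yi)  # model assumes sd = 1
u = np.array([wrong_cdf(yi, xi) for xi, yi in zip(x, y)])
u_cal, u_test = u[:2000], u[2000:]
lo, hi = pit_cutoffs(u_cal, alpha=0.1)
# Because the estimated CDF is monotone, covering the test PIT value is
# equivalent to covering y with [Q_hat(lo | x), Q_hat(hi | x)]
coverage = np.mean((u_test >= lo) & (u_test <= hi))
```

Conditional coverage, by contrast, would have to be checked within feature strata, where the misspecified model should show the deviations the asymptotic claim rules out only for consistent estimators.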

Figures

Figures reproduced from arXiv: 2605.03233 by Bin Nan, Ran Zou, Wanrong Zhu.

Figure 1. Illustration of CPI with z = α/2. Given calibration data and a pre-trained conditional CDF estimator F̂(· | x), CPI (i) computes the calibration PIT values Uj = F̂(Yj | Xj), (ii) selects lower and upper PIT cutoffs u_lo, u_hi independently as order statistics of {Uj}, and (iii) outputs the prediction interval [Q̂(u_lo | Xn+1), Q̂(u_hi | Xn+1)].
Figure 2. Left: DCP center range [0.45, 0.55] (red shading) versus the mean of Beta(3, 1). Right: the oracle shortest 90% interval (green), together with CPI and DCP interval endpoints from a single calibration draw at four starting points z. DCP uses the score Sj = |Uj − c|, where c is a fixed center of the PIT interval (c = 0.5 for basic DCP and c = z + (1 − α)/2 in the general case), then forms the PIT cutoffs u_lo = max(0…
Figure 3. Conditional performance and noise structure.
Figure 4. Coverage vs. interval width on the Abalone dataset, stratified by PC1 quartile groups.
Figure 5. Coverage vs. mean interval width on the Airfoil dataset, stratified by PC1 quartile groups.
Figure 6. Coverage vs. mean interval width on the Computer dataset, stratified by PC1 quartile groups.
Figure 7. Coverage vs. mean interval width on the Concrete dataset, stratified by PC1 quartile groups.
Figure 8. Coverage vs. mean interval width on the AutoMPG dataset, stratified by PC1 quartile groups.
Figure 9. Coverage vs. mean interval width on the Crime dataset, stratified by PC1 quartile groups.
Original abstract

Conformal prediction provides distribution-free predictive intervals with finite-sample marginal coverage. However, achieving conditional validity and interval efficiency (in terms of short interval length) remains challenging, particularly in complex settings with heteroskedasticity, skewed responses, or estimation errors. We propose a conformal-style calibration method for responses obtained by the probability integral transform (PIT) of the conditional cumulative distribution function (CDF) estimated via neural networks to construct a finite-sample-adjusted percentile interval with the shortest length determined by the estimated conditional CDF. Calibrating in PIT space is effective because PIT values are asymptotically feature-independent when the CDF estimator is accurate, which mitigates feature-dependent miscoverage and improves conditional calibration. On the other hand, our percentile calibration adapts to the empirical PIT distribution, which is robust against a possibly imperfect estimation of the conditional CDF. We prove the finite-sample marginal coverage property of the proposed method and show its asymptotic conditional coverage under mild consistency conditions. Experiments on diverse synthetic and real-world benchmarks demonstrate better conditional calibration and substantially shorter intervals than existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Conformalized Percentile Interval (CPI), which applies a conformal-style calibration step to the probability integral transform (PIT) values obtained from a neural-network estimate of the conditional CDF. This yields predictive intervals whose length is determined by the estimated conditional CDF, with a claim of finite-sample marginal coverage (via exchangeability of the calibrated PIT scores) and asymptotic conditional coverage under mild consistency conditions on the CDF estimator. Experiments on synthetic and real-world data are reported to show improved conditional calibration and shorter intervals relative to existing conformal methods.

Significance. If the finite-sample marginal guarantee and the asymptotic conditional coverage both hold, the approach would provide a useful practical tool for obtaining shorter, better-calibrated intervals in settings with heteroskedasticity or skewness, while preserving the distribution-free marginal property. The idea of calibrating in PIT space to exploit asymptotic feature-independence is conceptually appealing and could complement existing conformal techniques.

major comments (2)
  1. [§4] §4 (Theoretical Analysis), statement of asymptotic conditional coverage: The 'mild consistency conditions' invoked for the neural-network conditional CDF estimator are not stated with sufficient precision (e.g., no explicit rate or uniformity requirement over x). Without uniform or sup-norm consistency, the PIT scores may retain residual dependence on x, so that the subsequent global percentile calibration on the empirical PIT distribution cannot guarantee the claimed asymptotic conditional coverage; this is load-bearing for the paper's central claim of improved conditional performance.
  2. [§5] §5 (Experiments), Tables 1–3 and Figures 2–4: The reported gains in conditional coverage and interval length are presented without ablation on the quality of the base NN CDF estimator (e.g., no comparison when the NN is deliberately underfit or when the data exhibit strong heteroskedasticity). It is therefore unclear whether the observed improvements are driven by the CPI calibration step itself or by the particular NN architecture and training regime chosen.
minor comments (2)
  1. [§3] Notation in §3: The precise definition of the calibration quantile applied to the PIT scores (and how ties or discrete PIT values are handled) should be written explicitly, as it directly affects the finite-sample coverage argument.
  2. [§1, §5] The abstract and §1 claim 'substantially shorter intervals'; the experimental tables would benefit from reporting the ratio of average lengths together with standard errors across replications.
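For reference, the convention the referee is presumably asking the authors to pin down is the standard split-conformal one: the ⌈(n+1)(1−α)⌉-th order statistic of the calibration scores, with the quantile defaulting to infinity (a trivial interval) when that index exceeds n. Whether the paper adopts exactly this convention cannot be confirmed from the abstract.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th order statistic of the calibration scores;
    +inf when the index exceeds n (the interval becomes trivial)."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = len(s)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return s[k - 1] if k <= n else np.inf
```

With nine calibration scores 1..9 and α = 0.1, k = ⌈10 × 0.9⌉ = 9, so the quantile is the largest score; with only two scores the index overflows and the quantile is infinite, which is what guarantees coverage at small n.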

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of the theoretical results and experimental evidence.

Point-by-point responses
  1. Referee: [§4] §4 (Theoretical Analysis), statement of asymptotic conditional coverage: The 'mild consistency conditions' invoked for the neural-network conditional CDF estimator are not stated with sufficient precision (e.g., no explicit rate or uniformity requirement over x). Without uniform or sup-norm consistency, the PIT scores may retain residual dependence on x, so that the subsequent global percentile calibration on the empirical PIT distribution cannot guarantee the claimed asymptotic conditional coverage; this is load-bearing for the paper's central claim of improved conditional performance.

    Authors: We agree that the asymptotic conditional coverage result requires a more precise statement of the consistency assumptions on the neural-network CDF estimator. In the revised manuscript, we will update the theorem in §4 to explicitly require uniform (sup-norm) consistency of the CDF estimator at a rate that vanishes asymptotically, ensuring that the PIT scores become independent of the features x. This will clarify how the global percentile calibration on the empirical PIT distribution yields the claimed asymptotic conditional coverage and will make the load-bearing assumption transparent. revision: yes

  2. Referee: [§5] §5 (Experiments), Tables 1–3 and Figures 2–4: The reported gains in conditional coverage and interval length are presented without ablation on the quality of the base NN CDF estimator (e.g., no comparison when the NN is deliberately underfit or when the data exhibit strong heteroskedasticity). It is therefore unclear whether the observed improvements are driven by the CPI calibration step itself or by the particular NN architecture and training regime chosen.

    Authors: We acknowledge that the current experiments do not include explicit ablations on the base estimator quality. While the reported results use a standard NN trained to convergence on benchmarks that already incorporate heteroskedasticity and skewness, we agree that additional controls are needed to isolate the contribution of the CPI step. In the revised version, we will add ablation experiments: (i) deliberately underfit NNs (reduced capacity or early stopping) and (ii) synthetic data with controlled levels of heteroskedasticity, comparing CPI against baselines under these regimes. This will demonstrate that the observed gains in conditional calibration and interval length are attributable to the PIT-space calibration. revision: yes

Circularity Check

0 steps flagged

No circularity: finite-sample coverage follows from exchangeability independent of NN fit; asymptotic result uses external consistency assumptions

full rationale

The derivation separates the finite-sample marginal coverage (obtained via standard conformal exchangeability applied to the PIT scores, with the NN CDF estimator trained on a held-out set) from the quality of the estimator itself. The percentile calibration step adapts directly to the empirical distribution of the transformed scores without feeding any fitted parameter back into the coverage guarantee. The asymptotic conditional coverage is explicitly conditioned on mild consistency of the NN estimator, an external assumption rather than something derived from the method's own outputs, with no self-citation carrying the load of the central claim and no renaming or self-definitional reduction of predictions to inputs. The method therefore stands or falls against external benchmarks rather than its own machinery.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard conformal prediction assumptions plus the accuracy of the neural-network CDF estimator; no new free parameters or invented entities are introduced in the abstract description.

axioms (1)
  • domain assumption: mild consistency conditions on the conditional CDF estimator.
    Invoked for the asymptotic conditional coverage guarantee.

pith-pipeline@v0.9.0 · 5478 in / 1214 out tokens · 34064 ms · 2026-05-08T17:05:42.908629+00:00 · methodology

