pith. machine review for the scientific record.

arxiv: 2605.08561 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG

Recognition: 3 theorem links


CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Aixin Tan, Jian Huang, Zhenhan Fang

Pith reviewed 2026-05-12 00:57 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords conformal prediction · normalizing flows · prediction regions · density estimation · multi-dimensional outputs · nonconformity scores · residual learning

The pith

Normalizing flows let conformal prediction produce sharp multi-dimensional regions by using latent-space distance as the nonconformity score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CONTRA, which trains a normalizing flow to map output data into a latent space where distance from the center defines a nonconformity score. Because the flow is invertible, a simple ball in latent space transforms back into a compact, density-aligned region in the original output space. This region still satisfies the conformal coverage guarantee, and the same idea works by fitting a flow to the residuals of any other predictor. The result is a practical way to obtain reliable, non-rectangular, non-elliptical prediction regions for vector-valued outputs.

Core claim

CONTRA uses the latent space of a normalizing flow to define nonconformity scores as distances from the latent center. High-density regions in latent space then map back to sharp prediction regions in the output space, improving on traditional hyperrectangular or elliptical conformal regions. For scenarios where another predictive model is preferred, a simple normalizing flow trained on its residuals extends the same construction to any base model while preserving the coverage guarantee.

What carries the argument

The invertible mapping of a normalizing flow that converts Euclidean distance from the latent-space origin into a nonconformity score, so that a fixed-radius ball in latent space becomes a high-density conformal region after the inverse transform.
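
A minimal sketch of that mechanism in Python, assuming a hypothetical fitted `flow` object whose `forward` method maps output vectors to latent vectors; this illustrates split conformal prediction with a latent-distance score and is not the authors' code:

```python
import numpy as np

def latent_scores(flow, y):
    """Nonconformity score: Euclidean distance from the latent origin.
    `flow.forward` is a hypothetical interface mapping data points to z."""
    z = flow.forward(y)                      # shape (n, d)
    return np.linalg.norm(z, axis=1)

def conformal_radius(calib_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(calib_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(calib_scores)[k - 1]

def in_region(flow, y_new, radius):
    """Membership test: the region is the inverse image of the latent ball,
    so a point is inside iff its latent norm is at most the radius."""
    return latent_scores(flow, y_new) <= radius
```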

If this is right

  • Multi-dimensional conformal sets can take shapes that follow the actual data density instead of being forced into boxes or ellipses.
  • The exact finite-sample coverage guarantee of conformal prediction carries over unchanged through the flow transform.
  • Any existing point predictor can be equipped with a conformal region simply by fitting a flow to its residuals (see the sketch after this list).
  • The same latent-distance construction supplies both prediction regions and a form of conditional density estimate.
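
The residual-flow idea referenced above only changes what the flow is fitted to. A hedged sketch, again with hypothetical `base_model.predict` and `flow.forward` interfaces rather than the paper's implementation:

```python
import numpy as np

def residual_scores(base_model, flow, x, y):
    """ResCONTRA-style scores (sketch): the flow is trained on residuals of an
    arbitrary base predictor, and scoring is latent distance as before."""
    residuals = y - base_model.predict(x)    # shape (n, d)
    return np.linalg.norm(flow.forward(residuals), axis=1)

# The region for a new input x is the point prediction shifted by the inverse
# image of the latent ball:
#   C(x) = { base_model.predict(x) + flow.inverse(z) : ||z|| <= radius }
```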

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the flow approximates the true conditional distribution closely, the resulting regions should approach the smallest possible volume among all sets that meet the coverage target.
  • The approach could be applied to other invertible maps, such as certain diffeomorphisms or autoregressive transforms, without changing the core argument.
  • In high-dimensional regression tasks the method may reduce the volume of uncertainty sets enough to improve downstream decision-making that depends on those sets.

Load-bearing premise

The normalizing flow must be trained well enough that the latent-space distance remains a valid nonconformity measure whose coverage guarantee survives the invertible mapping back to data space.

What would settle it

Train the flow on a held-out calibration set, form the conformal regions at target coverage 0.9, and measure the empirical fraction of test points that fall inside; if this fraction falls below 0.9 by more than sampling error, the coverage claim fails.
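
A minimal version of that check, assuming the same hypothetical `flow.forward` interface as above and a test set disjoint from the calibration data:

```python
import numpy as np

def empirical_coverage(flow, y_calib, y_test, alpha=0.1):
    """Calibrate the latent radius on held-out data, then report the fraction
    of test points falling inside the region (hypothetical `flow.forward`)."""
    s_cal = np.linalg.norm(flow.forward(y_calib), axis=1)
    k = int(np.ceil((len(s_cal) + 1) * (1 - alpha)))
    radius = np.inf if k > len(s_cal) else np.sort(s_cal)[k - 1]
    covered = np.linalg.norm(flow.forward(y_test), axis=1) <= radius
    return covered.mean()

# Compare the returned coverage against 0.9 minus roughly two binomial
# standard errors, sqrt(0.9 * 0.1 / n_test), before calling the claim failed.
```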

Figures

Figures reproduced from arXiv: 2605.08561 by Aixin Tan, Jian Huang, Zhenhan Fang.

Figure 1. NYC Taxi data. Prediction regions of drop-off location given a pickup coordinate (blue pin). (a) shows a 90% high-density region estimated via kernel density estimates (KDE) based on drop-off locations from the 200 nearest pickups to the blue pin. While there is no guarantee of coverage, the shape of this region offers an informal visual reference to help users assess conformal regions. (b)-(f) show various…

Figure 2. Prediction regions of a two-dimensional outcome given a specific x value from the test set. Colored lines show the boundary of various prediction regions. In the case of PCP, K = 3000 disks are shown. Orange points show a random sample of size 2000 from the true conditional distribution of y given x.

Figure 3. Prediction regions of a two-dimensional outcome given a specific x value from the test set. Colored lines show the boundary of various prediction regions. In the case of PCP, K = 3000 disks are shown. Orange points show a random sample of size 2000 from the true conditional distribution of y given x.

Figure 4. CONTRA prediction regions and latent z for the calibration set under three different NF models, for drop-off location given a specific pickup coordinate (blue pin) for the NYC taxi data. Data size equals 4800, with a 75%-25% training-calibration split. (a) and (d): an underfitting NF with 2 coupling layers and 16 hidden units per layer, trained for 50 epochs in 4 seconds. (b) and (e): an overfitting NF with…

Figure 5. The impact of data size on prediction regions of drop-off location given a specific pickup coordinate (blue pin) for the NYC taxi data. The top row shows results of our proposed CONTRA method; the bottom row shows results of ST-DQR based on a diffusion model. Data sizes used for each column are 300, 1200, and 4800, respectively, with a 75%-25% training-calibration split.

Figure 6. Comparing CONTRA to ResCONTRA in terms of the latent variable, z, of the calibration set on two different datasets. Plots of z that closely resemble the standard bivariate Gaussian distribution result in smaller calibrated radii, r.9, and better conformal prediction regions. Five conformal prediction methods are implemented: (a) CONTRA with a 10-layer NF; (b) and (c) ResCONTRA with SVR followed by 10- and…

Figure 7. CONTRA conformal regions of the output variable y given some fixed x value for five setups with coverage levels 50%, 70% and 90%, respectively. The orange points are random samples of size 2000 from each of the true conditional distributions of y given x. The CONTRA regions at the various levels properly capture the high-density regions in each case.
Original abstract

Density estimation and reliable prediction regions for outputs are crucial in supervised and unsupervised learning. While conformal prediction effectively generates coverage-guaranteed regions, it struggles with multi-dimensional outputs due to reliance on one-dimensional nonconformity scores. To address this, we introduce CONTRA: CONformal prediction region via normalizing flow TRAnsformation. CONTRA utilizes the latent spaces of normalizing flows to define nonconformity scores based on distances from the center. This allows for the mapping of high-density regions in latent space to sharp prediction regions in the output space, surpassing traditional hyperrectangular or elliptical conformal regions. Further, for scenarios where other predictive models are favored over flow-based models, we extend CONTRA to enhance any such model with a reliable prediction region by training a simple normalizing flow on the residuals. We demonstrate that both CONTRA and its extension maintain guaranteed coverage probability and outperform existing methods in generating accurate prediction regions across various datasets. We conclude that CONTRA is an effective tool for (conditional) density estimation, addressing the under-explored challenge of delivering multi-dimensional prediction regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes CONTRA, a conformal prediction method that trains a normalizing flow and defines nonconformity scores as Euclidean distances from the origin in the resulting latent space. The (1-α) quantile of these scores on a calibration set is mapped back through the inverse flow to produce a prediction region in the original output space. An extension trains a flow on residuals of an arbitrary base predictor to obtain similar regions. The authors claim that both variants retain the standard marginal coverage guarantee of split conformal prediction while yielding sharper, density-adapted regions than hyperrectangular or elliptical baselines, and they report empirical improvements on several datasets.
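
In symbols, the construction the summary describes is as follows (our paraphrase and notation, which may differ from the paper's):

```latex
% Latent-distance scores, calibration quantile, and the resulting region.
% f_\theta is the trained flow; notation is ours, not necessarily the paper's.
\[
  s_i = \lVert f_\theta(y_i) \rVert_2, \quad i = 1, \dots, n, \qquad
  \hat{q} = s_{(\lceil (n+1)(1-\alpha) \rceil)},
\]
\[
  \widehat{C} = \{\, y : \lVert f_\theta(y) \rVert_2 \le \hat{q} \,\}
              = f_\theta^{-1}\bigl(B(0, \hat{q})\bigr).
\]
```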

Significance. If the coverage argument holds, the approach supplies a practical route to flexible, non-parametric multi-dimensional prediction regions that inherit finite-sample validity from conformal prediction while adapting shape to the learned density. This is a useful addition to the conformal toolkit for settings where rigid region shapes are overly conservative. The residual-flow extension further increases applicability to existing point predictors.

minor comments (3)
  1. §2 (Method): the statement that the flow is 'fixed before scoring' should be made explicit, including whether its parameters are estimated exclusively on the training split and never updated with calibration data, to make the exchangeability argument immediate to the reader.
  2. Experiments section: the tables reporting region volumes or coverage should include the number of random seeds or cross-validation folds and any standard errors, given that flow training introduces additional stochasticity.
  3. Notation: define the symbols for the latent variable z, the flow parameters θ, and the nonconformity score s clearly at first use; the current presentation mixes data-space and latent-space quantities without a dedicated notation table.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The report correctly captures the core idea of CONTRA and its residual-flow extension, as well as the claimed marginal coverage guarantee inherited from split conformal prediction. We provide point-by-point responses to the minor comments below.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central derivation applies standard split conformal prediction to nonconformity scores defined as Euclidean distances from the origin in the latent space of a pre-trained normalizing flow (or a flow on residuals). The coverage guarantee follows directly from the exchangeability of calibration and test scores under the usual rank-based argument, which holds regardless of how well the flow approximates the data density; the flow only determines region shape and volume after the threshold is obtained. No equation reduces a claimed prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and the invertible mapping preserves the marginal coverage property without introducing data-dependent circularity between flow training and calibration scores.
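
For reference, the bound that this rank-based exchangeability argument yields is the standard split-conformal result, not something specific to this paper:

```latex
% Marginal coverage of the split-conformal region under exchangeability of
% the n calibration scores and the test score; the upper bound additionally
% requires the scores to be almost surely distinct.
\[
  1 - \alpha \;\le\; \mathbb{P}\bigl(Y_{n+1} \in \widehat{C}\bigr)
  \;\le\; 1 - \alpha + \frac{1}{n+1}.
\]
```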

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that a trained normalizing flow produces a latent representation in which Euclidean distance from the origin is a valid nonconformity score that preserves conformal validity after inversion. No explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5480 in / 1187 out tokens · 17241 ms · 2026-05-12T00:57:40.467181+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors
