pith. machine review for the scientific record.

arxiv: 2605.08561 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG

Recognition: 3 theorem links


CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Aixin Tan, Jian Huang, Zhenhan Fang

Pith reviewed 2026-05-12 00:57 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords conformal prediction · normalizing flows · prediction regions · density estimation · multi-dimensional outputs · nonconformity scores · residual learning

The pith

Normalizing flows let conformal prediction produce sharp multi-dimensional regions by using latent-space distance as the nonconformity score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CONTRA, which trains a normalizing flow to map output data into a latent space where distance from the center defines a nonconformity score. Because the flow is invertible, a simple ball in latent space transforms back into a compact, density-aligned region in the original output space. This region still satisfies the conformal coverage guarantee, and the same idea works by fitting a flow to the residuals of any other predictor. The result is a practical way to obtain reliable, non-rectangular, non-elliptical prediction regions for vector-valued outputs.

Core claim

CONTRA uses the latent space of a normalizing flow to define nonconformity scores as distances from the latent center. High-density regions in latent space then map back to sharp prediction regions in the output space, improving on traditional hyperrectangular or elliptical conformal regions. For scenarios where another predictive model is preferred, a simple normalizing flow trained on its residuals extends the same construction to any base model while preserving the coverage guarantee.

What carries the argument

The invertible mapping of a normalizing flow that converts Euclidean distance from the latent-space origin into a nonconformity score, so that a fixed-radius ball in latent space becomes a high-density conformal region after the inverse transform.
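
A minimal sketch of that mechanism in Python, assuming a hypothetical fitted `flow` object whose `forward` method maps output vectors to latent vectors; this illustrates split conformal prediction with a latent-distance score and is not the authors' code:

```python
import numpy as np

def latent_scores(flow, y):
    """Nonconformity score: Euclidean distance from the latent origin.
    `flow.forward` is a hypothetical interface mapping data points to z."""
    z = flow.forward(y)                      # shape (n, d)
    return np.linalg.norm(z, axis=1)

def conformal_radius(calib_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(calib_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(calib_scores)[k - 1]

def in_region(flow, y_new, radius):
    """Membership test: the region is the inverse image of the latent ball,
    so a point is inside iff its latent norm is at most the radius."""
    return latent_scores(flow, y_new) <= radius
```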

If this is right

  • Multi-dimensional conformal sets can take shapes that follow the actual data density instead of being forced into boxes or ellipses.
  • The exact finite-sample coverage guarantee of conformal prediction carries over unchanged through the flow transform.
  • Any existing point predictor can be equipped with a conformal region simply by fitting a flow to its residuals (see the sketch after this list).
  • The same latent-distance construction supplies both prediction regions and a form of conditional density estimate.
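
The residual-flow idea referenced above only changes what the flow is fitted to. A hedged sketch, again with hypothetical `base_model.predict` and `flow.forward` interfaces rather than the paper's implementation:

```python
import numpy as np

def residual_scores(base_model, flow, x, y):
    """ResCONTRA-style scores (sketch): the flow is trained on residuals of an
    arbitrary base predictor, and scoring is latent distance as before."""
    residuals = y - base_model.predict(x)    # shape (n, d)
    return np.linalg.norm(flow.forward(residuals), axis=1)

# The region for a new input x is the point prediction shifted by the inverse
# image of the latent ball:
#   C(x) = { base_model.predict(x) + flow.inverse(z) : ||z|| <= radius }
```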

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the flow approximates the true conditional distribution closely, the resulting regions should approach the smallest possible volume among all sets that meet the coverage target.
  • The approach could be applied to other invertible maps, such as certain diffeomorphisms or autoregressive transforms, without changing the core argument.
  • In high-dimensional regression tasks the method may reduce the volume of uncertainty sets enough to improve downstream decision-making that depends on those sets.

Load-bearing premise

The normalizing flow must be trained well enough that the latent-space distance remains a valid nonconformity measure whose coverage guarantee survives the invertible mapping back to data space.

What would settle it

Train the flow on a held-out calibration set, form the conformal regions at target coverage 0.9, and measure the empirical fraction of test points that fall inside; if this fraction falls below 0.9 by more than sampling error, the coverage claim fails.
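
A minimal version of that check, assuming the same hypothetical `flow.forward` interface as above and a test set disjoint from the calibration data:

```python
import numpy as np

def empirical_coverage(flow, y_calib, y_test, alpha=0.1):
    """Calibrate the latent radius on held-out data, then report the fraction
    of test points falling inside the region (hypothetical `flow.forward`)."""
    s_cal = np.linalg.norm(flow.forward(y_calib), axis=1)
    k = int(np.ceil((len(s_cal) + 1) * (1 - alpha)))
    radius = np.inf if k > len(s_cal) else np.sort(s_cal)[k - 1]
    covered = np.linalg.norm(flow.forward(y_test), axis=1) <= radius
    return covered.mean()

# Compare the returned coverage against 0.9 minus roughly two binomial
# standard errors, sqrt(0.9 * 0.1 / n_test), before calling the claim failed.
```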

Figures

Figures reproduced from arXiv: 2605.08561 by Aixin Tan, Jian Huang, Zhenhan Fang.

Figure 1. NYC Taxi data. Prediction regions of drop-off location given a pickup coordinate (blue pin). (a) shows a 90% high-density region estimated via kernel density estimates (KDE) based on drop-off locations from the 200 nearest pickups to the blue pin. While there is no guarantee of coverage, the shape of this region offers an informal visual reference to help users assess conformal regions. (b)-(f) show various…

Figure 2. Prediction regions of a two-dimensional outcome given a specific x value from the test set. Colored lines show the boundary of various prediction regions. In the case of PCP, K = 3000 disks are shown. Orange points show a random sample of size 2000 from the true conditional distribution of y given x.

Figure 3. Prediction regions of a two-dimensional outcome given a specific x value from the test set. Colored lines show the boundary of various prediction regions. In the case of PCP, K = 3000 disks are shown. Orange points show a random sample of size 2000 from the true conditional distribution of y given x.

Figure 4. CONTRA prediction regions and latent z for the calibration set under three different NF models, for drop-off location given a specific pickup coordinate (blue pin) for the NYC taxi data. Data size equals 4800, with a 75%-25% training-calibration split. (a) and (d): an underfitting NF with 2 coupling layers and 16 hidden units per layer, trained for 50 epochs in 4 seconds. (b) and (e): an overfitting NF with…

Figure 5. The impact of data size on prediction regions of drop-off location given a specific pickup coordinate (blue pin) for the NYC taxi data. The top row shows results of our proposed CONTRA method; the bottom row shows results of ST-DQR based on a diffusion model. Data sizes used for each column are 300, 1200, and 4800, respectively, with a 75%-25% training-calibration split.

Figure 6. Comparing CONTRA to ResCONTRA in terms of the latent variable, z, of the calibration set on two different datasets. Plots of z that closely resemble the standard bivariate Gaussian distribution result in smaller calibrated radii, r.9, and better conformal prediction regions. Five conformal prediction methods are implemented: (a) CONTRA with a 10-layer NF; (b) and (c) ResCONTRA with SVR followed by 10- and…

Figure 7. CONTRA conformal regions of the output variable y given some fixed x value for five setups with coverage levels 50%, 70% and 90%, respectively. The orange points are random samples of size 2000 from each of the true conditional distributions of y given x. The CONTRA regions at the various levels properly capture the high-density regions in each case.
Original abstract

Density estimation and reliable prediction regions for outputs are crucial in supervised and unsupervised learning. While conformal prediction effectively generates coverage-guaranteed regions, it struggles with multi-dimensional outputs due to reliance on one-dimensional nonconformity scores. To address this, we introduce CONTRA: CONformal prediction region via normalizing flow TRAnsformation. CONTRA utilizes the latent spaces of normalizing flows to define nonconformity scores based on distances from the center. This allows for the mapping of high-density regions in latent space to sharp prediction regions in the output space, surpassing traditional hyperrectangular or elliptical conformal regions. Further, for scenarios where other predictive models are favored over flow-based models, we extend CONTRA to enhance any such model with a reliable prediction region by training a simple normalizing flow on the residuals. We demonstrate that both CONTRA and its extension maintain guaranteed coverage probability and outperform existing methods in generating accurate prediction regions across various datasets. We conclude that CONTRA is an effective tool for (conditional) density estimation, addressing the under-explored challenge of delivering multi-dimensional prediction regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes CONTRA, a conformal prediction method that trains a normalizing flow and defines nonconformity scores as Euclidean distances from the origin in the resulting latent space. The (1-α) quantile of these scores on a calibration set is mapped back through the inverse flow to produce a prediction region in the original output space. An extension trains a flow on residuals of an arbitrary base predictor to obtain similar regions. The authors claim that both variants retain the standard marginal coverage guarantee of split conformal prediction while yielding sharper, density-adapted regions than hyperrectangular or elliptical baselines, and they report empirical improvements on several datasets.
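
In symbols, the construction the summary describes is as follows (our paraphrase and notation, which may differ from the paper's):

```latex
% Latent-distance scores, calibration quantile, and the resulting region.
% f_\theta is the trained flow; notation is ours, not necessarily the paper's.
\[
  s_i = \lVert f_\theta(y_i) \rVert_2, \quad i = 1, \dots, n, \qquad
  \hat{q} = s_{(\lceil (n+1)(1-\alpha) \rceil)},
\]
\[
  \widehat{C} = \{\, y : \lVert f_\theta(y) \rVert_2 \le \hat{q} \,\}
              = f_\theta^{-1}\bigl(B(0, \hat{q})\bigr).
\]
```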

Significance. If the coverage argument holds, the approach supplies a practical route to flexible, non-parametric multi-dimensional prediction regions that inherit finite-sample validity from conformal prediction while adapting shape to the learned density. This is a useful addition to the conformal toolkit for settings where rigid region shapes are overly conservative. The residual-flow extension further increases applicability to existing point predictors.

minor comments (3)
  1. §2 (Method): the statement that the flow is 'fixed before scoring' should be made explicit, including whether its parameters are estimated exclusively on the training split and never updated with calibration data, to make the exchangeability argument immediate to the reader.
  2. Experiments section: the tables reporting region volumes or coverage should include the number of random seeds or cross-validation folds and any standard errors, given that flow training introduces additional stochasticity.
  3. Notation: define the symbols for the latent variable z, the flow parameters θ, and the nonconformity score s clearly at first use; the current presentation mixes data-space and latent-space quantities without a dedicated notation table.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The report correctly captures the core idea of CONTRA and its residual-flow extension, as well as the claimed marginal coverage guarantee inherited from split conformal prediction. We provide point-by-point responses to the minor comments below.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central derivation applies standard split conformal prediction to nonconformity scores defined as Euclidean distances from the origin in the latent space of a pre-trained normalizing flow (or a flow on residuals). The coverage guarantee follows directly from the exchangeability of calibration and test scores under the usual rank-based argument, which holds regardless of how well the flow approximates the data density; the flow only determines region shape and volume after the threshold is obtained. No equation reduces a claimed prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and the invertible mapping preserves the marginal coverage property without introducing data-dependent circularity between flow training and calibration scores.
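
For reference, the bound that this rank-based exchangeability argument yields is the standard split-conformal result, not something specific to this paper:

```latex
% Marginal coverage of the split-conformal region under exchangeability of
% the n calibration scores and the test score; the upper bound additionally
% requires the scores to be almost surely distinct.
\[
  1 - \alpha \;\le\; \mathbb{P}\bigl(Y_{n+1} \in \widehat{C}\bigr)
  \;\le\; 1 - \alpha + \frac{1}{n+1}.
\]
```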

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that a trained normalizing flow produces a latent representation in which Euclidean distance from the origin is a valid nonconformity score that preserves conformal validity after inversion. No explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5480 in / 1187 out tokens · 17241 ms · 2026-05-12T00:57:40.467181+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors
