Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing
Pith reviewed 2026-06-29 16:16 UTC · model grok-4.3
The pith
SCQ and P-TAMS deliver finite-sample FDR control for structured out-of-distribution testing by replacing joint exchangeability with pairwise exchangeability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SCQ and P-TAMS together form a unified framework under pairwise exchangeability that provides finite-sample error-rate control, improved power, and enhanced interpretability for structured OOD testing.
What carries the argument
Two components: the structure-adaptive conformal q-value (SCQ), a significance index that combines individual test evidence with structural patterns, and pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection across a toolbox of candidate models to the same setting.
If this is right
- The false discovery rate remains controlled at any pre-specified level for any finite sample size.
- Power increases when structural information is present compared with methods that discard it.
- P-TAMS selects among a library of candidate models while retaining the same finite-sample guarantees.
- The resulting q-values admit direct interpretation as adjusted significance indices that incorporate structure.
- Experiments on simulated and real data show the framework controlling the false discovery rate across diverse settings.
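For context on the quantity being controlled, here is a minimal sketch of the standard baseline the abstract contrasts against: split-conformal p-values followed by the Benjamini-Hochberg step-up rule. It is not SCQ or P-TAMS, whose definitions this page does not reproduce. The function names, the convention that a larger score means a more atypical point, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """Split-conformal p-values: p_j = (1 + #{i : cal_i >= test_j}) / (n + 1).

    Assumes (hypothetically) that larger scores indicate more atypical points.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # searchsorted(side="left") counts calibration scores strictly below each test score
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)

def benjamini_hochberg(p_values, alpha):
    """Boolean rejection mask from the BH step-up rule at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest sorted index passing its threshold
        reject[order[: k + 1]] = True
    return reject

# Toy usage: nine inlier calibration scores, four test scores.
cal = np.arange(1, 10)                    # 1, 2, ..., 9
test = np.array([10.0, 9.5, 0.5, 5.5])
p = conformal_p_values(cal, test)         # p-values 0.1, 0.1, 1.0, 0.5
flags = benjamini_hochberg(p, alpha=0.25) # first two test points are flagged
```

This baseline ignores any spatiotemporal or grouping information attached to the test points, which is the gap the paper says SCQ is designed to fill.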
Where Pith is reading between the lines
- The same pairwise-exchangeability device might extend conformal procedures to graph-structured or network data without new theoretical machinery.
- Interpretability gains could support regulatory review of OOD flags in safety-critical systems.
- The approach suggests a route to conformal testing under weaker dependence conditions than full exchangeability in other multiple-testing problems.
- Adaptive model selection inside the procedure may reduce the need for separate validation sets in streaming monitoring applications.
Load-bearing premise
The observations satisfy pairwise exchangeability rather than requiring full joint exchangeability.
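The page does not reproduce the paper's formal definition, so the contrast below is only one plausible way to write the two conditions. It assumes that each test unit j carries a score V_j, a paired reference score \tilde V_j, and side information S_j; this notation is illustrative and may differ from the paper's.

```latex
% Joint exchangeability (the standard conformal assumption), for every permutation \sigma:
(Z_1, \dots, Z_{n+m}) \overset{d}{=} (Z_{\sigma(1)}, \dots, Z_{\sigma(n+m)})

% Pairwise exchangeability (illustrative reading), for each inlier test unit j:
(V_j, \tilde V_j) \mid S_j \overset{d}{=} (\tilde V_j, V_j) \mid S_j
```

Under this reading, the second condition constrains only each pair given its side information, which is what would leave room for structure-dependent weighting across units.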
What would settle it
A dataset generated so that pairwise exchangeability holds yet the realized false discovery rate of SCQ exceeds the nominal level across repeated trials would refute the guarantee. A dataset where pairwise exchangeability is violated yet the method still controls the rate would not refute it; it would only show that the assumption is sufficient rather than necessary.
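A repeated-trials check of this kind can be sketched as a small Monte Carlo harness. The version below uses the standard conformal p-value plus Benjamini-Hochberg baseline as a stand-in procedure, together with a hypothetical Gaussian generator in which out-of-distribution points are mean-shifted. Testing SCQ itself would require substituting the paper's procedure and a pairwise-exchangeable data generator, neither of which is reproduced here. All names and default parameters are assumptions.

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    # p_j = (1 + #{i : cal_i >= test_j}) / (n + 1); larger score = more atypical
    cal = np.sort(cal_scores)
    n_ge = cal.size - np.searchsorted(cal, test_scores, side="left")
    return (1.0 + n_ge) / (cal.size + 1.0)

def bh_reject(p, alpha):
    # Benjamini-Hochberg step-up rule, returned as a boolean mask
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.nonzero(below)[0].max() + 1]] = True
    return reject

def estimate_fdr(alpha=0.1, n_cal=500, m=200, frac_ood=0.2,
                 shift=3.0, n_trials=200, seed=0):
    """Average false discovery proportion over independent simulated trials."""
    rng = np.random.default_rng(seed)
    fdp = np.empty(n_trials)
    for t in range(n_trials):
        is_ood = rng.random(m) < frac_ood            # hypothetical OOD labels
        cal = rng.normal(size=n_cal)                 # inlier calibration scores
        test = rng.normal(size=m) + shift * is_ood   # OOD scores are shifted up
        reject = bh_reject(conformal_p_values(cal, test), alpha)
        false_rejections = np.sum(reject & ~is_ood)
        fdp[t] = false_rejections / max(reject.sum(), 1)  # FDP is 0 with no rejections
    return fdp.mean()

realized = estimate_fdr()
print(f"estimated FDR: {realized:.3f} (nominal level 0.1)")
```

An estimate that sits well above the nominal level, by more than Monte Carlo error, is the kind of evidence the paragraph above describes.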
Original abstract
This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the structure-adaptive conformal q-value (SCQ) and pseudo-score-guided transductive automated model selection (P-TAMS) to enable structured out-of-distribution testing. It claims that, under pairwise exchangeability, the methods form a unified framework delivering finite-sample error-rate control (specifically FDR control), improved power, and enhanced interpretability relative to standard conformal procedures that require joint exchangeability, with supporting experiments on simulated and real data.
Significance. If the finite-sample FDR control is rigorously established, the contribution would be significant for conformal inference in high-stakes applications where auxiliary structure (e.g., spatiotemporal or grouping patterns) is available. The approach explicitly leverages pairwise exchangeability to integrate structural information without invalidating guarantees, which addresses a recognized limitation of joint-exchangeability-based methods.
major comments (2)
- [Abstract] The central claim of finite-sample error-rate control under pairwise exchangeability is asserted without any derivation, theorem statement, or proof sketch showing how SCQ and P-TAMS maintain FDR control; this is load-bearing for the main contribution and must be supplied explicitly.
- [Abstract] The manuscript invokes pairwise exchangeability (rather than joint exchangeability) as the key modeling choice enabling structure incorporation, but provides no proposition or argument establishing that this weaker condition is sufficient for the claimed finite-sample guarantees; verification of this step is required.
minor comments (1)
- The abstract would be clearer if it briefly indicated the concrete forms of structural information (e.g., spatiotemporal or grouping) that SCQ is designed to exploit.
Simulated Author's Rebuttal
We thank the referee for their constructive comments emphasizing the need for explicit theoretical support in the abstract. We address each point below and agree to revise the abstract accordingly.
- Referee: [Abstract] The central claim of finite-sample error-rate control under pairwise exchangeability is asserted without any derivation, theorem statement, or proof sketch showing how SCQ and P-TAMS maintain FDR control; this is load-bearing for the main contribution and must be supplied explicitly.
  Authors: We agree that the abstract should explicitly reference the supporting result. We will revise the abstract to state: 'Under pairwise exchangeability, SCQ and P-TAMS achieve finite-sample FDR control, as established in Theorem 3.1.' The full derivation and proof sketch appear in Section 3 and Appendix A of the manuscript.
  Revision: yes
- Referee: [Abstract] The manuscript invokes pairwise exchangeability (rather than joint exchangeability) as the key modeling choice enabling structure incorporation, but provides no proposition or argument establishing that this weaker condition is sufficient for the claimed finite-sample guarantees; verification of this step is required.
  Authors: We acknowledge the request for explicit verification. We will update the abstract to include: 'We establish that pairwise exchangeability suffices for the finite-sample guarantees (Proposition 2.1).' The argument showing sufficiency of this weaker condition is derived in Section 2.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
The paper proposes SCQ and P-TAMS as new methods under the explicit modeling assumption of pairwise exchangeability to achieve finite-sample FDR control for structured OOD testing. No equations, fitting procedures, or self-citations are visible in the provided text that reduce any claimed prediction or result to an input by construction. The central claims rest on the stated exchangeability condition and standard conformal inference extensions rather than self-referential definitions or fitted quantities renamed as predictions. The framework is presented as self-contained with external empirical validation on simulated and real data.