arxiv: 2605.08574 · v1 · submitted 2026-05-09 · 💻 cs.CV · cs.LG


Post-hoc Selective Classification for Reliable Synthetic Image Detection

Kaixiang Zheng, Jacob H. Seidman


Pith reviewed 2026-05-12 01:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords synthetic image detection · selective classification · covariate shift · post-hoc confidence · deepfake detection · risk-coverage curve · layer-wise aggregation


The pith

ReSIDe aggregates layer-wise confidence scores to let synthetic image detectors abstain reliably under common shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Synthetic image detectors perform well on training data but degrade sharply when inputs undergo typical changes such as compression, blur, or noise. The paper introduces ReSIDe, a post-hoc selective classification framework that lets a detector refuse to decide on low-confidence cases instead of risking errors. It first extends standard logit-based scoring to every internal layer by matching features to class centroids, then uses preference optimization to combine the per-layer scores while minimizing an upper bound on the area under the risk-coverage curve. Experiments across multiple detectors and shift types show consistent gains, including AURC reductions reaching 69.55 percent. The method needs no retraining of the original network.

Core claim

ReSIDe generalizes the notion of logits to an SID's intermediate layers from a centroid matching perspective, extending the use of logit-based CSFs to any layer of an SID. It then introduces a preference optimization algorithm that aggregates confidence scores extracted from different layers to a final confidence estimate by minimizing an upper bound of the area under the risk-coverage curve.
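One way to read the centroid-matching step is sketched below: class centroids are fitted on features from a single intermediate layer, and the "generalized logits" of a sample are its negative squared distances to those centroids, so that a logit-based CSF such as maximum softmax probability (MSP) can be applied at that layer. The distance choice, the function names, and the toy Gaussian features are all assumptions made for illustration; the paper's exact construction may differ.

```python
import numpy as np

def fit_centroids(features, labels, num_classes=2):
    """Mean feature vector per class; features is (n, d), labels is (n,)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def centroid_logits(features, centroids):
    """Assumed generalized logits: negative squared distance to each centroid."""
    diffs = features[:, None, :] - centroids[None, :, :]   # (n, C, d)
    return -np.sum(diffs ** 2, axis=-1)                     # (n, C)

def msp(logits):
    """Maximum softmax probability computed from (generalized) logits."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p.max(axis=1)

rng = np.random.default_rng(0)
# Hypothetical stand-in for pooled features of one intermediate layer.
real = rng.normal(loc=-1.0, size=(100, 8))
fake = rng.normal(loc=+1.0, size=(100, 8))
feats = np.concatenate([real, fake])
labels = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

centroids = fit_centroids(feats, labels)     # shape (2, 8)
logits = centroid_logits(feats, centroids)   # shape (200, 2)
confidence = msp(logits)                     # one confidence score per sample
```

Repeating this for each layer of interest would give one confidence score per layer and sample, which is the input an aggregation step needs.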

What carries the argument

Centroid-matching generalization of logits to intermediate layers, aggregated by preference optimization that minimizes an AURC upper bound.
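The aggregation step can be illustrated with a pairwise preference loss that asks the combined score of every correctly classified sample to exceed that of every misclassified one. The sketch below assumes a linear combination of per-layer scores and a logistic (Bradley-Terry-style) pairwise loss trained by plain gradient descent; the logistic loss divided by log 2 upper-bounds the 0-1 indicator of a misordered pair, but the paper's specific AURC bound and optimizer are not reproduced here. All names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_preference_loss(w, scores, correct):
    """Mean logistic loss over (correct, incorrect) pairs, plus its gradient.

    scores: (n, L) per-layer confidence scores; correct: (n,) boolean mask.
    """
    pos = scores[correct]                          # (P, L) correctly classified
    neg = scores[~correct]                         # (N, L) misclassified
    diff = pos[:, None, :] - neg[None, :, :]       # (P, N, L)
    margin = diff @ w                              # (P, N) score gaps
    loss = np.mean(np.logaddexp(0.0, -margin))     # log(1 + exp(-margin))
    sig = np.exp(-np.logaddexp(0.0, margin))       # sigmoid(-margin), stable
    grad = -(sig[:, :, None] * diff).mean(axis=(0, 1))
    return loss, grad

def fit_weights(scores, correct, steps=200, lr=0.5):
    """Gradient descent from uniform weights (hyperparameters are assumptions)."""
    w = np.full(scores.shape[1], 1.0 / scores.shape[1])
    for _ in range(steps):
        _, grad = pairwise_preference_loss(w, scores, correct)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n = 200
correct = rng.random(n) < 0.8                      # toy correctness labels
scores = np.column_stack([
    correct + rng.normal(0.0, 0.5, n),             # an informative layer
    rng.normal(size=n),                            # an uninformative layer
    rng.normal(size=n),                            # another uninformative layer
])

w0 = np.full(3, 1.0 / 3.0)
init_loss, _ = pairwise_preference_loss(w0, scores, correct)
w = fit_weights(scores, correct)
final_loss, _ = pairwise_preference_loss(w, scores, correct)
```

On this toy data the loss after fitting is lower than at the uniform initialization, and the weight on the informative layer stays positive.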

If this is right

Existing logit-based confidence functions can be boosted for selective classification without changing the detector weights.
Post-hoc deployment is possible on any already-trained synthetic image detector.
Risk from erroneous decisions drops measurably under the tested shifts.
Abstention decisions become practical for real-world use where shifts are common.
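As a concrete picture of abstention, the sketch below accepts a prediction only when its confidence clears a threshold and reports the resulting coverage and selective risk. The threshold value, the `-1` abstain marker, and the toy probabilities are assumptions chosen for illustration.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker for "no decision"

def selective_predict(probs, threshold):
    """Predict the argmax class, or ABSTAIN when max probability < threshold."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return np.where(conf >= threshold, preds, ABSTAIN), conf

def coverage_and_risk(decisions, labels):
    """Fraction of accepted samples and the error rate among them."""
    accepted = decisions != ABSTAIN
    coverage = float(accepted.mean())
    if not accepted.any():
        return coverage, 0.0
    risk = float((decisions[accepted] != labels[accepted]).mean())
    return coverage, risk

# Toy detector outputs: columns are P(real), P(synthetic).
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.40, 0.60]])
labels = np.array([0, 1, 1, 0])

decisions, conf = selective_predict(probs, threshold=0.8)
coverage, risk = coverage_and_risk(decisions, labels)
full_risk = float((probs.argmax(axis=1) != labels).mean())
```

In this toy case the two low-confidence predictions are both wrong, so abstaining on them halves coverage and removes all errors among accepted samples.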

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same layer-aggregation idea could be tested on other detection or classification tasks that suffer from distribution shift.
Selective systems of this form might be combined with human review pipelines to limit the spread of convincing synthetic media.
If layer preferences prove stable across many shifts, the method could reduce the need for frequent model retraining in deployed detectors.

Load-bearing premise

The preference optimization that combines layer-wise scores will continue to produce good rankings on covariate shifts never seen during optimization.

What would settle it

Applying the trained aggregator to a new, previously unseen covariate shift and measuring no AURC reduction or an increase relative to single-layer baselines.
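The test described above comes down to comparing AURC values. A minimal sketch follows, using one common empirical estimator (the mean selective risk as samples are accepted in order of decreasing confidence) and a relative-reduction helper. The function names and the four-sample toy data are illustrative assumptions, and the paper may use a different estimator.

```python
import numpy as np

def aurc(confidence, correct):
    """Empirical AURC: mean selective risk over coverage levels 1/n, ..., 1."""
    order = np.argsort(-confidence, kind="stable")      # most confident first
    errors = 1.0 - correct[order].astype(float)
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

def relative_aurc_reduction(aurc_baseline, aurc_method):
    """Percentage by which the method lowers AURC relative to the baseline."""
    return 100.0 * (aurc_baseline - aurc_method) / aurc_baseline

correct = np.array([True, True, True, False])

# A score that ranks the single error last (ideal ordering).
good_conf = np.array([0.9, 0.8, 0.7, 0.1])
# A score that ranks the single error first (pathological ordering).
bad_conf = np.array([0.1, 0.8, 0.7, 0.9])

aurc_good = aurc(good_conf, correct)    # 0.0625
aurc_bad = aurc(bad_conf, correct)      # 25/48, about 0.5208
reduction = relative_aurc_reduction(aurc_bad, aurc_good)   # 88.0
```

A negative value from `relative_aurc_reduction` on an unseen shift would be the failure signal the paragraph above describes.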

Figures

Figures reproduced from arXiv: 2605.08574 by Jacob H. Seidman, Kaixiang Zheng.

**Figure 2.** (a)-(f): RC curves for MSP evaluated using ResNet50 on datasets with different covariate

**Figure 3.** Comparison of confidence score distributions between the baseline and ReSIDe.

**Figure 4.** Convergence curves of the ReSIDe loss and AURC.

Original abstract

As synthetic images become increasingly realistic, reliable synthetic image detection techniques are of pressing need to prevent their misuse. Despite satisfactory in-distribution performance, deep neural network-based synthetic image detectors (SIDs) lack reliability in deployment and often fail in the presence of common covariate shifts, resulting in poor detection accuracy. To avoid the risk caused by potential errors, we adopt a selective classification (SC) strategy by allowing SIDs to abstain from making low confidence predictions. For practicality, we focus on post-hoc methods which perform confidence estimation on a given SID without retraining. However, we show that conventional logit-based confidence score functions (CSFs) exhibit pathological behavior under covariate shifts, leading to SC performance close to or even worse than random guessing. To address this, we propose a simple yet effective SC framework for Reliable Synthetic Image Detection (ReSIDe). First, we generalize the notion of logits to an SID's intermediate layers from a centroid matching perspective, extending the use of logit-based CSFs to any layer of an SID. Then, we introduce a preference optimization algorithm that aggregates confidence scores extracted from different layers to a final confidence estimate by minimizing an upper bound of the area under the risk-coverage curve (AURC). Extensive experimental results show that ReSIDe significantly boosts the SC performance of various logit-based CSFs under common covariate shifts, achieving up to 69.55% AURC reduction.
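For reference, the quantities named in the abstract can be written in standard selective-classification notation. The symbols are assumptions for illustration and need not match the paper: f is the detector, g the confidence score function, τ the acceptance threshold, and risk is measured with the 0-1 loss.

```latex
\begin{align}
\phi(\tau) &= \Pr\big[\, g(x) \ge \tau \,\big] && \text{(coverage)} \\
R(\tau) &= \Pr\big[\, f(x) \ne y \;\big|\; g(x) \ge \tau \,\big] && \text{(selective risk)} \\
\mathrm{AURC} &= \int_0^1 R(\tau_c)\, \mathrm{d}c, \qquad \phi(\tau_c) = c
\end{align}
```

A lower AURC means that errors are concentrated among the samples the confidence score ranks lowest, which is what makes abstention effective.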

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ReSIDe adapts selective classification to synthetic image detectors post-hoc via layer-wise centroid logits and AURC-bound preference optimization, delivering reported gains but with open questions on shift generalization.


The main point: this work makes selective classification work better for synthetic image detectors under covariate shift, without retraining the detector. The authors generalize logits to intermediate layers using centroid matching, then run a preference optimization that combines the per-layer scores by minimizing an upper bound on AURC. The experiments report clear improvements, up to 69.55% AURC reduction across several base detectors and common shifts, which matters because plain logit-based scores can drop to near-random selective performance once the data shifts.

Referee Report

2 major / 1 minor

Summary. The paper claims that standard logit-based confidence scoring functions for synthetic image detectors exhibit pathological behavior under covariate shifts, and proposes ReSIDe: a post-hoc selective classification framework that (i) generalizes logits to intermediate layers via centroid matching and (ii) aggregates layer-wise scores by a preference optimization procedure that minimizes an upper bound on the area under the risk-coverage curve (AURC), reporting up to 69.55% AURC reduction on various detectors and shifts.

Significance. If the aggregation step proves robust to truly unseen shifts, the work would offer a practical, training-free way to improve reliability of synthetic-image detectors in deployment; the post-hoc framing and use of a theoretically motivated AURC bound are positive features that distinguish it from retraining-based alternatives.

major comments (2)

[Abstract] The preference optimization step that learns aggregation weights necessarily uses data; the manuscript must clarify whether this data is strictly disjoint from the covariate-shift evaluation sets and whether the procedure requires per-shift retuning, because any overlap would undermine the reported generalization of the 69.55% AURC gains.
[Abstract] The central claim that ReSIDe 'significantly boosts the SC performance of various logit-based CSFs under common covariate shifts' rests on the layer-aggregation step; without an ablation that isolates the contribution of the learned weights versus simple averaging or fixed selection, it is impossible to determine whether the gains are driven by the optimization or by the centroid-matching generalization alone.

minor comments (1)

The abstract states large AURC reductions but provides no information on data splits, number of layers used, or statistical significance of the improvements; these details are required for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each of the major comments below, providing clarifications and committing to revisions that will strengthen the paper.


Referee: [Abstract] The preference optimization step that learns aggregation weights necessarily uses data; the manuscript must clarify whether this data is strictly disjoint from the covariate-shift evaluation sets and whether the procedure requires per-shift retuning, because any overlap would undermine the reported generalization of the 69.55% AURC gains.

Authors: We thank the referee for this important observation. The preference optimization is performed on a validation set consisting of in-distribution samples that is completely disjoint from the covariate shift test sets used for evaluation. Furthermore, the learned aggregation weights are fixed after optimization on this validation set and do not require retuning for each shift. We will revise the abstract and the experimental setup section to explicitly state these details to ensure the generalization claims are unambiguous.
revision: yes

Referee: [Abstract] The central claim that ReSIDe 'significantly boosts the SC performance of various logit-based CSFs under common covariate shifts' rests on the layer-aggregation step; without an ablation that isolates the contribution of the learned weights versus simple averaging or fixed selection, it is impossible to determine whether the gains are driven by the optimization or by the centroid-matching generalization alone.

Authors: We agree that isolating the contribution of the preference optimization is necessary to substantiate the central claim. In the revised version, we will add an ablation study that compares (i) the full ReSIDe with optimized aggregation, (ii) centroid-matching with simple averaging across layers, and (iii) centroid-matching using only the final layer or best single layer. This will demonstrate the added value of the optimization procedure beyond the layer generalization.
revision: yes
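The ablation promised in this response could be organized along the lines of the sketch below, which scores each aggregation variant by empirical AURC on the same set of per-layer confidence scores. The variant names, the `learned_weights` argument, and the toy numbers are hypothetical. Note that picking the best single layer on the evaluation data itself, as done here for brevity, is an oracle choice that a real ablation would make on held-out data.

```python
import numpy as np

def aurc(confidence, correct):
    """Empirical AURC: mean selective risk as samples are accepted by confidence."""
    order = np.argsort(-confidence, kind="stable")
    errors = 1.0 - correct[order].astype(float)
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

def ablation_table(layer_scores, correct, learned_weights=None):
    """AURC per aggregation variant; layer_scores is (n, L), last column = final layer."""
    per_layer = [aurc(layer_scores[:, k], correct) for k in range(layer_scores.shape[1])]
    best = int(np.argmin(per_layer))               # oracle selection, see lead-in
    variants = {
        "final_layer": layer_scores[:, -1],
        "best_single_layer": layer_scores[:, best],
        "simple_average": layer_scores.mean(axis=1),
    }
    if learned_weights is not None:
        variants["learned_aggregation"] = layer_scores @ learned_weights
    return {name: aurc(score, correct) for name, score in variants.items()}

# Toy scores: the earlier layer ranks the error last, the final layer ranks it first.
correct = np.array([True, True, False, True])
layer_scores = np.array([
    [0.9, 0.2],
    [0.8, 0.3],
    [0.1, 0.9],
    [0.7, 0.4],
])

results = ablation_table(layer_scores, correct, learned_weights=np.array([1.0, 0.0]))
```

If learned aggregation and simple averaging land on similar AURC values across shifts, the gains would be attributable mostly to the layer-wise generalization rather than to the optimization.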

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper proposes generalizing logits to intermediate layers via centroid matching and a preference optimization step that aggregates layer-wise scores by minimizing an upper bound on AURC. These are presented as new methodological steps with empirical validation on covariate shifts. No equations or self-citations are provided that reduce any claimed result to its inputs by construction, no fitted parameters are renamed as independent predictions, and no uniqueness theorems or ansatzes are smuggled via self-citation. The derivation chain remains self-contained as a post-hoc framework whose performance claims rest on reported experiments rather than tautological reductions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard assumptions that intermediate-layer features remain informative under covariate shift and that an upper bound on AURC can be minimized without overfitting to the evaluation distribution.

free parameters (1)

layer selection and aggregation weights
Choice of which intermediate layers to extract generalized logits from and how to weight them during preference optimization.

axioms (1)

domain assumption: Covariate shifts preserve enough structure in intermediate features for centroid matching to yield useful confidence scores.
Invoked when extending logit-based CSFs to arbitrary layers.


