Architecture-Adaptive Uncertainty Fusion for Deepfake Detection

Mohammad Ghasemigol; Ritesh Sharma; Yuichi Motai

arxiv: 2606.06666 · v1 · pith:OP3X7L7Mnew · submitted 2026-06-04 · 💻 cs.CV


Pith reviewed 2026-06-28 01:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords: deepfake detection, uncertainty quantification, distribution shift, correlation optimization, fusion methods, computer vision, forensic deployment, prediction reliability


The pith

Fusing five uncertainty sources by maximizing their correlation with prediction errors yields architecture-specific weights that retain more signal under distribution shift than random forest or nonlinear alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Correlation-Optimized Fusion (COF) as a way to combine epistemic, aleatoric, calibration, conformal, and distributional uncertainty estimates without altering the underlying deepfake detector. Weights are found by solving a constrained optimization on the probability simplex that maximizes Pearson correlation between the fused score and observed errors. The weight search takes about 42 seconds, against 20 to 45 hours for a 5-model Deep Ensemble. In matched train-test conditions nonlinear methods reach roughly 5 to 6% higher correlation, yet under distribution shift the linear correlation-tuned weights outperform random forest on nine of eleven tested architectures and degrade less severely (a 74% drop versus 85%). All approaches, including COF, lose nearly all correlation when evaluated on entirely new datasets.

Core claim

COF solves a constrained optimization on the probability simplex to find linear weights for five complementary uncertainty sources that maximize Pearson correlation with prediction errors. On FaceForensics++ this produces slightly lower in-domain correlation than nonlinear fusion, but on CelebDF the same weights outperform random forest in nine of eleven architectures and retain substantially more signal after distribution shift. Cross-dataset evaluation on CelebDF and DFDC shows that every method suffers roughly 90 percent degradation, with seven architectures exhibiting uncertainty inversion.

What carries the argument

Correlation-Optimized Fusion (COF): linear combination of five uncertainty sources whose weights are chosen by simplex-constrained maximization of Pearson correlation with observed prediction errors.
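
A minimal sketch of the simplex-constrained weight search described above, in plain Python. The source does not say which solver the authors use, so the greedy mass-transfer search here is a stand-in, and the function names (`pearson`, `fuse`, `fit_simplex_weights`), the step sizes, and the synthetic scores are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fuse(weights, sources):
    """Linear fusion: one weighted sum per sample over the per-source score lists."""
    n = len(sources[0])
    return [sum(w * s[i] for w, s in zip(weights, sources)) for i in range(n)]


def fit_simplex_weights(sources, errors, step=0.25, min_step=1e-3):
    """Hypothetical solver: start from equal weights and move `step` of mass
    from source j to source i whenever that raises the correlation with errors.
    Every candidate stays non-negative and sums to one."""
    k = len(sources)
    w = [1.0 / k] * k
    best = pearson(fuse(w, sources), errors)
    while step >= min_step:
        improved = False
        for i in range(k):
            for j in range(k):
                if i == j or w[j] <= 0.0:
                    continue
                delta = min(step, w[j])  # never push a weight below zero
                cand = list(w)
                cand[i] += delta
                cand[j] -= delta
                r = pearson(fuse(cand, sources), errors)
                if r > best + 1e-12:
                    w, best, improved = cand, r, True
        if not improved:
            step /= 2  # refine the search once no transfer helps
    return w, best


# Synthetic stand-ins for five uncertainty sources: two informative, three noise.
random.seed(0)
n = 300
errors = [1.0 if random.random() < 0.2 else 0.0 for _ in range(n)]
sources = [[e + random.gauss(0, s) for e in errors] for s in (0.5, 1.0)]
sources += [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]

weights, r_fused = fit_simplex_weights(sources, errors)
r_equal = pearson(fuse([0.2] * 5, sources), errors)
print([round(w, 3) for w in weights], round(r_fused, 3), round(r_equal, 3))
```

Because the search starts at equal weights and only accepts improvements, the fitted correlation can never fall below the equal-weight baseline on the data used for fitting. That says nothing about held-out or shifted data.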

If this is right

COF applies to any existing detector with no model changes and only 42 seconds of weight search.
Under distribution shift the correlation-optimized linear fusion retains more predictive power than random forest in most architectures.
Nonlinear fusion methods lose more performance than linear COF when the data distribution changes.
Cross-dataset uncertainty estimates collapse to near-zero correlation for all tested methods.
The results identify domain-adaptive uncertainty quantification as the remaining barrier to forensic use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The consistent outperformance of correlation-tuned linear weights under shift implies that architecture-specific linear fusion may be preferable to black-box alternatives when deployment stays within controlled distribution ranges.
The near-total loss of correlation on new datasets indicates that current uncertainty sources largely capture dataset-specific artifacts rather than intrinsic model reliability.
Monitoring whether the fused uncertainty remains positively correlated with errors on incoming batches could serve as a practical out-of-distribution detector.
The observed trade-off between matched-condition accuracy and shift robustness may appear in uncertainty quantification tasks outside deepfake detection.
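
The batch-monitoring idea in the list above can be sketched as follows. This is an illustrative assumption, not something the paper evaluates: it presumes a small labeled (audited) subset of each incoming batch so that errors are known, and the names `inversion_flags` and `floor` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def inversion_flags(batches, floor=0.0):
    """For each (fused_scores, errors) batch, return (r, flagged).
    A batch is flagged when the correlation is at or below `floor`,
    i.e. the fused uncertainty no longer tracks the errors."""
    report = []
    for fused, errors in batches:
        r = pearson(fused, errors)
        report.append((r, r <= floor))
    return report


# Toy batches: errors are 1 where the detector was wrong on an audited sample.
batches = [
    ([0.1, 0.2, 0.8, 0.9, 0.3], [0, 0, 1, 1, 0]),  # uncertainty high on errors
    ([0.9, 0.8, 0.1, 0.2, 0.7], [0, 0, 1, 1, 0]),  # inverted: confident on errors
]
report = inversion_flags(batches)
for r, flagged in report:
    print(round(r, 3), "FLAG" if flagged else "ok")
```

A negative correlation on a batch corresponds to the uncertainty inversion the paper reports for seven architectures; a production monitor would also need a minimum sample size before trusting a single batch.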

Load-bearing premise

That weights chosen to maximize correlation on one evaluation distribution will still produce informative fused uncertainties when the input images come from a shifted distribution.

What would settle it

Compute COF weights on FaceForensics++ and test whether they still produce higher error correlation than random forest or equal-weight fusion on a fresh shifted dataset such as CelebDF; reversal of the reported advantage would falsify the transfer claim.

Figures

Figures reproduced from arXiv: 2606.06666 by Mohammad Ghasemigol, Ritesh Sharma, Yuichi Motai.

**Figure 2.** COF outperforms MC Dropout in all eleven architectures. (a) Absolute correlation: COF (blue) consistently exceeds MC Dropout (gray) across all …

**Figure 3.** COF-5 learned fusion weights across eleven architectures. Three …

**Figure 4.** Cross-domain robustness reversal: COF vs. Random Forest under matched protocol. (a) In-domain (FF++): RF achieves marginally higher correlation than COF (mean ∆ρ = +0.025; 0.463 vs. 0.438, a 5.7% gap). (b) Cross-domain (CelebDF): COF outperforms RF in 9/11 architectures, with up to 7.3× higher correlation (MaxViT-B: ρ = 0.249 vs. 0.034). The simplex constraint that limits COF’s in-domain expressiveness act…

**Figure 5.** Uncertainty inversion analysis: in-domain vs. cross-domain correlation for each architecture. Points below the horizontal dashed line ( …

Original abstract

Deepfake detection systems achieve near-perfect accuracy on benchmarks, yet forensic deployment demands reliable prediction uncertainty. Existing uncertainty quantification (UQ) methods rely on single sources and ignore that optimal uncertainty composition varies across architectures. We propose Correlation-Optimized Fusion (COF), an architecture-adaptive framework that fuses five complementary uncertainty sources -- epistemic, aleatoric, calibration, conformal, and distributional -- by maximizing Pearson correlation between fused uncertainty scores and prediction errors via constrained optimization on the probability simplex. COF requires no model modifications and only 42 s of weight optimization, compared to 20--45 h for a 5-model Deep Ensemble. Evaluation across eleven architectures on FaceForensics++ reveals a fundamental trade-off: under matched train/evaluation protocol, non-linear methods achieve approximately 5--6% higher in-domain correlation than COF (mean r = 0.438), but this reverses under distribution shift. On CelebDF, COF outperforms Random Forest in 9/11 architectures with up to 7.3x higher correlation (MaxViT-B: r = 0.249 vs. 0.034); RF degrades 85% cross-domain to r = 0.071, whereas COF retains substantially more signal (74% drop to r = 0.116). Cross-dataset evaluation on CelebDF and DFDC reveals catastrophic generalization failure across all methods: in-domain correlations of 0.41--0.47 collapse to near-zero externally (mean degradation 90.7%), with seven of eleven architectures exhibiting uncertainty inversion. These results establish COF as a practical, interpretable framework for controlled-distribution deployment and identify domain-adaptive UQ as the central open challenge for forensic deployment.
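
As a quick consistency check, the headline percentages can be recomputed from the rounded means quoted on this page. The Random Forest in-domain mean of 0.463 comes from the Figure 4 caption rather than the abstract, and because all inputs are rounded the recomputed figures only approximately match the reported ones; `relative_drop` is a helper name introduced here.

```python
def relative_drop(in_domain, cross_domain):
    """Fraction of in-domain correlation lost after the shift."""
    return (in_domain - cross_domain) / in_domain


cof_drop = relative_drop(0.438, 0.116)      # COF: FF++ mean r -> CelebDF mean r
rf_drop = relative_drop(0.463, 0.071)       # RF: FF++ mean r -> CelebDF mean r
maxvit_ratio = 0.249 / 0.034                # MaxViT-B, COF vs. RF on CelebDF
in_domain_gap = (0.463 - 0.438) / 0.438     # RF advantage relative to COF on FF++

print(f"COF drop: {cof_drop:.1%}")          # reported as 74%
print(f"RF drop: {rf_drop:.1%}")            # reported as 85%
print(f"MaxViT-B ratio: {maxvit_ratio:.1f}x")
print(f"In-domain gap: {in_domain_gap:.1%}")
```

The recomputed values land within rounding distance of the reported 74%, 85%, 7.3x, and 5.7%.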

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

COF fuses five uncertainty sources via correlation-maximizing weights, but the reported shift gains may come from fitting those weights to the same test errors used for evaluation.


The main thing to know is that this paper gives a fast, architecture-adaptive way to combine epistemic, aleatoric, calibration, conformal and distributional uncertainty into one score for deepfake detectors, and it claims this fused score stays more informative than random forest when the test distribution shifts. It also documents that every method, including COF, loses almost all signal on truly new datasets.

What the work actually does is solve a constrained optimization over the probability simplex to pick fusion weights that maximize Pearson correlation with observed prediction errors. The method needs no model changes and finishes in under a minute. Across eleven architectures the results show a clean pattern: nonlinear baselines win on matched data, but COF degrades less under the CelebDF shift (74% drop versus 85% for random forest). The cross-dataset numbers on CelebDF and DFDC are the most useful part; they show mean correlation collapsing from 0.41 to 0.47 down to near zero, with uncertainty inversion on seven architectures. That finding is honest and points to a real deployment barrier.

The soft spot is the optimization split. The abstract ties the weight search directly to correlation with the prediction errors on the data being evaluated. If that search uses the same CelebDF predictions and errors that later appear in the reported r values, the outperformance is expected by construction and does not demonstrate that the weights would be useful on fresh inputs without their labels. The stress-test note flags exactly this, and nothing in the provided text rules it out. Lack of error bars or significance tests on the correlations is a smaller but related gap.

This is for computer-vision researchers who already work on deepfake detection and want a lightweight UQ option for controlled settings. It deserves peer review so the optimization procedure and data partitioning can be checked in detail. The central observation about cross-dataset failure is worth having in the record.

Referee Report

2 major / 2 minor

Summary. The paper proposes Correlation-Optimized Fusion (COF), an architecture-adaptive method that fuses five uncertainty sources (epistemic, aleatoric, calibration, conformal, distributional) for deepfake detectors by solving a constrained optimization on the probability simplex to maximize Pearson correlation between the fused score and observed prediction errors. It reports that non-linear baselines outperform COF in-domain on FaceForensics++ (Random Forest mean r = 0.463 vs. 0.438 for COF), but COF outperforms Random Forest on CelebDF in 9/11 architectures (e.g., MaxViT-B: r = 0.249 vs. 0.034) and degrades less under shift (74% drop to r = 0.116 vs. an 85% drop to r = 0.071 for RF); all methods suffer near-total collapse (mean 90.7% degradation) in cross-dataset evaluation on CelebDF and DFDC.

Significance. If the optimization split is out-of-sample, the work usefully demonstrates that linear fusion weights can be architecture-specific and more robust than non-linear alternatives under controlled distribution shift, while quantifying the severe generalization failure of current UQ methods; the 42-second optimization cost versus ensemble training time is a practical advantage. The identification of domain-adaptive UQ as the central open problem is well-supported by the reported cross-dataset collapse.

major comments (2)

[Abstract and method description] The central empirical claims on CelebDF (outperformance vs. RF and reduced degradation under shift) rest on the COF weight optimization procedure. The abstract and description state that weights are obtained by maximizing Pearson correlation with prediction errors, but provide no indication of whether this optimization uses a held-out validation partition, source-domain (FF++) data only, or the CelebDF evaluation predictions themselves. If performed on the same CelebDF errors used to report the r values, the superiority is expected by construction and does not establish that the weights remain informative for new inputs.
[Evaluation on CelebDF and cross-dataset results] Table or figure reporting the CelebDF and cross-domain results (e.g., the 9/11 outperformance and degradation percentages) must include the exact data partition used for weight optimization, the number of optimization runs, and any statistical tests or error bars on the reported Pearson r values; without this, the cross-domain retention claim (74% vs. 85% drop) cannot be evaluated for robustness.
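
One way to supply the error bars requested above is a percentile bootstrap over samples. The sketch below is an assumption about how that could be done, not the authors' protocol; `bootstrap_ci`, the resample count, and the synthetic scores are all illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic fused uncertainty that tracks binary prediction errors with noise.
rng = random.Random(1)
errors = [1.0 if rng.random() < 0.2 else 0.0 for _ in range(200)]
fused = [e + rng.gauss(0, 0.3) for e in errors]

r = pearson(fused, errors)
lo, hi = bootstrap_ci(fused, errors)
print(f"r = {r:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

For frame-level scores drawn from the same video, resampling whole videos rather than individual frames would be the safer choice, since frames within a video are not independent.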

minor comments (2)

[Method] Clarify whether the five uncertainty sources are computed from a single forward pass or require multiple inferences; the 42 s optimization time suggests the former, but this should be stated explicitly.
[Cross-dataset evaluation] The claim of 'catastrophic generalization failure' across all methods would be strengthened by reporting the raw in-domain and out-of-domain r values for each of the eleven architectures rather than only means and selected examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the need for clarity on the optimization procedure and for suggesting improvements to our evaluation reporting. We address each point below and will incorporate the necessary revisions.


Referee: [Abstract and method description] The central empirical claims on CelebDF (outperformance vs. RF and reduced degradation under shift) rest on the COF weight optimization procedure. The abstract and description state that weights are obtained by maximizing Pearson correlation with prediction errors, but provide no indication of whether this optimization uses a held-out validation partition, source-domain (FF++) data only, or the CelebDF evaluation predictions themselves. If performed on the same CelebDF errors used to report the r values, the superiority is expected by construction and does not establish that the weights remain informative for new inputs.

Authors: The weight optimization for each architecture is performed exclusively on the source-domain FaceForensics++ dataset using a held-out validation partition within FF++. The optimized weights are then transferred to the CelebDF evaluation set without any further fitting. This design choice is what allows us to demonstrate the robustness of the linear fusion under distribution shift. We will revise the abstract and method sections to explicitly describe this source-domain optimization procedure. revision: yes
Referee: [Evaluation on CelebDF and cross-dataset results] Table or figure reporting the CelebDF and cross-domain results (e.g., the 9/11 outperformance and degradation percentages) must include the exact data partition used for weight optimization, the number of optimization runs, and any statistical tests or error bars on the reported Pearson r values; without this, the cross-domain retention claim (74% vs. 85% drop) cannot be evaluated for robustness.

Authors: We agree that additional details on the experimental protocol are necessary for full reproducibility and assessment of robustness. In the revised manuscript, we will augment the relevant tables and figures with: (i) the exact data partitions used for optimization (source-domain validation split), (ii) the number of optimization runs performed, and (iii) error bars representing standard deviation across runs on the Pearson r values. We will also include a note on the absence of formal statistical tests, as the comparisons are primarily descriptive. revision: yes

Circularity Check

1 step flagged

COF weights fitted by maximizing correlation to evaluation-set errors; reported superiority over RF is by construction on CelebDF

specific steps

fitted input called prediction [Abstract (COF definition)]
"fuses five complementary uncertainty sources -- epistemic, aleatoric, calibration, conformal, and distributional -- by maximizing Pearson correlation between fused uncertainty scores and prediction errors via constrained optimization on the probability simplex"

The optimization solves for weights that maximize the exact quantity (Pearson r to prediction errors) later reported as the method's performance on CelebDF. When performed on the same distribution used for the 9/11 architecture comparisons and shift results, the reported correlations are the fitted values by construction rather than out-of-sample predictions.

full rationale

The paper defines COF via constrained optimization that directly maximizes Pearson r between the linear fusion and observed prediction errors. The abstract and skeptic description give no evidence of a held-out optimization split separate from the CelebDF evaluation distribution on which the r values (0.249 vs 0.034, 74% drop) are reported. This reduces the central empirical claim to a fitted linear combination evaluated on its own training errors rather than an independent prediction. No other circular steps identified; the rest of the architecture comparison and cross-dataset degradation results stand on their own measurements.
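
A toy illustration of why the optimization partition matters. With two synthetic sources whose reliability swaps between a source and a target domain, weights fitted directly on the target data can only match or beat weights transferred from the source, because the transferred weight is one of the candidates the in-sample search considers. The two-source setup, the grid search, and the names `best_weight` and `correlation_at` are assumptions for illustration; the paper fuses five sources, and the simulated rebuttal states that its weights are fitted on an FF++ validation split.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_at(w, s1, s2, errors):
    """Correlation of the fused score w*s1 + (1-w)*s2 with the errors."""
    fused = [w * a + (1 - w) * b for a, b in zip(s1, s2)]
    return pearson(fused, errors)


def best_weight(s1, s2, errors, grid=101):
    """Exhaustive search over w in {0, 0.01, ..., 1} on the two-source simplex."""
    best_w, best_r = 0.0, -2.0
    for g in range(grid):
        w = g / (grid - 1)
        r = correlation_at(w, s1, s2, errors)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


rng = random.Random(0)


def make_domain(n, noise1, noise2):
    """Synthetic domain: binary errors plus two noisy uncertainty sources."""
    errors = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
    s1 = [e + rng.gauss(0, noise1) for e in errors]
    s2 = [e + rng.gauss(0, noise2) for e in errors]
    return s1, s2, errors


src = make_domain(400, 0.3, 2.0)  # source 1 is the reliable one before the shift
tgt = make_domain(400, 2.0, 0.3)  # reliability swaps after the shift

w_src, r_src = best_weight(*src)
w_tgt, r_in_sample = best_weight(*tgt)     # fitted on the evaluation data itself
r_transfer = correlation_at(w_src, *tgt)   # fitted on source, evaluated on target

print(f"source-fitted w = {w_src:.2f}, target-fitted w = {w_tgt:.2f}")
print(f"target r, in-sample = {r_in_sample:.3f}, transferred = {r_transfer:.3f}")
```

Only the transferred figure says anything about how the weights behave on inputs whose errors were not available during fitting.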

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on fitted fusion weights and the assumption that the five uncertainty sources can be usefully combined; no independent evidence for the sources' complementarity is supplied beyond the optimization itself.

free parameters (1)

fusion weights on probability simplex
Five non-negative weights summing to one, chosen by constrained optimization to maximize Pearson correlation with prediction errors.

axioms (1)

domain assumption: The five uncertainty sources (epistemic, aleatoric, calibration, conformal, distributional) are complementary and admit a linear combination that improves correlation with errors.
Invoked by the definition of COF as a fusion of these specific sources.




[1] [1]

Mesonet: a compact facial video forgery detection network,

D. Afchar, V . Nozick, J. Yamagishi, and I. Echizen, “Mesonet: a compact facial video forgery detection network,” inIEEE Int. Workshop Inf. Forensics Security, 2018, pp. 1–7. 13

2018

[2] [2]

FaceForensics++: Learning to detect manipulated facial images,

A. R ¨ossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “FaceForensics++: Learning to detect manipulated facial images,” inIEEE Int. Conf. Comput. Vis., 2019, pp. 1–11

2019

[3] [3]

Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,

Y . Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” inInt. Conf. Mach. Learn.PMLR, 2016, pp. 1050–1059

2016

[4] [4]

On calibration of modern neural networks,

C. Guo, G. Pleiss, Y . Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” inInt. Conf. Mach. Learn.PMLR, 2017, pp. 1321–1330

2017

[5] [5]

Thinking in frequency: Face forgery detection by mining frequency-aware clues,

Y . Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, “Thinking in frequency: Face forgery detection by mining frequency-aware clues,” inECCV. Springer, 2020, pp. 86–103

2020

[6] D. Wodajo and S. Atnafu, “Deepfake video detection using convolutional vision transformer,” arXiv preprint arXiv:2102.11126, 2021.

[7] D. A. Coccomini, N. Messina, C. Gennaro, and F. Falchi, “Combining EfficientNet and vision transformers for video deepfake detection,” in Image Analysis and Processing–ICIAP 2022. Springer, 2022, pp. 219–229.

[8] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in PMLR, vol. 97, 2019, pp. 6105–6114.

[9] D. A. Coccomini, G. K. Zilos, G. Amato, R. Caldelli, F. Falchi, S. Papadopoulos, and C. Gennaro, “Mintime: Multi-identity size-invariant video deepfake detection,” IEEE Trans. Inf. Forensics Security, vol. 19, pp. 6084–6096, 2024.

[10] Z. Li, Z. Teng, B. Zhang, and J. Fan, “Bi-stream coteaching network for weakly-supervised deepfake localization in videos,” IEEE Trans. Inf. Forensics Security, vol. 20, pp. 1724–1738, 2025.

[11] Z. Sun, N. Ruan, and J. Li, “Ddl: Effective and comprehensible interpretation framework for diverse deepfake detectors,” IEEE Trans. Inf. Forensics Security, vol. 20, pp. 3601–3615, 2025.

[12] N. A. Chandra et al., “Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024,” arXiv preprint arXiv:2503.02857, 2025, available at: https://arxiv.org/abs/2503.02857

[13] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, “CNN-generated images are surprisingly easy to spot... for now,” in IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 8692–8701.

[14] W. Guan, W. Wang, J. Dong, and B. Peng, “Improving generalization of deepfake detectors by imposing gradient regularization,” IEEE Trans. Inf. Forensics Security, vol. 19, pp. 5345–5356, 2024.

[15] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” in Int. Conf. Mach. Learn. PMLR, 2015, pp. 1613–1622.

[16] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” in Adv. Neural Inf. Process. Syst., 2017, pp. 6402–6413.

[17] M. Sensoy, L. Kaplan, and M. Kandemir, “Evidential deep learning to quantify classification uncertainty,” in Adv. Neural Inf. Process. Syst. Curran Associates Inc., 2018, pp. 3183–3193.

[18] J. Mukhoti, V. Kulharia, A. Sanyal, S. Golodetz, P. H. S. Torr, and P. K. Dokania, “Calibrating deep neural networks using focal loss,” in Adv. Neural Inf. Process. Syst. Curran Associates Inc., 2020.

[19] A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distribution-free uncertainty quantification,” arXiv preprint arXiv:2107.07511, 2021.

[20] Y. Romano, M. Sesia, and E. J. Candès, “Classification with valid and adaptive coverage,” in Adv. Neural Inf. Process. Syst. Curran Associates Inc., 2020.

[21] K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., 2018, pp. 7167–7177.

[22] K. Ahn, S. Lee, S. Han, C. Y. Low, and M. Cha, “Uncertainty-aware face embedding with contrastive learning for open-set evaluation,” IEEE Trans. Inf. Forensics Security, vol. 19, pp. 7176–7186, 2024.

[23] F. Nie, J. Ni, J. Zhang, B. Zhang, W. Zhang, and B. Li, “Toward generalizable deepfake detection via forgery-aware audio–visual adaptation: A variational Bayesian approach,” IEEE Trans. Inf. Forensics Security, vol. 21, pp. 2933–2946, 2026.

[24] D. Li, Z. Zhang, C. Shan, and L. Wang, “Incremental pedestrian attribute recognition via dual uncertainty-aware pseudo-labeling,” IEEE Trans. Inf. Forensics Security, vol. 18, pp. 2622–2636, 2023.

[25] D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in Int. Conf. Learn. Represent., 2017.

[26] A. Amini, W. Schwarting, A. Soleimany, and D. Rus, “Deep evidential regression,” in Adv. Neural Inf. Process. Syst. Curran Associates Inc., 2020.

[27] A. Kumar, D. Singh, R. Jain, D. K. Jain, C. Gan, and X. Zhao, “Advances in deepfake detection algorithms: Exploring fusion techniques in single and multi-modal approach,” Inf. Fusion, vol. 118, p. 102993, 2025.

[28] A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?” in Adv. Neural Inf. Process. Syst., 2017, pp. 5574–5584.

[29] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. V. Dillon, B. Lakshminarayanan, and J. Snoek, “Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift,” in Adv. Neural Inf. Process. Syst. Curran Associates Inc., 2019, pp. 13 991–14 002.

[30] Q. Lv, Y. Li, J. Dong, S. Chen, H. Yu, H. Zhou, and S. Zhang, “Domainforensics: Exposing face forgery across domains via bi-directional adaptation,” IEEE Trans. Inf. Forensics Security, vol. 19, pp. 7275–7289, 2024.

[31] X. Zhou, H. Han, S. Shan, and X. Chen, “Fine-grained open-set deepfake detection via unsupervised domain adaptation,” IEEE Trans. Inf. Forensics Security, vol. 19, pp. 7536–7547, 2024.

[32] P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: risk bounds and structural results,” J. Mach. Learn. Res., vol. 3, pp. 463–482, Mar. 2003.

[33] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Mach. Learn., vol. 79, no. 1–2, pp. 151–175, May 2010. [Online]. Available: https://doi.org/10.1007/s10994-009-5152-4

[34] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-DF: A large-scale challenging dataset for deepfake forensics,” in IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 3207–3216.

[35] B. Dolhansky et al., “The DeepFake Detection Challenge (DFDC) dataset,” arXiv preprint arXiv:2006.07397, 2020.

[36] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in IEEE Conf. Comput. Vis. Pattern Recog., 2017, pp. 1251–1258.

[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 770–778.

[38] M. Tan and Q. Le, “Efficientnetv2: Smaller models and faster training,” in PMLR, 18–24 Jul 2021, pp. 10 096–10 106. [Online]. Available: https://proceedings.mlr.press/v139/tan21a.html

[39] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. Learn. Represent. OpenReview.net, 2021.

[40] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in PMLR, vol. 139, 2021, pp. 10 347–10 357.

[41] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in IEEE Int. Conf. Comput. Vis., 2021, pp. 9992–10 002.

[42] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” in IEEE Conf. Comput. Vis. Pattern Recog. Los Alamitos, CA, USA: IEEE Computer Society, Jun. 2022, pp. 11 966–11 976. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR52688.2022.01167

[43] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, and Y. Li, “Maxvit: Multi-axis vision transformer,” in Computer Vision – ECCV 2022, S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, Eds. Cham: Springer Nature Switzerland, 2022, pp. 459–479.

[44] H.-H. Nguyen-Le, V.-T. Tran, D.-T. Nguyen, and N.-A. Le-Khac, “Think twice before adaptation: improving adaptability of deepfake detection via online test-time adaptation,” in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, ser. IJCAI ’25, 2025. [Online]. Available: https://doi.org/10.24963/ijcai.2025/854