arxiv: 2604.16987 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

DVAR: Adversarial Multi-Agent Debate for Video Authenticity Detection

Feifei Shao, Hehe Fan, Hongyuan Qi, Jun Xiao, Ming Li

Pith reviewed 2026-05-10 07:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords video authenticity detectionmulti-agent debatetraining-free frameworkMDL adjudicationgeneralization to unseen generatorsdeepfake forensicsexplanatory costadversarial reasoning

0 comments

The pith

A training-free debate between generative and natural agents detects fake videos competitively with supervised methods and generalizes better to new generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DVAR to reformulate video authenticity detection as an iterative debate between two agents: one advancing a generative explanation for observed anomalies and the other a natural mechanism. These agents cross-examine each other's claims over multiple rounds until the evidence forces convergence, after which minimum description length selects the lower-cost explanation. A dynamic knowledge base supplies heuristics about generative failure modes to guide the process. This matters because video generators evolve rapidly and render supervised detectors trained on past examples obsolete, while the debate format yields explicit reasoning traces rather than opaque scores.

Core claim

DVAR is a training-free framework that casts video authenticity assessment as a multi-agent forensic debate in which a Generative Hypothesis Agent and a Natural Mechanism Agent iteratively defend their accounts against abnormal evidence; the Minimum Description Length principle adjudicates by comparing the explanatory cost of each path, augmented by heuristics from GenVideoKB, yielding performance competitive with supervised state-of-the-art detectors and markedly stronger generalization to unseen generative architectures.

What carries the argument

The adversarial cross-examination loop between the Generative Hypothesis Agent and Natural Mechanism Agent, resolved by computing Explanatory Cost under the Minimum Description Length (MDL) framework and informed by GenVideoKB generative-boundary heuristics.

If this is right

Detection performance remains stable when entirely new video generators appear, without retraining on fresh labeled data.
The system produces inspectable reasoning traces that reveal which pieces of evidence drove the final decision.
The method operates in a zero-shot regime for novel architectures while matching the accuracy of fully supervised alternatives on seen generators.
The framework converts an opaque classification task into a transparent logical stress-test of competing explanations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same debate structure could be tested on generated images or audio to check whether cross-examination generalizes beyond video.
Maintaining an up-to-date knowledge base of generator failure modes becomes the primary maintenance task for sustained performance.
One could measure whether adding a third agent representing a hybrid explanation improves convergence speed or accuracy.
The explicit cost comparison may expose systematic weaknesses in current generative models that could guide future generator design.

Load-bearing premise

Iterative cross-examination between the two agents plus MDL adjudication will reliably converge on the correct authenticity label without training data or fine-tuning, provided GenVideoKB supplies accurate and current heuristics on generative boundaries.

What would settle it

Evaluating DVAR on videos produced by a generative architecture absent from GenVideoKB and checking whether its accuracy falls below supervised baselines trained only on older generators.

Figures

Figures reproduced from arXiv: 2604.16987 by Feifei Shao, Hehe Fan, Hongyuan Qi, Jun Xiao, Ming Li.

**Figure 2.** Figure 2: Overview of the DVAR framework. The pipeline consists of four stages: (1) Evidence Discovery, where semantic scenes [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Case study illustrating the reasoning-driven detection process of DVAR. For each identified trace, the system adjudicates [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

The rapid evolution of video generation technologies poses a significant challenge to media forensics, as conventional detection methods often fail to generalize beyond their training distributions. To address this, we propose DVAR (Debate-based Video Authenticity Reasoning), a training-free framework that reformulates video detection as a structured multi-agent forensic reasoning process. Moving beyond the paradigm of pattern matching, DVAR orchestrates a competition between a Generative Hypothesis Agent and a Natural Mechanism Agent. Through iterative rounds of cross-examination, these agents defend their respective explanations against abnormal evidence, driving a logical convergence where the truth emerges from rigorous stress-testing. To adjudicate these conflicting claims, we apply Occam's Razor through the Minimum Description Length (MDL) framework, defining an Explanatory Cost to quantify the "logical burden" of each reasoning path. Furthermore, we integrate GenVideoKB, a dynamic knowledge repository that provides high-level reasoning heuristics on generative boundaries and failure modes. Extensive experiments demonstrate that DVAR achieves competitive performance against supervised state-of-the-art methods while exhibiting superior generalization to unseen generative architectures. By transforming detection into a transparent debate, DVAR provides explicit, interpretable reasoning traces for robust video authenticity assessment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DVAR proposes a training-free agent debate for video detection with MDL adjudication, but lacks supporting results in the abstract.

read the letter

The main takeaway is that DVAR sets up video authenticity detection as a debate between a generative hypothesis agent and a natural mechanism agent. They cross-examine each other, and minimum description length picks the explanation with lower cost, all without training on any data. A knowledge base supplies the heuristics. This approach is new in how it orchestrates opposing agents for this specific task and uses MDL for adjudication instead of learned classifiers. It does well at targeting the real weakness in current methods, which is poor performance on generators not seen during training. The interpretable reasoning traces from the debate could be valuable for fact-checkers who need to understand why a video is flagged. The soft spots are significant because the abstract makes strong claims about competitive performance and superior generalization but provides no numbers, ablations, or implementation details. There is no description of how the agents are prompted, how the explanatory cost is computed in practice, or how GenVideoKB is constructed and updated. Without those, it is difficult to know if the system actually works as described or if the debate process converges reliably on the right answer. The stress-test note correctly flags this gap in the available information. This paper is for researchers working on robust deepfake detection and anyone exploring multi-agent reasoning for forensic applications. A reader focused on practical media forensics would get value from the high-level idea even if the details need filling in. The paper engages clearly with the literature on detection failures and proposes a distinct alternative. It deserves a serious referee. I recommend sending it to peer review, but the authors should be asked to include the experimental results and technical specifics upfront.

Referee Report

3 major / 1 minor

Summary. The paper proposes DVAR (Debate-based Video Authenticity Reasoning), a training-free framework that reformulates video authenticity detection as a multi-agent debate between a Generative Hypothesis Agent and a Natural Mechanism Agent. The agents engage in iterative cross-examination, with adjudication via an MDL-based Explanatory Cost that applies Occam's Razor, augmented by the GenVideoKB knowledge repository for generative heuristics. The central claim is that this process yields competitive performance against supervised state-of-the-art detectors while providing superior generalization to unseen generative architectures, along with interpretable reasoning traces.

Significance. If the empirical claims hold, the work would be significant for media forensics: it offers a training-free, interpretable alternative to supervised detectors that typically overfit to specific generators and fail on new architectures. The adversarial debate plus MDL adjudication mechanism could provide a principled way to leverage external knowledge without parameter fitting, addressing a key limitation in the field.

major comments (3)

[Abstract] Abstract: The central claims of 'competitive performance against supervised state-of-the-art methods' and 'superior generalization to unseen generative architectures' are asserted without any quantitative results, tables, ablation studies, or baseline comparisons. This absence makes it impossible to evaluate whether the debate process plus MDL adjudication actually delivers the stated gains.
[Abstract] Abstract: No equations, pseudocode, or procedural details are supplied for computing the Explanatory Cost under the MDL framework or for how the agents are prompted and how cross-examination is structured. These omissions are load-bearing because the reliability of convergence on correct labels depends directly on these mechanisms.
[Abstract] Abstract: The description of GenVideoKB as supplying 'high-level reasoning heuristics on generative boundaries and failure modes' is given without any characterization of its coverage, update mechanism, or handling of novel generators, leaving the generalization claim without concrete support.

minor comments (1)

[Abstract] The abstract introduces several invented entities (Generative Hypothesis Agent, Natural Mechanism Agent, GenVideoKB) without initial definitions or references to later sections where they are formalized.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment point by point below, clarifying where the full paper provides supporting material and indicating the revisions we will make to improve the abstract's informativeness and self-containment.

read point-by-point responses

Referee: [Abstract] Abstract: The central claims of 'competitive performance against supervised state-of-the-art methods' and 'superior generalization to unseen generative architectures' are asserted without any quantitative results, tables, ablation studies, or baseline comparisons. This absence makes it impossible to evaluate whether the debate process plus MDL adjudication actually delivers the stated gains.

Authors: We acknowledge the referee's point that the abstract, as a high-level summary, does not embed specific numerical results or tables. The full manuscript contains these in Section 4 (Experiments), including Table 1 for direct comparisons against supervised SOTA detectors on standard benchmarks, Table 2 and associated analysis for generalization performance on unseen generative architectures, and Section 4.3 for ablations on the debate and MDL components. To make the abstract more self-contained and address the evaluation concern, we will revise it to include a concise statement of key quantitative outcomes. revision: partial
Referee: [Abstract] Abstract: No equations, pseudocode, or procedural details are supplied for computing the Explanatory Cost under the MDL framework or for how the agents are prompted and how cross-examination is structured. These omissions are load-bearing because the reliability of convergence on correct labels depends directly on these mechanisms.

Authors: The manuscript supplies the requested details outside the abstract: the MDL Explanatory Cost is formally defined with its computation in Section 3.2 (including the relevant equation), while agent prompting, cross-examination structure, and iteration protocol appear in Section 3.1 together with pseudocode as Algorithm 1. We agree that the abstract would benefit from a brief procedural pointer to these mechanisms, and we will add one sentence summarizing the MDL adjudication and debate structure in the revised version. revision: yes
Referee: [Abstract] Abstract: The description of GenVideoKB as supplying 'high-level reasoning heuristics on generative boundaries and failure modes' is given without any characterization of its coverage, update mechanism, or handling of novel generators, leaving the generalization claim without concrete support.

Authors: Section 3.4 of the manuscript provides the requested characterization of GenVideoKB, covering its construction and scope (heuristics drawn from analysis of multiple generative video models), the update process, and the extrapolation rules used for novel generators. We will incorporate a short supporting clause into the abstract to make this concrete and strengthen the generalization claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The DVAR framework is presented as a training-free process relying on multi-agent cross-examination, MDL-based Explanatory Cost adjudication, and external GenVideoKB heuristics. The abstract and described mechanism contain no equations, fitted parameters, self-definitional loops, or load-bearing self-citations that reduce any claimed result to its own inputs by construction. The central claims rest on logical convergence and external knowledge rather than any statistical fitting or renaming of known patterns within the target data, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

The framework rests on the assumption that Occam's Razor via MDL will select the correct explanation and that the supplied knowledge base contains reliable generative failure modes; no numerical free parameters are mentioned.

axioms (1)

domain assumption Occam's Razor can be operationalized via Minimum Description Length to adjudicate between competing explanations
Invoked when defining Explanatory Cost to choose between generative and natural-mechanism accounts

invented entities (3)

Generative Hypothesis Agent no independent evidence
purpose: Proposes and defends generative explanations for observed video features
Core component of the debate framework
Natural Mechanism Agent no independent evidence
purpose: Proposes and defends natural-process explanations for observed video features
Core component of the debate framework
GenVideoKB no independent evidence
purpose: Dynamic repository of high-level reasoning heuristics on generative boundaries and failure modes
Provides external knowledge to the agents

pith-pipeline@v0.9.0 · 5512 in / 1414 out tokens · 43657 ms · 2026-05-10T07:45:39.757990+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 33 canonical work pages · 3 internal anchors

[1]

Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid. 2021. ViViT: A Video Vision Transformer. arXiv:2103.15691 [cs.CV] https://arxiv.org/abs/2103.15691

work page arXiv 2021
[2]

2022.Deepfakes and synthetic media in the financial system: Assessing threat scenarios

Jon Bateman. 2022.Deepfakes and synthetic media in the financial system: Assessing threat scenarios. Carnegie Endowment for International Peace

2022
[3]

Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is Space-Time Atten- tion All You Need for Video Understanding?. InProceedings of the International Conference on Machine Learning (ICML)

2021
[4]

Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, and Guanbin Li. 2025. DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 12524–12534

2025
[5]

Joao Carreira and Andrew Zisserman. 2018. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. arXiv:1705.07750 [cs.CV] https://arxiv.org/ abs/1705.07750

work page Pith review arXiv 2018
[6]

Ritabrata Chakraborty, Rajatsubhra Chakraborty, Ali Khaleghi Rahimian, and Thomas MacDougall. 2025. TruthLens:A Training-Free Paradigm for DeepFake Detection. arXiv:2503.15342 [cs.CV] https://arxiv.org/abs/2503.15342

work page arXiv 2025
[7]

Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, and Huaxiong Li. 2024. DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark. arXiv preprint arXiv:2405.19707(2024)

work page arXiv 2024
[8]

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. 2024. VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models. arXiv:2401.09047 [cs.CV]

work page arXiv 2024
[9]

Prafulla Dhariwal and Alex Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. arXiv:2105.05233 [cs.LG] https://arxiv.org/abs/2105.05233

work page internal anchor Pith review arXiv 2021
[10]

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. SlowFast Networks for Video Recognition. arXiv:1812.03982 [cs.CV] https: //arxiv.org/abs/1812.03982

work page arXiv 2019
[11]

Niki Maria Foteinopoulou, Enjie Ghorbel, and Djamila Aouada. 2024. A Hitch- hikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning. arXiv:2410.00485 [cs.CV] https://arxiv.org/abs/2410.00485

work page arXiv 2024
[12]

Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. 2020. Leveraging Frequency Analysis for Deep Fake Image Recognition. arXiv:2003.08685 [cs.CV] https://arxiv.org/abs/2003.08685

work page arXiv 2020
[13]

Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, and Xiaoming Liu. 2025. Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector. arXiv:2503.20188 [cs.CV] https://arxiv.org/abs/2503.20188

work page arXiv 2025
[14]

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. 2018. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? arXiv:1711.09577 [cs.CV] https://arxiv.org/abs/1711.09577

work page arXiv 2018
[15]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV] https://arxiv.org/abs/ 1512.03385

work page internal anchor Pith review arXiv 2015
[16]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. InProceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA

2020
[17]

Sohail Ahmed Khan and Duc-Tien Dang-Nguyen. 2024. CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection. InProceed- ings of the 2024 International Conference on Multimedia Retrieval. 1006–1015

2024
[18]

Dong Li, Jiaying Zhu, Xueyang Fu, Xun Guo, Yidi Liu, Gang Yang, Jiawei Liu, and Zheng-Jun Zha. 2024. Noise-Assisted Prompt Learning for Image Forgery Detection and Localization. InComputer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XI(Milan, Italy). Springer-Verlag, Berlin, Heidelberg, 18–36...

work page doi:10.1007/978-3-031-73247- 2024
[19]

Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, and Yu Qiao. 2022. UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. arXiv:2211.09552 [cs.CV]

work page arXiv 2022
[20]

Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. 2020. Face X-ray for More General Face Forgery Detection. arXiv:1912.13458 [cs.CV] https://arxiv.org/abs/1912.13458

work page arXiv 2020
[21]

Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, and Christoph Feichtenhofer. 2022. MViTv2: Improved multiscale vision transformers for classification and detection. InCVPR

2022
[22]

Yiheng Li, Yang Yang, Zichang Tan, Huan Liu, Weihua Chen, Xu Zhou, and Zhen Lei. 2025. Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation. arXiv:2506.05890 [cs.CV] https://arxiv.org/abs/2506.05890

work page arXiv 2025
[23]

Ji Lin, Chuang Gan, and Song Han. 2018. Temporal Shift Module for Efficient Video Understanding.arXiv preprint arXiv:1811.08383(2018)

work page arXiv 2018
[24]

Kaiqing Lin, Yuzhen Lin, Weixiang Li, Taiping Yao, and Bin Li. 2025. Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection. arXiv:2409.02664 [cs.CV] https://arxiv.org/abs/2409.02664 , , Hongyuan Qi, Feifei Shao, Ming Li, Hehe Fan, Jun Xiao

work page arXiv 2025
[25]

Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, and Lichao Sun. 2024. Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. arXiv:2402.17177 [cs.CV] https://arxiv.org/abs/2402.17177

work page internal anchor Pith review arXiv 2024
[26]

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.arXiv preprint arXiv:2103.14030(2021)

work page arXiv 2021
[27]

Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu
[28]

Video Swin Transformer.arXiv preprint arXiv:2106.13230(2021)

work page arXiv 2021
[29]

Abdullahi, and Ahmad Neyaz Khan

Asad Malik, Minoru Kuribayashi, Sani M. Abdullahi, and Ahmad Neyaz Khan
[30]

doi:10.1109/ACCESS.2022.3151186

DeepFake Detection for Human Face Images and Videos: A Survey.IEEE Access10 (2022), 18757–18775. doi:10.1109/ACCESS.2022.3151186

work page doi:10.1109/access.2022.3151186 2022
[31]

Scott McCloskey and Michael Albright. 2018. Detecting GAN-generated Imagery using Color Cues. arXiv:1812.08247 [cs.CV] https://arxiv.org/abs/1812.08247

work page arXiv 2018
[32]

Bappy, Amit K

Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury, and B. S. Manjunath
[33]

arXiv preprint arXiv:1903.06836 (2019)

Detecting GAN generated Fake Images using Co-occurrence Matrices. arXiv:1903.06836 [cs.CV] https://arxiv.org/abs/1903.06836

work page arXiv 1903
[34]

Zhenliang Ni, Qiangyu Yan, Mouxiao Huang, Tianning Yuan, Yehui Tang, Hailin Hu, Xinghao Chen, and Yunhe Wang. 2025. GenVidBench: A Challeng- ing Benchmark for Detecting AI-Generated Video. arXiv:2501.11340 [cs.CV] https://arxiv.org/abs/2501.11340

work page arXiv 2025
[35]

Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. 2023. Towards Universal Fake Image Detectors that Generalize Across Generative Models. InCVPR

2023
[36]

Jonas Ricker, Simon Damm, Thorsten Holz, and Asja Fischer. 2024. Towards the Detection of Diffusion Model Deepfakes. arXiv:2210.14571 [cs.CV]

work page arXiv 2024
[37]

Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. 2019. FaceForensics++: Learning to Detect Manipulated Facial Images. InInternational Conference on Computer Vision (ICCV)

2019
[38]

Haixu Song, Shiyu Huang, Yinpeng Dong, and Wei-Wei Tu. 2023. Robustness and Generalizability of Deepfake Detection: A Study with Diffusion Models. arXiv:2309.02218 [cs.CV] https://arxiv.org/abs/2309.02218

work page arXiv 2023
[39]

Khoa-Dang Tran. 2025. Explainable Manipulated Videos Detection Using Multi- modal Large Language Models. InCompanion Proceedings of the ACM on Web Conference 2025(Sydney NSW, Australia)(WWW ’25). Association for Computing Machinery, New York, NY, USA, 725–728. doi:10.1145/3701716.3715283

work page doi:10.1145/3701716.3715283 2025
[40]

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. 2023. DIRE for Diffusion-Generated Image Detection. arXiv preprint arXiv:2303.09295(2023)

work page arXiv 2023
[41]

Haiquan Wen, Tianxiao Li, Zhenglin Huang, Yiwei He, and Guangliang Cheng
[42]

arXiv:2507.14632 [cs.CV] https://arxiv.org/abs/ 2507.14632

BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM. arXiv:2507.14632 [cs.CV] https://arxiv.org/abs/ 2507.14632

work page arXiv
[43]

Mika Westerlund. 2019. The emergence of deepfake technology: A review.Tech- nology innovation management review9, 11 (2019)

2019
[44]

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, and Ran He. 2024. TALL: Thumbnail Layout for Deepfake Video Detection. arXiv:2307.07494 [cs.CV] https://arxiv.org/abs/2307.07494

work page arXiv 2024
[45]

Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, and Yu Wu. 2025. D3: Scaling Up Deepfake Detection by Learning from Discrepancy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2025
[46]

Yue Zhang, Ben Colman, Xiao Guo, Ali Shahriyari, and Gaurav Bharaj. 2024. Common Sense Reasoning for Deepfake Detection. arXiv:2402.00126 [cs.CV] https://arxiv.org/abs/2402.00126

work page arXiv 2024
[47]

Ziyin Zhou, Yunpeng Luo, Yuanchen Wu, Ke Sun, Jiayi Ji, Ke Yan, Shouhong Ding, Xiaoshuai Sun, Yunsheng Wu, and Rongrong Ji. 2025. AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models.arXiv preprint arXiv:2507.02664(2025)

work page arXiv 2025