Auditing Generalization in AI-Generated Video Detection: A Six-Control Protocol and the VidAudit Toolkit

Hasan Kurban; Mehmet Dalkilic; Mert Onur Cakiroglu; Zhihe Lu

arxiv: 2606.31004 · v1 · pith:LTJFWQ3Hnew · submitted 2026-06-30 · 💻 cs.CV

Auditing Generalization in AI-Generated Video Detection: A Six-Control Protocol and the VidAudit Toolkit

Mert Onur Cakiroglu , Zhihe Lu , Mehmet Dalkilic , Hasan Kurban This is my paper

Pith reviewed 2026-07-01 00:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords AI-generated video detectiongeneralization auditleave-one-generator-out evaluationforensic video analysisevaluation protocolVidAudit toolkitcross-substrate feature fusion

0 comments

The pith

Audited evaluation shows most reported generalization in AI-video detectors is an artifact of uncontrolled confounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing benchmarks for AI-generated video detection allow spurious signals to inflate apparent generalization performance. A simple clip-length classifier reaches 0.998 leave-one-generator-out AUC on one benchmark when evaluation is unaudited, yet measures nothing about motion. A survey of twenty prior papers finds none apply all six standard controls needed to catch such failures. When the authors enforce the full protocol across six feature sources and two datasets, the trivial classifier drops to chance level, some published detectors are revealed to carry dataset identity, and only certain methods maintain high scores with proper coherence checks. The work therefore replaces single-number AUC reporting with a four-element audited tuple that includes operating-point recall and calibration.

Core claim

The six-control audited protocol applied to leave-one-generator-out evaluation collapses a three-feature clip-length classifier to 0.529 AUC, exposes a CLIP baseline as carrying dataset identity, certifies the 2025 WaveRep detector at 0.996 out-of-distribution AUC with chance-level real-vs-real coherence, shows that multiple high-AUC methods drop to single-digit recall at 0.1 percent FPR, and demonstrates that cross-substrate fusion of TemporalSpec motion vectors supplies complementarity that survives the audit.

What carries the argument

The six-control protocol assembled from a twenty-paper survey, implemented in the VidAudit toolkit that unifies fourteen detectors behind one API and supplies Croissant metadata for cross-dataset re-evaluation.

If this is right

Leaderboard order changes once recall at deployable false-positive rate replaces raw AUC.
Cross-substrate fusion can add genuine signal that survives the audit.
Any new detector must be accompanied by the four-element audited tuple rather than AUC alone.
Real-versus-real coherence must be reported to confirm that a method is not merely separating generator families.
VidAudit provides a single plugin API and leaderboard for consistent audited comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of future detectors can use the six controls as a checklist during method development rather than after the fact.
The same protocol structure could be adapted to image or audio generation detection tasks that currently rely on similar unaudited leave-one-generator-out splits.
Deployment pipelines would gain reliability by requiring the operating-point recall and calibration numbers the audit emphasizes.
Community adoption of the open toolkit could shift the field from ranking papers to verifying that claimed generalization actually measures motion or synthesis artifacts.

Load-bearing premise

That the six controls identified in the survey are both necessary and jointly sufficient to remove every confound capable of producing spurious generalization signals.

What would settle it

A detector that passes all six controls yet is later shown to exploit an uncontrolled artifact such as generator-specific metadata or compression signature not captured by the protocol.

Figures

Figures reproduced from arXiv: 2606.31004 by Hasan Kurban, Mehmet Dalkilic, Mert Onur Cakiroglu, Zhihe Lu.

**Figure 1.** Figure 1: The audit’s verdict at a glance. Each bar is a detector’s audited LOGO OOD on the matched 27k GenVidBench subset, split into the dataset-identity floor it shares with a real-vsreal probe (gray; C3) and the above-floor margin (colored). Green detectors clear the floor by a wide margin and are certified; orange readouts sit close to high floors; the three-feature clip-length probe (red) collapses from 0.99… view at source ↗

**Figure 2.** Figure 2: The leaderboard metric does not predict deployability. For each detector, the hollow marker is its old leaderboard LOGO OOD AUC and the bold filled marker its new deployable recall (true-positive rate, TPR, at FPR= 0.1%) on the matched 27k subset; the connector traces the drop. WaveRep and ReStraV (green) remain effective at this operating point; the high-AUC published baselines (gray) collapse to ≤ 3% r… view at source ↗

read the original abstract

AI-generated video detection benchmarks such as GenVidBench and AIGVDBench are the de facto leaderboards, yet most evaluation protocols leave uncontrolled confounds that can inflate reported generalization. As an existence proof, a three-feature clip-length classifier reaches a leave-one-generator-out (LOGO) AUC of 0.998 on GenVidBench under unaudited evaluation, while measuring nothing about motion. A 20-paper survey finds none applying all six standard controls that would catch this, so we combine them into an audited protocol and apply it to six representative feature sources (three published detectors and three repurposed signal sources), re-running it cross-dataset on AIGVDBench. The audit both debunks and certifies: the trivial classifier collapses to near chance (0.529), a CLIP baseline is caught carrying dataset identity, and the 2025 forensic detector WaveRep clears the floor at out-of-distribution LOGO AUC 0.996 with chance-level real-vs-real coherence. At a deployable FPR of 0.1%, multiple high-AUC methods fall to single-digit recall and the leaderboard order changes, so we recommend an audited tuple (AUC, above-floor margin, operating-point recall, and calibration) over a single number. As a white-box positive control, we add TemporalSpec (codec motion vectors); via cross-substrate feature fusion (XSFF), a second substrate adds genuine complementarity that survives the audit. We release VidAudit, to our knowledge the largest unified and audited detector collection for this task, providing 14 detectors behind one plugin API, a leaderboard, and Croissant metadata, available at https://github.com/KurbanIntelligenceLab/vidaudit. Together, the protocol and toolkit move evaluation from leaderboard rank toward whether a result measures what it claims.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main contribution is a six-control audit protocol plus toolkit that exposes how clip-length and other simple confounds can inflate LOGO generalization numbers in video deepfake detectors, with concrete drops shown for a trivial baseline and a hold-up for WaveRep.

read the letter

The useful part is the explicit protocol and the survey result that none of the 20 prior papers applied all six controls. They demonstrate the point with a length-only classifier that hits 0.998 AUC under standard LOGO but falls to 0.529 once the controls are applied, and they show WaveRep staying at 0.996 out-of-distribution with real-vs-real coherence near chance. The cross-dataset run on AIGVDBench and the operating-point recall numbers at 0.1% FPR are concrete, and the released VidAudit collection with 14 detectors behind one API is a practical addition. The recommendation to report an audited tuple rather than raw AUC alone is reasonable.

The soft spot is that the six controls are derived from the survey and shown to work on the length example and a few detectors, but the paper does not provide a general argument or exhaustive check that they jointly eliminate every route to spurious signals. The abstract gives the numeric results, yet without the full methods section it is not possible to verify the exact definitions, exclusion rules, or LOGO split construction for hidden selection effects. The TemporalSpec positive control is a nice touch but does not address the completeness question.

This is a subfield paper aimed at people building or evaluating video deepfake detectors for forensics or misinformation work. It is worth sending to peer review because the empirical demonstration is clear enough to be checked and the toolkit lowers the barrier for others to apply the protocol.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a six-control auditing protocol for leave-one-generator-out (LOGO) evaluation of AI-generated video detectors to control confounds identified in a 20-paper survey. It shows that a trivial three-feature clip-length classifier reaches 0.998 LOGO AUC without the controls but drops to 0.529 with them, applies the protocol to six feature sources (including published detectors and repurposed signals), certifies the 2025 WaveRep detector at 0.996 out-of-distribution LOGO AUC with chance-level real-vs-real coherence, demonstrates that high-AUC methods drop to single-digit recall at 0.1% FPR (changing leaderboard order), and recommends an audited metric tuple. It also introduces TemporalSpec via cross-substrate feature fusion and releases the VidAudit toolkit with 14 detectors, leaderboard, and Croissant metadata.

Significance. If the protocol holds, the work would strengthen evaluation standards in AI video forensics by moving beyond single-number leaderboards toward controlled, falsifiable claims; the empirical collapse of the clip-length baseline and the certification of WaveRep provide concrete evidence of the protocol's utility. The open release of VidAudit with a unified plugin API and reproducible metadata is a clear strength for the field, enabling future audits and cross-substrate experiments.

major comments (3)

[§3 and §4] §3 (Protocol Definition) and §4 (Survey): the central claim that the six controls are both necessary and jointly sufficient to eliminate all confounds capable of producing spurious LOGO generalization signals rests on the 20-paper survey showing that no prior work used all six, but provides no general argument, exhaustive enumeration of possible confounds, or proof of completeness beyond the single clip-length classifier example. This is load-bearing for the certification of WaveRep at 0.996 AUC and the debunks of other methods.
[§5] §5 (Results, cross-dataset evaluation): the exact definitions of the six controls, data exclusion rules, and construction of LOGO splits on both GenVidBench and AIGVDBench are not specified at a level that permits independent verification of absence of post-hoc selection or implementation gaps, undermining confidence in the reported AUC drops and operating-point recalls at 0.1% FPR.
[§6] §6 (WaveRep certification): the claim that WaveRep 'clears the floor' at LOGO AUC 0.996 with chance-level real-vs-real coherence is presented as the outcome of the audited protocol, yet without a demonstrated completeness argument for the controls this result cannot be distinguished from an artifact of the particular control set chosen.

minor comments (2)

[§4.3] The introduction of TemporalSpec (codec motion vectors) as a white-box positive control is useful, but its precise feature extraction pipeline and integration via XSFF would benefit from an explicit equation or pseudocode block for reproducibility.
[Tables 2-4] Figure captions and table headers should explicitly state which controls were active for each reported AUC to avoid ambiguity when readers compare to prior unaudited numbers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of the auditing protocol and VidAudit toolkit. We address each major comment below with honest responses and indicate planned revisions to improve clarity and reproducibility without misrepresenting the manuscript's contributions or claims.

read point-by-point responses

Referee: [§3 and §4] the central claim that the six controls are both necessary and jointly sufficient to eliminate all confounds capable of producing spurious LOGO generalization signals rests on the 20-paper survey showing that no prior work used all six, but provides no general argument, exhaustive enumeration of possible confounds, or proof of completeness beyond the single clip-length classifier example. This is load-bearing for the certification of WaveRep at 0.996 AUC and the debunks of other methods.

Authors: We agree that the manuscript does not offer a formal proof or exhaustive enumeration establishing that the six controls are jointly sufficient against every conceivable confound. The controls are derived directly from confounds documented in the 20-paper survey, with the clip-length classifier serving as an existence proof of their practical effect. We do not claim the set is complete or eliminates all possible spurious signals. In revision we will update §3 to state explicitly that the protocol is a standardized auditing framework addressing observed confounds from the literature, rather than asserting completeness, and will add a limitations paragraph on this point. revision: yes
Referee: [§5] the exact definitions of the six controls, data exclusion rules, and construction of LOGO splits on both GenVidBench and AIGVDBench are not specified at a level that permits independent verification of absence of post-hoc selection or implementation gaps, undermining confidence in the reported AUC drops and operating-point recalls at 0.1% FPR.

Authors: We accept that the current level of detail in §5 is insufficient for full independent verification. Although the controls are defined in §3, we will expand §5 with explicit pseudocode for each control, precise data exclusion criteria, and step-by-step descriptions of LOGO split construction on both datasets. These additions will be included in the revised manuscript to support reproducibility. revision: yes
Referee: [§6] the claim that WaveRep 'clears the floor' at LOGO AUC 0.996 with chance-level real-vs-real coherence is presented as the outcome of the audited protocol, yet without a demonstrated completeness argument for the controls this result cannot be distinguished from an artifact of the particular control set chosen.

Authors: The certification of WaveRep is presented as its performance under the defined audited protocol, not as unconditional proof of generalization. We will revise §6 to clarify the conditional nature of the result, emphasize that it shows absence of the specific spurious signals addressed by the controls, and note that future identification of additional confounds could require protocol updates. This framing avoids overclaiming while preserving the empirical observation. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical protocol with independent controls

full rationale

The manuscript is an empirical audit: it surveys 20 prior papers to identify six controls, applies those controls to re-evaluate six feature sources on two benchmarks, and reports measured AUC, recall, and coherence values. No equations, fitted parameters, or derivations appear; the central claims are experimental outcomes under the stated protocol rather than reductions of outputs to inputs by construction. The survey finding that no prior work used all six controls is a factual count, not a self-referential premise. Self-citations, if present, are not load-bearing for the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Empirical benchmarking study. Relies on standard ML metrics (AUC, FPR) and the domain assumption that the six controls suffice to isolate generalization about generation artifacts.

axioms (2)

standard math AUC and operating-point recall at fixed FPR are appropriate primary metrics for detection generalization
Standard practice in binary classification evaluation
domain assumption The six controls identified in the 20-paper survey are the complete set required to remove confounds
Central premise of the proposed protocol

invented entities (1)

TemporalSpec (codec motion vectors) no independent evidence
purpose: White-box positive control substrate for motion information
Introduced as an additional feature source to demonstrate cross-substrate fusion

pith-pipeline@v0.9.1-grok · 5876 in / 1490 out tokens · 46326 ms · 2026-07-01T00:50:55.559227+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

50 extracted references · 1 canonical work pages

[1]

Croissant: A metadata format for ml- ready datasets

Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Luca Foschini, Pieter Gijsbers, Joan Giner-Miguelez, Su- jata Goswami, Nitisha Jain, Michalis Karamousadakis, Satyapriya Krishna, Michael Kuchnik, Sylvain Lesage, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Hamidah Oderinwale, Pierre Ruyssen, Tim Santos, Rajat Shinde, Elena...
[2]

Ai-generated video detection via spatial-temporal anomaly learning

Jianfa Bai, Man Lin, Gang Cao, and Zijie Lou. Ai-generated video detection via spatial-temporal anomaly learning. In Pattern Recognition and Computer Vision, pages 460–470, Singapore, 2025. Springer Nature Singapore. 27

2025
[3]

Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023. 1, 2

2023
[4]

Deepfakes Are Coming for the Finan- cial Sector.The Wall Street Journal, 2024

Isabelle Bousquette. Deepfakes Are Coming for the Finan- cial Sector.The Wall Street Journal, 2024. 1

2024
[5]

Deepfake-eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024

Nuria Alina Chandra, Hannah Lee, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Changyeon Lee, Jongwook Choi, Sejin Paik, Aerin Kim, and Oren Etzioni. Deepfake-eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Re...

2024
[6]

How far are ai-generated videos from simulating the 3d visual world: A learned 3d evaluation approach, 2025

Chirui Chang, Jiahui Liu, Zhengzhe Liu, Xiaoyang Lyu, Yi- Hua Huang, Xin Tao, Pengfei Wan, Di Zhang, and Xiaojuan Qi. How far are ai-generated videos from simulating the 3d visual world: A learned 3d evaluation approach, 2025. 27

2025
[7]

Videocrafter2: Overcoming data limitations for high-quality video diffu- sion models

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. Videocrafter2: Overcoming data limitations for high-quality video diffu- sion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7310–7320, 2024. 1, 2

2024
[8]

Demamba: Ai- generated video detection on million-scale genvideo bench- mark.Science China Information Sciences, 69(6):162103,

Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, and Huaxiong Li. Demamba: Ai- generated video detection on million-scale genvideo bench- mark.Science China Information Sciences, 69(6):162103,
[9]

Seeing what matters: Generalizable ai-generated video detection with forensic-oriented augmentation

Riccardo Corvi, Davide Cozzolino, Ekta Prashnani, Shalini De Mello, Koki Nagano, and Luisa Verdoliva. Seeing what matters: Generalizable ai-generated video detection with forensic-oriented augmentation. InAdvances in Neural In- formation Processing Systems, pages 26418–26446. Curran Associates, Inc., 2025. 2, 6, 7, 18, 26, 27

2025
[10]

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

Xiang Fang, Daizong Liu, Pan Zhou, and Guoshun Nan. You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2448–2460, 2023. 3

2023
[11]

Leveraging fre- quency analysis for deep fake image recognition

Joel Frank, Thorsten Eisenhofer, Lea Sch ¨onherr, Asja Fis- cher, Dorothea Kolossa, and Thorsten Holz. Leveraging fre- quency analysis for deep fake image recognition. InPro- ceedings of the 37th International Conference on Machine Learning, pages 3247–3258. PMLR, 2020. 4, 22

2020
[12]

On the content bias in frechet video distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun- Yan Zhu, and Jia-Bin Huang. On the content bias in frechet video distance. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7277–7288, 2024. 3

2024
[13]

Exposing ai-generated videos: A benchmark dataset and a local-and-global temporal defect based detec- tion method, 2024

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, and Hao- liang Li. Exposing ai-generated videos: A benchmark dataset and a local-and-global temporal defect based detec- tion method, 2024. 27

2024
[14]

Gans trained by a two time-scale update rule converge to a local nash equilib- rium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. InAdvances in Neural Information Processing Sys- tems. Curran Associates, Inc., 2017. 3

2017
[15]

Cogvideo: Large-scale pretraining for text-to-video generation via transformers, 2022

Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, and Jie Tang. Cogvideo: Large-scale pretraining for text-to-video generation via transformers, 2022. 1, 2

2022
[16]

Vbench: Com- prehensive benchmark suite for video generative models

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. Vbench: Com- prehensive benchmark suite for video generative models. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Reco...
[17]

Ai-generated video detection via perceptual straightening

Christian Intern `o, Robert Geirhos, Markus Olhofer, Sunny Liu, Barbara Hammer, and David Klindt. Ai-generated video detection via perceptual straightening. InAdvances in Neural Information Processing Systems, pages 20672–20705. Cur- ran Associates, Inc., 2025. 2, 3, 4, 5, 6, 7, 8, 18, 27, 29

2025
[18]

Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023

Levon Khachatryan, Andranik Movsisyan, Vahram Tade- vosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023. 1, 2

2023
[19]

Roy-Chowdhury

Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachan- dran, and Amit K. Roy-Chowdhury. Towards a universal synthetic video detector: From face or background manip- ulations to fully ai-generated content. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28050–28060, 2025. 27

2025
[20]

Preserving forgery artifacts: Ai- generated video detection at native scale, 2026

Zhengcen Li, Chenyang Jiang, Hang Zhao, Shiyang Zhou, Yunyang Mo, Feng Gao, Fan Yang, Qiben Shan, Shaocong Wu, and Jingyong Su. Preserving forgery artifacts: Ai- generated video detection at native scale, 2026. 27

2026
[21]

Fr ´echet video motion distance: A metric for evaluating motion consistency in videos, 2024

Jiahe Liu, Youran Qu, Qi Yan, Xiaohui Zeng, Lele Wang, and Renjie Liao. Fr ´echet video motion distance: A metric for evaluating motion consistency in videos, 2024. 2, 3, 6, 7, 26, 27 9

2024
[22]

Detecting ai-generated video via frame consistency

Long Ma, Zhiyuan Yan, Qinglang Guo, Yong Liao, Haiyang Yu, and Pengyuan Zhou. Detecting ai-generated video via frame consistency. In2025 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2025. 3, 8, 27

2025
[23]

Your one-stop solution for ai-generated video detection, 2026

Long Ma, Zihao Xue, Yan Wang, Zhiyuan Yan, Jin Xu, Xi- aorui Jiang, Haiyang Yu, Yong Liao, and Zhen Bi. Your one-stop solution for ai-generated video detection, 2026. 1, 2, 3, 8, 26, 27, 29

2026
[24]

William J. McGill. Multivariate information transmission. Psychometrika, 19(2):97–116, 1954. 24

1954
[25]

Video diffusion models: A survey, 2024

Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, and Helge Ritter. Video diffusion models: A survey, 2024. 1

2024
[26]

Genvidbench: A challenging benchmark for detecting ai-generated video, 2025

Zhenliang Ni, Qiangyu Yan, Mouxiao Huang, Tianning Yuan, Yehui Tang, Hailin Hu, Xinghao Chen, and Yunhe Wang. Genvidbench: A challenging benchmark for detecting ai-generated video, 2025. 1, 5, 27

2025
[27]

Vidguard-r1: Ai-generated video detection and explanation via reasoning mllms and rl, 2026

Kyoungjun Park, Yifan Yang, Juheon Yi, Shicheng Zheng, Yifei Shen, Dongqi Han, Caihua Shan, Muhammad Muaz, and Lili Qiu. Vidguard-r1: Ai-generated video detection and explanation via reasoning mllms and rl, 2026. 3, 8, 27

2026
[28]

Pika 2.0: Integrating custom characters and scenes in ai video generation.https://venturebeat

Pika Labs. Pika 2.0: Integrating custom characters and scenes in ai video generation.https://venturebeat. com / ai / pika - 2 - 0 - launches - in - wake - of - sora - integrating - your - own - characters - objects-scenes-in-new-ai-videos/, 2024. Ac- cessed: 2025-06-07. 2

2024
[29]

On the spectral bias of neural networks

Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. On the spectral bias of neural networks. InProceedings of the 36th International Conference on Ma- chine Learning, pages 5301–5310. PMLR, 2019. 4, 21

2019
[30]

AI Deepfakes Are Getting Weirder and Harder to Spot in the Midterms.The Wall Street Journal,

Amrith Ramkumar. AI Deepfakes Are Getting Weirder and Harder to Spot in the Midterms.The Wall Street Journal,
[31]

Richardson.The H.264 Advanced Video Compression Standard

Iain E. Richardson.The H.264 Advanced Video Compression Standard. Wiley Publishing, 2nd edition, 2010. 3, 4

2010
[32]

Dmc-net: Generating discriminative motion cues for fast compressed video action recognition

Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla- Lara, Marcus Rohrbach, Shih-Fu Chang, and Zhicheng Yan. Dmc-net: Generating discriminative motion cues for fast compressed video action recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 3

2019
[33]

Solana-Cipres, G

C. Solana-Cipres, G. Fernandez-Escribano, L. Rodriguez- Benitez, J. Moreno-Garcia, and L. Jimenez-Linares. Real- time moving object segmentation in h.264 compressed do- main based on approximate reasoning.International Journal of Approximate Reasoning, 51(1):99–114, 2009. 3

2009
[34]

Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand

Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. Overview of the high efficiency video coding (hevc) standard.IEEE Transactions on Circuits and Systems for Video Technology, 22(12):1649–1668, 2012. 3, 4, 5

2012
[35]

Videoveritas: Ai-generated video detection via perception pretext reinforcement learning, 2026

Hao Tan, Jun Lan, Senyuan Shi, Zichang Tan, Zijian Yu, Huijia Zhu, Weiqiang Wang, Jun Wan, and Zhen Lei. Videoveritas: Ai-generated video detection via perception pretext reinforcement learning, 2026. 27

2026
[36]

Raft: Recurrent all-pairs field transforms for optical flow

Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InComputer Vision – ECCV 2020, pages 402–419, Cham, 2020. Springer International Publishing. 2, 6, 7

2020
[37]

To- wards accurate generative models of video: A new metric & challenges, 2019

Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. To- wards accurate generative models of video: A new metric & challenges, 2019. 3

2019
[38]

Cmta: Leveraging cross-modal tem- poral artifacts for generalizable ai-generated video detection,

Hang Wang, Chao Shen, Chenhao Lin, Minghui Yang, Lei Zhang, and Cong Wang. Cmta: Leveraging cross-modal tem- poral artifacts for generalizable ai-generated video detection,
[39]

Atss: Detecting ai-generated videos via anomalous temporal self- similarity, 2026

Hang Wang, Chao Shen, Lei Zhang, and Zhi-Qi Cheng. Atss: Detecting ai-generated videos via anomalous temporal self- similarity, 2026. 3, 8, 27

2026
[40]

Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. Cnn-generated images are surprisingly easy to spot... for now. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 4, 22

2020
[41]

Vidprom: A million-scale real prompt-gallery dataset for text-to-video diffusion models

Wenhao Wang and Yi Yang. Vidprom: A million-scale real prompt-gallery dataset for text-to-video diffusion models. Thirty-eighth Conference on Neural Information Processing Systems, 2024. 3

2024
[42]

Swap attention in spa- tiotemporal diffusions for text-to-video generation, 2024

Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, and Jiaying Liu. Swap attention in spa- tiotemporal diffusions for text-to-video generation, 2024. 3

2024
[43]

Compression-aware video super- resolution

Yingwei Wang, Takashi Isobe, Xu Jia, Xin Tao, Huchuan Lu, and Yu-Wing Tai. Compression-aware video super- resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2012–2021, 2023. 3

2012
[44]

Robustsora: De- watermarked benchmark for robust ai-generated video detec- tion, 2026

Zhuo Wang, Xiliang Liu, and Ligang Sun. Robustsora: De- watermarked benchmark for robust ai-generated video detec- tion, 2026. 2, 26, 27

2026
[45]

P. Welch. The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms.IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, 1967. 5

1967
[46]

Wiegand, G.J

T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the h.264/avc video coding standard.IEEE Transactions on Circuits and Systems for Video Technology, 13(7):560–576, 2003. 4, 23

2003
[47]

Manmatha, Alexander J

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, and Philipp Kr ¨ahenb¨uhl. Compressed video action recognition. InProceedings of the IEEE Confer- ence on Computer Vision and Pattern Recognition (CVPR),
[48]

Physics-driven spatiotemporal modeling for ai-generated video detection

Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guox- uan Pang, Feng Liu, Bo Han, Shutao Li, and Mingkui Tan. Physics-driven spatiotemporal modeling for ai-generated video detection. InAdvances in Neural Information Pro- cessing Systems, pages 174683–174730. Curran Associates, Inc., 2025. 26, 27

2025
[49]

D3: Training-free ai-generated video detection using 10 second-order features

Chende Zheng, Ruiqi Suo, Chenhao Lin, Zhengyu Zhao, Le Yang, Shuai Liu, Minghui Yang, Cong Wang, and Chao Shen. D3: Training-free ai-generated video detection using 10 second-order features. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 12852–12862, 2025. 2, 3, 4, 6, 7, 17, 27

2025
[50]

combined FVMD

Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, and Shuyuan Zhu. Cpga: Cod- ing priors-guided aggregation network for compressed video quality enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2964–2974, 2024. 3 11 Supplementary Material This supplementary material is g...

work page arXiv 2024

[1] [1]

Croissant: A metadata format for ml- ready datasets

Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Luca Foschini, Pieter Gijsbers, Joan Giner-Miguelez, Su- jata Goswami, Nitisha Jain, Michalis Karamousadakis, Satyapriya Krishna, Michael Kuchnik, Sylvain Lesage, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Hamidah Oderinwale, Pierre Ruyssen, Tim Santos, Rajat Shinde, Elena...

[2] [2]

Ai-generated video detection via spatial-temporal anomaly learning

Jianfa Bai, Man Lin, Gang Cao, and Zijie Lou. Ai-generated video detection via spatial-temporal anomaly learning. In Pattern Recognition and Computer Vision, pages 460–470, Singapore, 2025. Springer Nature Singapore. 27

2025

[3] [3]

Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023. 1, 2

2023

[4] [4]

Deepfakes Are Coming for the Finan- cial Sector.The Wall Street Journal, 2024

Isabelle Bousquette. Deepfakes Are Coming for the Finan- cial Sector.The Wall Street Journal, 2024. 1

2024

[5] [5]

Deepfake-eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024

Nuria Alina Chandra, Hannah Lee, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Changyeon Lee, Jongwook Choi, Sejin Paik, Aerin Kim, and Oren Etzioni. Deepfake-eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Re...

2024

[6] [6]

How far are ai-generated videos from simulating the 3d visual world: A learned 3d evaluation approach, 2025

Chirui Chang, Jiahui Liu, Zhengzhe Liu, Xiaoyang Lyu, Yi- Hua Huang, Xin Tao, Pengfei Wan, Di Zhang, and Xiaojuan Qi. How far are ai-generated videos from simulating the 3d visual world: A learned 3d evaluation approach, 2025. 27

2025

[7] [7]

Videocrafter2: Overcoming data limitations for high-quality video diffu- sion models

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. Videocrafter2: Overcoming data limitations for high-quality video diffu- sion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7310–7320, 2024. 1, 2

2024

[8] [8]

Demamba: Ai- generated video detection on million-scale genvideo bench- mark.Science China Information Sciences, 69(6):162103,

Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, and Huaxiong Li. Demamba: Ai- generated video detection on million-scale genvideo bench- mark.Science China Information Sciences, 69(6):162103,

[9] [9]

Seeing what matters: Generalizable ai-generated video detection with forensic-oriented augmentation

Riccardo Corvi, Davide Cozzolino, Ekta Prashnani, Shalini De Mello, Koki Nagano, and Luisa Verdoliva. Seeing what matters: Generalizable ai-generated video detection with forensic-oriented augmentation. InAdvances in Neural In- formation Processing Systems, pages 26418–26446. Curran Associates, Inc., 2025. 2, 6, 7, 18, 26, 27

2025

[10] [10]

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

Xiang Fang, Daizong Liu, Pan Zhou, and Guoshun Nan. You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2448–2460, 2023. 3

2023

[11] [11]

Leveraging fre- quency analysis for deep fake image recognition

Joel Frank, Thorsten Eisenhofer, Lea Sch ¨onherr, Asja Fis- cher, Dorothea Kolossa, and Thorsten Holz. Leveraging fre- quency analysis for deep fake image recognition. InPro- ceedings of the 37th International Conference on Machine Learning, pages 3247–3258. PMLR, 2020. 4, 22

2020

[12] [12]

On the content bias in frechet video distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun- Yan Zhu, and Jia-Bin Huang. On the content bias in frechet video distance. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7277–7288, 2024. 3

2024

[13] [13]

Exposing ai-generated videos: A benchmark dataset and a local-and-global temporal defect based detec- tion method, 2024

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, and Hao- liang Li. Exposing ai-generated videos: A benchmark dataset and a local-and-global temporal defect based detec- tion method, 2024. 27

2024

[14] [14]

Gans trained by a two time-scale update rule converge to a local nash equilib- rium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. InAdvances in Neural Information Processing Sys- tems. Curran Associates, Inc., 2017. 3

2017

[15] [15]

Cogvideo: Large-scale pretraining for text-to-video generation via transformers, 2022

Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, and Jie Tang. Cogvideo: Large-scale pretraining for text-to-video generation via transformers, 2022. 1, 2

2022

[16] [16]

Vbench: Com- prehensive benchmark suite for video generative models

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. Vbench: Com- prehensive benchmark suite for video generative models. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Reco...

[17] [17]

Ai-generated video detection via perceptual straightening

Christian Intern `o, Robert Geirhos, Markus Olhofer, Sunny Liu, Barbara Hammer, and David Klindt. Ai-generated video detection via perceptual straightening. InAdvances in Neural Information Processing Systems, pages 20672–20705. Cur- ran Associates, Inc., 2025. 2, 3, 4, 5, 6, 7, 8, 18, 27, 29

2025

[18] [18]

Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023

Levon Khachatryan, Andranik Movsisyan, Vahram Tade- vosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023. 1, 2

2023

[19] [19]

Roy-Chowdhury

Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachan- dran, and Amit K. Roy-Chowdhury. Towards a universal synthetic video detector: From face or background manip- ulations to fully ai-generated content. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28050–28060, 2025. 27

2025

[20] [20]

Preserving forgery artifacts: Ai- generated video detection at native scale, 2026

Zhengcen Li, Chenyang Jiang, Hang Zhao, Shiyang Zhou, Yunyang Mo, Feng Gao, Fan Yang, Qiben Shan, Shaocong Wu, and Jingyong Su. Preserving forgery artifacts: Ai- generated video detection at native scale, 2026. 27

2026

[21] [21]

Fr ´echet video motion distance: A metric for evaluating motion consistency in videos, 2024

Jiahe Liu, Youran Qu, Qi Yan, Xiaohui Zeng, Lele Wang, and Renjie Liao. Fr ´echet video motion distance: A metric for evaluating motion consistency in videos, 2024. 2, 3, 6, 7, 26, 27 9

2024

[22] [22]

Detecting ai-generated video via frame consistency

Long Ma, Zhiyuan Yan, Qinglang Guo, Yong Liao, Haiyang Yu, and Pengyuan Zhou. Detecting ai-generated video via frame consistency. In2025 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2025. 3, 8, 27

2025

[23] [23]

Your one-stop solution for ai-generated video detection, 2026

Long Ma, Zihao Xue, Yan Wang, Zhiyuan Yan, Jin Xu, Xi- aorui Jiang, Haiyang Yu, Yong Liao, and Zhen Bi. Your one-stop solution for ai-generated video detection, 2026. 1, 2, 3, 8, 26, 27, 29

2026

[24] [24]

William J. McGill. Multivariate information transmission. Psychometrika, 19(2):97–116, 1954. 24

1954

[25] [25]

Video diffusion models: A survey, 2024

Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, and Helge Ritter. Video diffusion models: A survey, 2024. 1

2024

[26] [26]

Genvidbench: A challenging benchmark for detecting ai-generated video, 2025

Zhenliang Ni, Qiangyu Yan, Mouxiao Huang, Tianning Yuan, Yehui Tang, Hailin Hu, Xinghao Chen, and Yunhe Wang. Genvidbench: A challenging benchmark for detecting ai-generated video, 2025. 1, 5, 27

2025

[27] [27]

Vidguard-r1: Ai-generated video detection and explanation via reasoning mllms and rl, 2026

Kyoungjun Park, Yifan Yang, Juheon Yi, Shicheng Zheng, Yifei Shen, Dongqi Han, Caihua Shan, Muhammad Muaz, and Lili Qiu. Vidguard-r1: Ai-generated video detection and explanation via reasoning mllms and rl, 2026. 3, 8, 27

2026

[28] [28]

Pika 2.0: Integrating custom characters and scenes in ai video generation.https://venturebeat

Pika Labs. Pika 2.0: Integrating custom characters and scenes in ai video generation.https://venturebeat. com / ai / pika - 2 - 0 - launches - in - wake - of - sora - integrating - your - own - characters - objects-scenes-in-new-ai-videos/, 2024. Ac- cessed: 2025-06-07. 2

2024

[29] [29]

On the spectral bias of neural networks

Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. On the spectral bias of neural networks. InProceedings of the 36th International Conference on Ma- chine Learning, pages 5301–5310. PMLR, 2019. 4, 21

2019

[30] [30]

AI Deepfakes Are Getting Weirder and Harder to Spot in the Midterms.The Wall Street Journal,

Amrith Ramkumar. AI Deepfakes Are Getting Weirder and Harder to Spot in the Midterms.The Wall Street Journal,

[31] [31]

Richardson.The H.264 Advanced Video Compression Standard

Iain E. Richardson.The H.264 Advanced Video Compression Standard. Wiley Publishing, 2nd edition, 2010. 3, 4

2010

[32] [32]

Dmc-net: Generating discriminative motion cues for fast compressed video action recognition

Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla- Lara, Marcus Rohrbach, Shih-Fu Chang, and Zhicheng Yan. Dmc-net: Generating discriminative motion cues for fast compressed video action recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 3

2019

[33] [33]

Solana-Cipres, G

C. Solana-Cipres, G. Fernandez-Escribano, L. Rodriguez- Benitez, J. Moreno-Garcia, and L. Jimenez-Linares. Real- time moving object segmentation in h.264 compressed do- main based on approximate reasoning.International Journal of Approximate Reasoning, 51(1):99–114, 2009. 3

2009

[34] [34]

Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand

Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. Overview of the high efficiency video coding (hevc) standard.IEEE Transactions on Circuits and Systems for Video Technology, 22(12):1649–1668, 2012. 3, 4, 5

2012

[35] [35]

Videoveritas: Ai-generated video detection via perception pretext reinforcement learning, 2026

Hao Tan, Jun Lan, Senyuan Shi, Zichang Tan, Zijian Yu, Huijia Zhu, Weiqiang Wang, Jun Wan, and Zhen Lei. Videoveritas: Ai-generated video detection via perception pretext reinforcement learning, 2026. 27

2026

[36] [36]

Raft: Recurrent all-pairs field transforms for optical flow

Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InComputer Vision – ECCV 2020, pages 402–419, Cham, 2020. Springer International Publishing. 2, 6, 7

2020

[37] [37]

To- wards accurate generative models of video: A new metric & challenges, 2019

Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. To- wards accurate generative models of video: A new metric & challenges, 2019. 3

2019

[38] [38]

Cmta: Leveraging cross-modal tem- poral artifacts for generalizable ai-generated video detection,

Hang Wang, Chao Shen, Chenhao Lin, Minghui Yang, Lei Zhang, and Cong Wang. Cmta: Leveraging cross-modal tem- poral artifacts for generalizable ai-generated video detection,

[39] [39]

Atss: Detecting ai-generated videos via anomalous temporal self- similarity, 2026

Hang Wang, Chao Shen, Lei Zhang, and Zhi-Qi Cheng. Atss: Detecting ai-generated videos via anomalous temporal self- similarity, 2026. 3, 8, 27

2026

[40] [40]

Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. Cnn-generated images are surprisingly easy to spot... for now. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 4, 22

2020

[41] [41]

Vidprom: A million-scale real prompt-gallery dataset for text-to-video diffusion models

Wenhao Wang and Yi Yang. Vidprom: A million-scale real prompt-gallery dataset for text-to-video diffusion models. Thirty-eighth Conference on Neural Information Processing Systems, 2024. 3

2024

[42] [42]

Swap attention in spa- tiotemporal diffusions for text-to-video generation, 2024

Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, and Jiaying Liu. Swap attention in spa- tiotemporal diffusions for text-to-video generation, 2024. 3

2024

[43] [43]

Compression-aware video super- resolution

Yingwei Wang, Takashi Isobe, Xu Jia, Xin Tao, Huchuan Lu, and Yu-Wing Tai. Compression-aware video super- resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2012–2021, 2023. 3

2012

[44] [44]

Robustsora: De- watermarked benchmark for robust ai-generated video detec- tion, 2026

Zhuo Wang, Xiliang Liu, and Ligang Sun. Robustsora: De- watermarked benchmark for robust ai-generated video detec- tion, 2026. 2, 26, 27

2026

[45] [45]

P. Welch. The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms.IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, 1967. 5

1967

[46] [46]

Wiegand, G.J

T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the h.264/avc video coding standard.IEEE Transactions on Circuits and Systems for Video Technology, 13(7):560–576, 2003. 4, 23

2003

[47] [47]

Manmatha, Alexander J

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, and Philipp Kr ¨ahenb¨uhl. Compressed video action recognition. InProceedings of the IEEE Confer- ence on Computer Vision and Pattern Recognition (CVPR),

[48] [48]

Physics-driven spatiotemporal modeling for ai-generated video detection

Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guox- uan Pang, Feng Liu, Bo Han, Shutao Li, and Mingkui Tan. Physics-driven spatiotemporal modeling for ai-generated video detection. InAdvances in Neural Information Pro- cessing Systems, pages 174683–174730. Curran Associates, Inc., 2025. 26, 27

2025

[49] [49]

D3: Training-free ai-generated video detection using 10 second-order features

Chende Zheng, Ruiqi Suo, Chenhao Lin, Zhengyu Zhao, Le Yang, Shuai Liu, Minghui Yang, Cong Wang, and Chao Shen. D3: Training-free ai-generated video detection using 10 second-order features. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 12852–12862, 2025. 2, 3, 4, 6, 7, 17, 27

2025

[50] [50]

combined FVMD

Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, and Shuyuan Zhu. Cpga: Cod- ing priors-guided aggregation network for compressed video quality enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2964–2974, 2024. 3 11 Supplementary Material This supplementary material is g...

work page arXiv 2024