TALL: Thumbnail Layout for Deepfake Video Detection

Gengyun Jia; Jian Liang; Ran He; Yanhao Zhang; Yuting Xu; Ziming Yang

arxiv: 2307.07494 · v3 · pith:A3UXCTSJnew · submitted 2023-07-14 · 💻 cs.CV

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He

classification 💻 cs.CV

keywords: tall, layout, video, tall-swin, thumbnail, code, cross-dataset, deepfake

Abstract

The growing threat that deepfakes pose to society and cybersecurity has raised enormous public concern, and increasing effort has been devoted to deepfake video detection. Existing video methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout that preserves spatial and temporal dependencies. Specifically, consecutive frames are masked at a fixed position in each frame to improve generalization, then resized to sub-images and rearranged into a pre-defined layout as the thumbnail. TALL is model-agnostic and extremely simple, requiring changes to only a few lines of code. Inspired by the success of vision transformers, we incorporate TALL into Swin Transformer, forming an efficient and effective method, TALL-Swin. Extensive intra-dataset and cross-dataset experiments validate the effectiveness and superiority of TALL and the state-of-the-art TALL-Swin. TALL-Swin achieves 90.79$\%$ AUC on the challenging cross-dataset task, FaceForensics++ $\to$ Celeb-DF. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
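
The thumbnail construction described in the abstract (mask consecutive frames at one fixed position, resize them to sub-images, tile them into a layout) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name `make_thumbnail`, the 2x2 grid, the 112-pixel sub-image size, the 32-pixel square mask, and the nearest-neighbour resize are all assumptions chosen for the example and are not stated in the abstract.

```python
import numpy as np


def make_thumbnail(frames, grid=(2, 2), sub_size=112, mask_size=32, rng=None):
    """Tile a short clip into one image (hypothetical helper, not the official code).

    frames: array of shape (T, H, W, C) with T == grid[0] * grid[1].
    All numeric defaults are assumptions made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = grid
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError("number of frames must equal rows * cols")

    frames = frames.copy()

    # Pick ONE mask position and apply it to every frame in the clip.
    top = int(rng.integers(0, h - mask_size + 1))
    left = int(rng.integers(0, w - mask_size + 1))
    frames[:, top:top + mask_size, left:left + mask_size, :] = 0

    # Nearest-neighbour resize of each frame to sub_size x sub_size.
    ys = np.arange(sub_size) * h // sub_size
    xs = np.arange(sub_size) * w // sub_size
    small = frames[:, ys][:, :, xs]  # shape (T, sub_size, sub_size, C)

    # Arrange the sub-images row by row into the grid layout.
    small = small.reshape(rows, cols, sub_size, sub_size, c)
    thumb = small.transpose(0, 2, 1, 3, 4)
    return thumb.reshape(rows * sub_size, cols * sub_size, c)


# Usage: four constant-valued 224x224 frames become one 224x224 thumbnail.
clip = np.stack([np.full((224, 224, 3), i + 1, dtype=np.uint8) for i in range(4)])
thumbnail = make_thumbnail(clip, rng=np.random.default_rng(0))
```

Because the result is an ordinary image tensor, it could in principle be passed to any image backbone, which is consistent with the abstract's description of TALL as model-agnostic.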

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CAM-VFD: Cross-Attention Multimodal Video Forgery Detection
cs.CV 2026-05 unverdicted novelty 6.0

CAM-VFD detects video forgeries by using cross-attention to identify contradictions between CLIP appearance, VideoMAE motion, and MiDaS depth features.

DVAR: Adversarial Multi-Agent Debate for Video Authenticity Detection
cs.CV 2026-04 unverdicted novelty 6.0

DVAR turns video authenticity detection into an iterative debate between a generative hypothesis agent and a natural mechanism agent, resolved via minimum description length and a knowledge base for better generalizat...