DeFakeQ: Enabling Real-Time Deepfake Detection on Edge Devices via Adaptive Bidirectional Quantization
Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3
The pith
DeFakeQ uses adaptive bidirectional quantization to put real-time deepfake detection on edge devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeFakeQ is presented as the first quantization framework tailored for deepfake detectors, enabling real-time deployment on edge devices. The approach introduces a novel adaptive bidirectional compression strategy that simultaneously leverages feature correlations and eliminates redundancy, striking an effective balance between model compactness and detection performance. Extensive experiments across five benchmark datasets and eleven state-of-the-art backbone detectors show that DeFakeQ consistently surpasses existing quantization and model compression baselines. The authors further deploy DeFakeQ on mobile devices in real-world scenarios, demonstrating its capability for real-time deepfake detection.
What carries the argument
adaptive bidirectional compression strategy that leverages feature correlations and eliminates redundancy while preserving subtle forgery artifacts
If this is right
- Deepfake detectors become small enough and fast enough for on-device, real-time use in mobile apps for payments and social media.
- The method avoids the accuracy collapse typical of generic quantization when applied to deepfake models.
- The same framework works across eleven different backbone detectors and five benchmark datasets without retraining each one from scratch.
- Edge deployment removes the need to send media to the cloud for forensic checks, lowering latency and privacy risks.
- Real-world mobile tests confirm the quantized models run at usable speeds in actual user scenarios.
Where Pith is reading between the lines
- The same bidirectional correlation-aware compression could be tried on other fine-detail tasks such as medical anomaly detection or satellite image analysis.
- Widespread on-phone deepfake checks could shorten the window between fake media creation and public exposure during live events.
- Task-specific quantization may become standard for any detector where small visual cues carry the signal.
- Hardware co-design that matches the bidirectional pattern to particular mobile NPUs could push speeds even higher.
Load-bearing premise
The adaptive bidirectional quantization can keep the extremely subtle forgery artifacts needed for accurate deepfake detection intact even though those cues are known to be fragile under ordinary compression.
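A toy numerical sketch (not from the paper) of why this premise is plausible: under coarse uniform post-training quantization, a low-amplitude cue riding on top of strong activations is rounded away, while at a higher bit-width it largely survives. The signal model, amplitudes, and bit-widths below are illustrative assumptions, not measurements from DeFakeQ.

```python
import numpy as np

def quantize_dequantize(x, bits):
    """Uniform symmetric post-training quantization, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
coarse = rng.normal(0.0, 1.0, 10_000)      # strong, coarse-grained activations
artifact = rng.normal(0.0, 0.005, 10_000)  # subtle, low-amplitude forgery-like cue
signal = coarse + artifact

def cue_survival(bits):
    # Correlation between the subtle cue and what remains of it after quantization.
    residual = quantize_dequantize(signal, bits) - coarse
    return np.corrcoef(artifact, residual)[0, 1]

corr_low, corr_high = cue_survival(4), cue_survival(12)
print(f"cue survival at 4 bits: {corr_low:.3f}, at 12 bits: {corr_high:.3f}")
```

At 4 bits the quantization step dwarfs the cue's amplitude, so the cue is essentially erased; at 12 bits the step is finer than the cue and it survives nearly intact. This is the fragility the paper's premise rests on.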
What would settle it
Measure whether a DeFakeQ-quantized detector on the FaceForensics++ dataset stays within a few percentage points of its full-precision accuracy while sustaining at least 30 frames per second of inference on a standard smartphone; falling short on either front would undercut the claim.
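The decision rule implied by that test can be written out explicitly. The thresholds (a 3% relative accuracy drop, 30 fps) and the helper name `settles_claim` are illustrative assumptions, not values from the paper:

```python
def settles_claim(fp_accuracy, quant_accuracy, fps,
                  max_drop_pct=3.0, min_fps=30.0):
    """Hypothetical pass/fail check: the claim survives only if the
    quantized model keeps accuracy within a few percent of full precision
    AND sustains real-time frame rates on-device."""
    drop = (fp_accuracy - quant_accuracy) * 100.0 / fp_accuracy
    return drop <= max_drop_pct and fps >= min_fps

# Illustrative numbers only (not reported results):
ok = settles_claim(0.95, 0.93, 34.0)       # small drop, real-time
collapsed = settles_claim(0.95, 0.85, 34.0)  # accuracy collapse
too_slow = settles_claim(0.95, 0.94, 20.0)   # accurate but not real-time
print(ok, collapsed, too_slow)
```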
Original abstract
Deepfake detection has become a fundamental component of modern media forensics. Despite significant progress in detection accuracy, most existing methods remain computationally intensive and parameter-heavy, limiting their deployment on resource-constrained edge devices that require real-time, on-site inference. This limitation is particularly critical in an era where mobile devices are extensively used for media-centric applications, including online payments, virtual meetings, and social networking. Meanwhile, due to the unique requirement of capturing extremely subtle forgery artifacts for deepfake detection, state-of-the-art quantization techniques usually underperform for such a challenging task. These fine-grained cues are highly sensitive to model compression and can be easily degraded during quantization, leading to noticeable performance drops. This challenge highlights the need for quantization strategies specifically designed to preserve the discriminative features essential for reliable deepfake detection. To address this gap, we propose DeFakeQ, the first quantization framework tailored for deepfake detectors, enabling real-time deployment on edge devices. Our approach introduces a novel adaptive bidirectional compression strategy that simultaneously leverages feature correlations and eliminates redundancy, achieving an effective balance between model compactness and detection performance. Extensive experiments across five benchmark datasets and eleven state-of-the-art backbone detectors demonstrate that DeFakeQ consistently surpasses existing quantization and model compression baselines. Furthermore, we deploy DeFakeQ on mobile devices in real-world scenarios, demonstrating its capability for real-time deepfake detection and its practical applicability in edge environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DeFakeQ, the first quantization framework tailored for deepfake detectors. It introduces an adaptive bidirectional compression strategy that leverages feature correlations while eliminating redundancy to preserve subtle forgery artifacts, enabling real-time inference on edge devices. The authors claim consistent outperformance over existing quantization and compression baselines across five benchmark datasets and eleven state-of-the-art backbone detectors, with successful real-world deployment on mobile devices demonstrating practical applicability.
Significance. If the empirical claims hold, this work would be significant for practical media forensics by addressing the deployment barrier of computationally heavy deepfake detectors on resource-constrained devices. The focus on preserving fine-grained, quantization-sensitive cues through a specialized bidirectional approach could advance edge-based detection in applications like online payments and social media, provided the method demonstrably avoids the performance drops typical of standard compression techniques.
major comments (2)
- [Abstract] The central claim that DeFakeQ 'consistently surpasses existing quantization and model compression baselines' and enables 'real-time deepfake detection' on mobile devices is asserted without any quantitative support: no accuracy values, compression ratios, inference latency figures, or error analysis. This absence leaves the outperformance and deployment success without visible empirical support, directly undermining evaluation of the balance between compactness and detection performance.
- [Abstract/Methods, implied] The novel 'adaptive bidirectional compression strategy' is described as simultaneously leveraging feature correlations and eliminating redundancy to protect 'extremely subtle forgery artifacts' that are 'highly sensitive to model compression.' However, no details are provided on the mechanism (e.g., how bidirectionality is implemented, what correlations are used, or ablation on artifact preservation), making it impossible to assess whether it addresses the noted sensitivity without degradation.
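For concreteness, one hedged guess at what a correlation-aware scheme could look like: allocate fewer bits to channels that are nearly duplicates of other channels and more bits to weakly correlated, information-carrying ones. This is purely an illustration of the mechanism the comment above asks about, not the paper's actual method; every name and threshold below is invented.

```python
import numpy as np

def correlation_aware_bits(weights, base_bits=4, bonus_bits=4):
    """Toy per-channel bit allocation: channels highly correlated with some
    other channel are treated as redundant and get the base bit-width; weakly
    correlated channels get up to `bonus_bits` extra. Hypothetical sketch only."""
    flat = weights.reshape(weights.shape[0], -1)
    corr = np.abs(np.corrcoef(flat))
    np.fill_diagonal(corr, 0.0)
    redundancy = corr.max(axis=1)  # max similarity to any other channel
    return base_bits + np.round(bonus_bits * (1.0 - redundancy)).astype(int)

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))
w[1] = w[0] + 0.01 * rng.normal(size=16)  # channel 1 nearly duplicates channel 0
bits = correlation_aware_bits(w)
print(bits)
```

Under this sketch the near-duplicate channel receives the minimum bit-width while independent channels receive more, which is one plausible reading of "leveraging feature correlations while eliminating redundancy."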
minor comments (1)
- [Abstract] The abstract would benefit from a brief mention of the specific datasets and backbones used, as well as at least one key quantitative result (e.g., accuracy retention at a given compression level) to strengthen the summary of contributions.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We agree that the abstract can be strengthened with additional quantitative support and a clearer high-level description of the method. We have revised the abstract accordingly while preserving its brevity, with full technical details and results remaining in the body of the paper.
Point-by-point responses
Referee: [Abstract] The central claim that DeFakeQ 'consistently surpasses existing quantization and model compression baselines' and enables 'real-time deepfake detection' on mobile devices is asserted without any quantitative support: no accuracy values, compression ratios, inference latency figures, or error analysis. This absence leaves the outperformance and deployment success without visible empirical support, directly undermining evaluation of the balance between compactness and detection performance.
Authors: We agree that including key quantitative metrics would make the abstract more informative. In the revised version, we have incorporated representative results from our experiments across the five datasets and eleven backbones, such as average accuracy gains over baselines, typical compression ratios achieved, and measured inference latencies on mobile devices that confirm real-time performance. A short reference to the error analysis is also included. These figures are directly drawn from the detailed tables and figures in Sections 4 and 5. revision: yes
Referee: [Abstract/Methods, implied] The novel 'adaptive bidirectional compression strategy' is described as simultaneously leveraging feature correlations and eliminating redundancy to protect 'extremely subtle forgery artifacts' that are 'highly sensitive to model compression.' However, no details are provided on the mechanism (e.g., how bidirectionality is implemented, what correlations are used, or ablation on artifact preservation), making it impossible to assess whether it addresses the noted sensitivity without degradation.
Authors: The full mechanism, including the bidirectional quantization procedure, the specific correlation metrics employed, and the ablation studies on forgery artifact preservation, is presented in Section 3. To address the abstract-level concern, we have added a concise clause describing the core idea of the adaptive bidirectional approach and its focus on correlation-aware retention of subtle features. This provides readers with an immediate sense of the method while directing them to the detailed exposition and ablations in the methodology section. revision: yes
Circularity Check
No significant circularity
full rationale
The paper proposes an empirical engineering framework (DeFakeQ) for quantizing deepfake detectors via adaptive bidirectional compression, validated through experiments on five datasets and eleven backbones plus real edge deployment. No circular derivation chain, no fitted parameters relabeled as predictions, and no load-bearing self-citations appear in the provided text. The central claims rest on experimental results rather than on any mathematical reduction to their own inputs, so the work is self-contained in this respect.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Fine-grained forgery artifacts in deepfake detection are highly sensitive to model compression and can be easily degraded during quantization.