pith. machine review for the scientific record.

arxiv: 2605.14091 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: 2 theorem links

Venus-DeFakerOne: Unified Fake Image Detection & Localization


Pith reviewed 2026-05-15 05:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords fake image detection · forgery localization · unified foundation model · deepfake detection · AIGC detection · cross-domain artifacts · InternVL2 · SAM2

The pith

DeFakerOne integrates InternVL2 and SAM2 into one model that detects and localizes image forgeries across many generation types at once.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern generative tools now produce forgeries that blend document edits, natural-image changes, deepfakes, and full AI synthesis, yet most detectors remain locked to one forgery family. DeFakerOne tackles the resulting mismatch by building a single foundation model that performs both whole-image detection and pixel-level localization. It does so by joining InternVL2 for semantic feature extraction with SAM2 for precise mask prediction, then training the pair under fine-grained supervision across many domains. Experiments show the combined system beats prior methods on 39 detection benchmarks and 9 localization benchmarks while holding up under real-world noise and newer generators. The work also maps how data volume, artifact overlap, and supervision granularity affect unified performance.

Core claim

DeFakerOne, formed by integrating InternVL2 and SAM2 with fine-grained supervision, supplies a unified foundation model that jointly detects image-level forgeries and localizes them at the pixel level, surpassing prior specialized methods on 39 detection benchmarks and 9 localization benchmarks while remaining robust to perturbations and advanced generators such as GPT-Image-2.

What carries the argument

DeFakerOne, the integration of InternVL2 for high-level vision-language features and SAM2 for segmentation masks, trained with fine-grained labels to model cross-domain artifact transfer and interference patterns.
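The integration pattern described here can be made concrete with a minimal sketch: a semantic encoder produces global features that condition a per-pixel mask decoder, and an image-level detection score pools both signals. The function names, feature dimensions, and fusion rule below are illustrative stand-ins, not the paper's actual InternVL2/SAM2 wiring.

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_encoder(image):
    # Stand-in for InternVL2: a global vision-language feature vector
    # (hypothetical dimension 32, random projection for illustration).
    return np.tanh(image.mean(axis=(0, 1)) @ rng.standard_normal((3, 32)))

def mask_decoder(image, sem_feat):
    # Stand-in for SAM2: per-pixel forgery logits modulated by the
    # semantic features, giving the joint detection-localization coupling.
    pixel_feat = image @ rng.standard_normal((3, 32))  # (h, w, 32)
    return pixel_feat @ sem_feat                       # (h, w) logits

def defaker_forward(image):
    sem = semantic_encoder(image)
    mask_logits = mask_decoder(image, sem)
    # Image-level score pools localization evidence with the semantic score.
    det_logit = 0.5 * mask_logits.max() + 0.5 * sem.sum()
    soft_mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return det_logit, soft_mask

img = rng.random((16, 16, 3))
score, mask = defaker_forward(img)
```

The point of the sketch is the data flow, not the weights: the mask branch sees the semantic features, so pixel-level and image-level predictions share one representation rather than running as two separate detectors.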

If this is right

  • One model can replace the current set of domain-specific detectors for document, deepfake, and AIGC forgeries.
  • Scaling training data while preserving original-resolution artifacts improves both detection accuracy and localization precision.
  • Fine-grained supervision is required to disentangle interfering artifacts from different forgery sources.
  • The same architecture shows robustness against perturbations and against generators not seen during training.
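The localization-precision claim in the bullets above is usually scored with pixel-level F1 between a predicted soft mask and a binary ground-truth mask; the paper's exact evaluation protocol is not given in the abstract, so the threshold and metric choice here are standard assumptions.

```python
import numpy as np

def localization_f1(pred_mask, gt_mask, thresh=0.5):
    """Pixel-level F1 between a soft predicted mask and a binary ground
    truth. A conventional localization metric; the paper may differ."""
    pred = np.asarray(pred_mask) >= thresh
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gt = np.zeros((8, 8)); gt[2:6, 2:6] = 1        # forged 4x4 region
pred = np.zeros((8, 8)); pred[2:6, 2:5] = 0.9  # detector finds 4x3 of it
```

Here the detector covers 12 of the 16 forged pixels with no false positives, so precision is 1.0, recall 0.75, and F1 6/7.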

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same integration pattern could be tested on video sequences by adding temporal consistency constraints to the segmentation branch.
  • The observed artifact-transfer patterns suggest a way to curate synthetic training sets that deliberately mix forgery types to improve generalization.
  • Content platforms could use the localization output to route suspicious regions to human reviewers rather than discarding entire images.

Load-bearing premise

That combining InternVL2 and SAM2 under fine-grained supervision is enough to learn forgery patterns that transfer across domains without needing separate models for each forgery type.

What would settle it

A new generator producing realistic forgeries whose artifact patterns lie outside the current training distribution, paired with held-out tests showing the model consistently fails to flag or localize those images.
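Such a falsification test reduces to a simple decision rule: does the detector flag a held-out generator's images at a high enough rate? The threshold and minimum flag rate below are illustrative choices, and the score lists are hypothetical, not from the paper.

```python
def flags_generator(scores, thresh=0.5, min_rate=0.8):
    """Return True if the detector 'consistently' flags a generator:
    the fraction of per-image fake scores at or above `thresh` must
    reach `min_rate`. Both parameters are illustrative assumptions."""
    hits = sum(1 for s in scores if s >= thresh)
    return hits / len(scores) >= min_rate

# Hypothetical per-image fake scores on two held-out generators.
seen_like = [0.97, 0.91, 0.88, 0.95, 0.90]   # artifacts near training set
novel = [0.62, 0.41, 0.55, 0.48, 0.71]       # out-of-distribution artifacts
```

Under this rule the first generator is consistently flagged and the second is not; a generator that drives the flag rate below the bar while producing realistic forgeries would be the settling evidence the section describes.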

Original abstract

In recent years, the rapid evolution of generative AI has fundamentally reshaped the paradigm of image forgery, breaking the traditional boundaries between document editing, natural image manipulation, DeepFake generation, and full-image AIGC synthesis. Despite this shift toward unified forgery generation, existing research in Fake Image Detection and Localization (FIDL) remains fragmented. This creates a mismatch between increasingly unified forgery generation mechanisms and the domain-specific detection paradigm. Bridging this mismatch poses two key challenges for FIDL: understanding cross-domain artifacts transfer and interference, and building a high-capacity unified foundation model for joint detection and localization. To address these challenges, we propose DeFakerOne, a data-centric, unified FIDL foundation model integrating InternVL2 and SAM2. DeFakerOne enables simultaneous image-level detection and pixel-level forgery localization across diverse scenarios. Extensive experiments demonstrate that DeFakerOne achieves state-of-the-art performance, outperforming baselines on 39 forgery detection benchmarks and 9 localization benchmarks. Furthermore, the model exhibits superior robustness against real-world perturbations and state-of-the-art generators such as GPT-Image-2. Finally, we provide a systematic analysis of data scaling laws, cross-domain artifacts transfer-interference patterns, the necessity of fine-grained supervision, and the original resolution artifacts preservation, highlighting the design principles for scalable, robust, and unified FIDL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents DeFakerOne, a data-centric unified foundation model for Fake Image Detection and Localization (FIDL) that integrates InternVL2 and SAM2 with fine-grained supervision. It claims to simultaneously perform image-level detection and pixel-level localization across diverse forgery types, achieving SOTA results by outperforming baselines on 39 detection benchmarks and 9 localization benchmarks, while also demonstrating robustness to perturbations and new generators such as GPT-Image-2, and providing analyses of data scaling laws, cross-domain artifact transfer-interference, and the role of fine-grained supervision.

Significance. If the performance claims and analyses are substantiated with rigorous evidence, the work would be significant for computer vision by addressing the mismatch between unified generative forgery mechanisms and prior domain-specific FIDL methods, potentially providing a scalable foundation model that captures cross-domain artifacts and sets new standards for joint detection-localization tasks.

major comments (2)
  1. Abstract: The assertion that DeFakerOne 'achieves state-of-the-art performance, outperforming baselines on 39 forgery detection benchmarks and 9 localization benchmarks' supplies no quantitative metrics, error bars, dataset details, or references to specific tables/figures, leaving the central empirical claims unsupported by visible evidence.
  2. Abstract / Experiments (implied): No ablation studies are described that isolate the contribution of the proposed unification, joint architecture, or fine-grained supervision from the scale and pre-training of the frozen InternVL2+SAM2 base models; without this, gains cannot be attributed to the data-centric design rather than parameter count.
minor comments (1)
  1. Title vs. Abstract: The title uses 'Venus-DeFakerOne' while the body refers only to 'DeFakerOne'; standardize the model name for consistency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our empirical claims. We have revised the manuscript to address both major comments by strengthening the abstract and adding explicit ablation studies.

read point-by-point responses
  1. Referee: Abstract: The assertion that DeFakerOne 'achieves state-of-the-art performance, outperforming baselines on 39 forgery detection benchmarks and 9 localization benchmarks' supplies no quantitative metrics, error bars, dataset details, or references to specific tables/figures, leaving the central empirical claims unsupported by visible evidence.

    Authors: We agree that the abstract would benefit from explicit pointers to the supporting evidence. In the revised version, we have updated the abstract to reference the primary results tables (Tables 1-2 for the 39 detection benchmarks and Tables 3-4 for the 9 localization benchmarks), where the quantitative metrics, dataset details, and baseline comparisons are reported. Error bars from repeated runs on key benchmarks are included in the supplementary material. This ensures the central claims are directly supported by visible evidence in the manuscript. revision: yes

  2. Referee: Abstract / Experiments (implied): No ablation studies are described that isolate the contribution of the proposed unification, joint architecture, or fine-grained supervision from the scale and pre-training of the frozen InternVL2+SAM2 base models; without this, gains cannot be attributed to the data-centric design rather than parameter count.

    Authors: We acknowledge this valid point regarding attribution. Although the base models are frozen, we have added a dedicated ablation study in the revised manuscript (new Section 4.3 and Table 5) that compares the full DeFakerOne model against variants using only the frozen InternVL2+SAM2 backbones without our unified training data or fine-grained supervision. These results demonstrate that the performance improvements stem from the data-centric unification and supervision strategy rather than base model scale alone. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical SOTA claims rest on external benchmarks

full rationale

The paper proposes DeFakerOne by integrating existing models (InternVL2 and SAM2) with fine-grained supervision and reports performance on 39 detection and 9 localization benchmarks. No equations, derivations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear. Central claims are validated against independent external benchmarks rather than reducing to inputs by construction. This is a standard empirical ML paper with self-contained experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the approach relies on standard integration of two pre-trained foundation models plus fine-grained supervision whose details are not provided.

pith-pipeline@v0.9.0 · 5529 in / 1006 out tokens · 38291 ms · 2026-05-15T05:25:41.568710+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

240 extracted references · 240 canonical work pages · 12 internal anchors
