pith. machine review for the scientific record.

arxiv: 2605.09089 · v1 · submitted 2026-05-09 · 💻 cs.CV · cs.AI

Recognition: 1 theorem link · Lean Theorem

Field-Localized Forgery Detection for Digital Identity Documents

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:25 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords forgery detection · identity documents · field localization · lightweight neural network · digital forensics · face and text fields · manipulation detection · AUC/EER

The pith

Targeting face and text fields in identity documents detects forgeries more accurately and with far less computation than full-image methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FLiD, a framework that first localizes face and text fields in digital identity documents with an object detector, then extracts embeddings from those regions only and classifies them for forgery. This targeted processing yields AUC scores of 0.880 on face attacks, 0.954 on text attacks, and 0.923 on combined attacks, with corresponding EER reductions of 29-35 percentage points over full-document baselines. The approach also beats general manipulation detectors while using 13 times fewer parameters and 21 times fewer FLOPs. This matters because remote identity verification depends on these documents and is vulnerable to localized edits that standard full-image methods miss.

Core claim

FLiD localizes face and text fields using a fine-tuned object detector, extracts compact field-level embeddings with a frozen MobileNetV3-Small backbone, and classifies them via a lightweight neural network containing only 191K trainable parameters. On face, text, and both-field attack scenarios it records AUCs of 0.880, 0.954, and 0.923 with EERs of 18.05%, 11.61%, and 15.16%. These EERs represent absolute reductions of 29-35 percentage points over a full-document model trained from scratch, and FLiD outperforms general-purpose detectors such as TruFor, MMFusion, and UniVAD while requiring 13x fewer parameters and 21x fewer FLOPs.
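The EERs quoted above are operating points where the false accept rate on forged fields equals the false reject rate on genuine ones. As a minimal, library-free sketch of how such a number comes out of raw classifier scores (the paper's exact thresholding protocol is not specified in this review):

```python
def compute_eer(genuine, attack):
    """Equal Error Rate: sweep thresholds and return the point where the
    false-accept rate (forged fields scored as genuine) and false-reject
    rate (genuine fields scored as forged) are closest; higher score
    means 'more likely genuine'."""
    best_gap, eer = 2.0, 1.0
    for t in sorted(set(genuine) | set(attack)):
        far = sum(s >= t for s in attack) / len(attack)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# perfectly separated scores give an EER of 0
print(compute_eer([0.95, 0.90, 0.80], [0.20, 0.10, 0.05]))  # 0.0
```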

What carries the argument

The two-stage field-localized pipeline that uses object detection to isolate face and text regions, then applies a frozen backbone and small classifier exclusively to those regions.
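That two-stage structure can be sketched as follows. Every function below is a stand-in stub for one of the paper's components (the fine-tuned YOLOv8 detector, the frozen MobileNetV3-Small backbone, the 191K-parameter head), so the shapes, boxes, and return values are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str    # "face" or "text"
    box: tuple   # (x1, y1, x2, y2) in pixels

def detect_fields(image):
    # stub: a real detector returns one box per identity field
    return [Field("face", (10, 10, 60, 60)), Field("text", (10, 70, 200, 90))]

def embed_crop(image, box):
    # stub: a frozen backbone maps the cropped field to a fixed-length vector
    return [0.0] * 576

def classify(embedding):
    # stub: the lightweight head emits a forgery score for one field
    return 0.5

def flid_score(image):
    # field-localized scoring: only the detected fields are processed,
    # never the full document image
    return {f.name: classify(embed_crop(image, f.box))
            for f in detect_fields(image)}

print(flid_score(image=None))  # {'face': 0.5, 'text': 0.5}
```

The point of the structure is that the classifier never sees pixels outside the detected boxes, which is where both the accuracy and the FLOP savings come from.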

If this is right

  • Forgery detection accuracy improves specifically for manipulations confined to the identity photo or textual data.
  • Computational cost drops enough to support real-time checks on edge hardware.
  • The same localized strategy outperforms off-the-shelf general forgery detectors on structured documents.
  • Only the critical identity fields need processing rather than the entire document image.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the localization step remains reliable across new document layouts, the method could extend to passports, driving licences, and other structured identity media.
  • The low parameter count opens the possibility of on-device deployment without sending images to the cloud.
  • Further gains may appear by adding localization for additional fields such as signatures or barcodes.
  • The separation of localization from classification makes it easier to swap in improved detectors without retraining the entire forgery classifier.

Load-bearing premise

The object detector accurately localizes face and text fields even after those fields have been forged or altered.

What would settle it

Performance measured on a held-out set of identity documents from an unseen issuer or containing forgery types absent from training, especially cases where field localization accuracy falls below the level seen in the reported experiments.
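Any such held-out evaluation would report the same ranking metric used throughout: ROC AUC, which equals the probability that a randomly chosen genuine sample scores above a randomly chosen forged one. A minimal rank-sum computation (not tied to the paper's tooling):

```python
def auc(genuine, attack):
    """ROC AUC via the rank identity: the probability that a random
    genuine score outranks a random attack score, ties counting half."""
    wins = sum(
        1.0 if g > a else 0.5 if g == a else 0.0
        for g in genuine for a in attack
    )
    return wins / (len(genuine) * len(attack))

print(auc([0.9, 0.8], [0.2, 0.1]))            # 1.0 for perfect separation
print(auc([0.9, 0.8, 0.7], [0.8, 0.2, 0.1]))  # 7.5/9 with one overlap
```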

Figures

Figures reproduced from arXiv: 2605.09089 by Abhishek Kumar, Carsten Maple, Mark Hooper, Riya Tapwal.

Figure 1. Examples of localized forgery in digital identity documents across face, text, and combined manipulation scenarios.
Figure 2. Overview of the proposed identity-aware forgery detection system. The pipeline first localizes identity fields using YOLOv8, extracts field-level embeddings using MobileNetV3, and performs forgery classification using a lightweight classifier.
Figure 3. Equal Error Rate (EER) analysis for FLiD and the baseline across face, text, and both-field attack scenarios.
Figure 4. Comparison of FLiD and the baseline across face, text, and both-field attack scenarios. (a) ROC curves illustrate ranking performance. (b) Score distributions show class separability and overlap.
Original abstract

Digital identity verification systems used in remote onboarding rely on document images to authenticate users, making them vulnerable to localized manipulations of key identity fields such as facial photographs and textual information. Existing forgery detection methods, developed primarily for natural-image forensics, show limited transferability to structured identity documents. We propose FLiD, a lightweight field-localized framework that targets critical identity regions rather than processing full-document images. A fine-tuned object detector first localizes face and text fields; a frozen MobileNetV3-Small backbone then extracts compact field-level embeddings, which are classified by a lightweight neural network with only 191K trainable parameters. FLiD achieves AUC scores of 0.880 (face), 0.954 (text), and 0.923 (both-field attacks), with corresponding EERs of 18.05%, 11.61%, and 15.16%, representing absolute reductions of 29-35 percentage points over a full-document baseline trained from scratch. FLiD also consistently outperforms general-purpose manipulation detectors (TruFor, MMFusion, UniVAD) across all attack scenarios while requiring 13x fewer parameters and 21x fewer FLOPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes FLiD, a lightweight field-localized forgery detection framework for digital identity documents. A fine-tuned object detector localizes face and text fields; embeddings are then extracted from these regions using a frozen MobileNetV3-Small backbone and classified by a small neural network (191K trainable parameters). The method reports AUC scores of 0.880 (face attacks), 0.954 (text attacks), and 0.923 (both-field attacks) with EERs of 18.05%, 11.61%, and 15.16%, claiming absolute gains of 29-35 points over a full-document baseline and consistent outperformance of general-purpose detectors (TruFor, MMFusion, UniVAD) at 13x fewer parameters and 21x fewer FLOPs.

Significance. If the localization step remains reliable on forged inputs, FLiD demonstrates a practical advance for structured-document forensics by focusing computation on identity-critical regions rather than full images. The efficiency numbers and direct comparisons to both task-specific baselines and general manipulation detectors are concrete strengths; the approach could inform deployment in remote identity verification systems where compute and data constraints matter.

major comments (1)
  1. [Experiments] Experiments section: the paper provides no mAP, IoU, precision-recall, or failure-mode statistics for the fine-tuned object detector on forged or manipulated documents. Because the pipeline extracts MobileNetV3 embeddings only from the detector's crops, any degradation in localization accuracy caused by splicing boundaries, texture mismatches, or altered aspect ratios would invalidate the reported AUC/EER gains over the full-document baseline. An ablation isolating localization error from classification error is also absent.
minor comments (2)
  1. [Abstract] Abstract and §3: the EER and AUC figures are presented without confidence intervals or statistical significance tests against the baselines; adding these would strengthen the performance claims.
  2. [Method] The description of the lightweight classifier (191K parameters) would benefit from an explicit layer-by-layer parameter count or diagram to allow direct reproduction.
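The intervals asked for in minor comment 1 are most commonly obtained with a percentile bootstrap over the evaluation scores. A minimal sketch, where the statistic and the per-fold values are placeholders rather than the paper's numbers or protocol:

```python
import random

def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    boots = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(n_boot)
    )
    return boots[int(alpha / 2 * n_boot)], boots[int((1 - alpha / 2) * n_boot) - 1]

# placeholder per-fold AUCs, NOT the paper's results
fold_aucs = [0.88, 0.91, 0.87, 0.93, 0.90]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(fold_aucs, mean)
print(f"mean {mean(fold_aucs):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```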

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need to validate the localization step, which is foundational to the FLiD pipeline. We address the concern point by point below and will revise the manuscript to incorporate the suggested analyses.

Point-by-point responses
  1. Referee: Experiments section: the paper provides no mAP, IoU, precision-recall, or failure-mode statistics for the fine-tuned object detector on forged or manipulated documents. Because the pipeline extracts MobileNetV3 embeddings only from the detector's crops, any degradation in localization accuracy caused by splicing boundaries, texture mismatches, or altered aspect ratios would invalidate the reported AUC/EER gains over the full-document baseline. An ablation isolating localization error from classification error is also absent.

    Authors: We agree that the absence of these metrics leaves an important aspect of the pipeline unverified. In the revised manuscript we will report mAP, IoU, precision-recall curves, and qualitative failure-mode analysis for the fine-tuned object detector evaluated separately on pristine and forged document images. We will also add an ablation that replaces the detector's predicted crops with ground-truth bounding boxes and recomputes the downstream AUC/EER; the difference between the two settings will directly quantify the contribution of localization error. These additions will confirm that localization remains sufficiently accurate on manipulated inputs and that the reported gains over the full-document baseline are not artifacts of detector failure. revision: yes
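The localization metrics promised in this response (mAP, IoU on pristine versus forged inputs) all build on box overlap. A minimal intersection-over-union computation for the (x1, y1, x2, y2) box convention, independent of any detector library:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (identical boxes)
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 1/3 (half-width overlap)
```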

Circularity Check

0 steps flagged

No circularity: empirical framework with external baselines

Full rationale

The paper describes an empirical pipeline (fine-tuned detector + frozen MobileNetV3 + lightweight classifier) and reports AUC/EER metrics obtained from direct evaluation on attack scenarios. These metrics are compared against independently published external detectors (TruFor, MMFusion, UniVAD) and a full-document baseline trained from scratch. No equations, predictions, or uniqueness claims appear; performance numbers are not derived from fitted parameters that are then re-labeled as predictions, nor do they reduce to self-citations or self-definitions. The central claims rest on experimental outcomes rather than any tautological reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach builds on standard computer vision techniques with empirical performance claims; no new physical entities or unproven mathematical axioms are introduced.

free parameters (1)
  • Number of trainable parameters (191K) = 191000
    The lightweight classifier has 191K trainable parameters, chosen to keep the model small.
axioms (1)
  • domain assumption Pre-trained backbones like MobileNetV3-Small can provide useful embeddings for downstream classification tasks when frozen.
    Assumes transfer learning works well for this domain.
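For orientation on the ledger's single free parameter: one hypothetical head shape that lands at roughly the reported 191K, assuming the 576-dimensional MobileNetV3-Small embedding. The layer widths below are invented for illustration; the review does not spell out the actual architecture.

```python
def linear_params(n_in, n_out):
    # weight matrix plus bias vector of one fully connected layer
    return n_in * n_out + n_out

# hypothetical head: 576-d embedding -> 330 hidden units -> 2 classes
layers = [(576, 330), (330, 2)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)  # 191072, i.e. roughly the reported 191K
```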

pith-pipeline@v0.9.0 · 5512 in / 1429 out tokens · 41542 ms · 2026-05-12T02:25:50.961514+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. Albiero, V., Srinivas, N., Villalobos, E., Perez-Facuse, J., Rosenthal, R., Mery, D., Ricanek, K., Bowyer, K.W.: Identity document to selfie face matching across adolescence. In: 2020 IEEE International Joint Conference on Biometrics (IJCB), pp. 1-9. IEEE (2020)
  2. Bae, Y.Y., Cho, D.J., Jung, K.H.: Enhancing document forgery detection with edge-focused deep learning. Symmetry 17(8) (2025). https://doi.org/10.3390/sym17081208
  3. ISO/IEC: ISO/IEC 30107: Information technology - biometric presentation attack detection (2016)
  4. ISO/IEC: Information technology - biometric presentation attack detection - part 3: Testing and reporting. Standard, International Organization for Standardization, Geneva, Switzerland (2017)
  5. Chen, X., Dong, C., Ji, J., Cao, J., Li, X.: Image manipulation detection by multi-view multi-scale supervision (2021). https://doi.org/10.1109/ICCV48922.2021.01392
  6. Dai, K., Alonso, J., Gutiérrez-Meana, J.: A machine learning framework for forgery detection in digital ID documents. Journal of Innovation Management 13(2), XVI-XXV (Sep 2025). https://doi.org/10.24840/2183-0606_013.002_l003
  7. Fathy, M.E., Patel, V.M., Chellappa, R.: Face-based active authentication on mobile devices. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1687-1691. IEEE (2015)
  8. Gonzalez, S., Tapia, J.E.: Forged presentation attack detection for ID cards on remote verification systems. Pattern Recognition 162(C) (Jun 2025). https://doi.org/10.1016/j.patcog.2025.111352
  9. Gu, Z., Zhu, B., Zhu, G., Chen, Y., Tang, M., Wang, J.: UniVAD: A training-free unified model for few-shot visual anomaly detection. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15194-15203 (2025). https://doi.org/10.1109/CVPR52734.2025.01415
  10. Guillaro, F., Cozzolino, D., Sud, A., Dufour, N., Verdoliva, L.: TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20606-20615 (2023). https://doi.org/10.1109/CVPR52729.2023.01974
  11. Guo, X., Liu, X., Ren, Z., Grosz, S., Masi, I., Liu, X.: Hierarchical fine-grained image forgery detection and localization. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3155-3165 (2023). https://doi.org/10.1109/CVPR52729.2023.00308
  12. Korshunov, P., Mohammadi, A., Vidit, V., Ecabert, C., Marcel, S.: FantasyID: A dataset for detecting digital manipulations of ID-documents. arXiv preprint arXiv:2507.20808 (2025)
  13. Kwon, M.J., Yu, I.J., Nam, S.H., Lee, H.K.: CAT-Net: Compression artifact tracing network for detection and localization of image splicing. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 375-384 (2021). https://doi.org/10.1109/WACV48630.2021.00042
  14. Li, W., Li, B., Zheng, K., Li, S., Li, H.: Document image forgery detection and localization in desensitization scenarios. Signal Processing 238, 110123 (2025). https://doi.org/10.1016/j.sigpro.2025.110123
  15. Liu, H., Tan, Z., Tan, C., Wei, Y., Wang, J., Zhao, Y.: Forgery-aware adaptive transformer for generalizable synthetic image detection. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10770-10780. IEEE Computer Society (Jun 2024). https://doi.org/10.1109/CVPR52733.2024.01024
  16. Liu, X., Liu, Y., Chen, J., Liu, X.: PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization. IEEE Transactions on Circuits and Systems for Video Technology 32(11), 7505-7517 (2022). https://doi.org/10.1109/TCSVT.2022.3189545
  17. Luo, D., Zhou, Y., Yang, R., Liu, Y., Liu, X., Zeng, J., Yang, B., Huang, Z., Jin, L.: ICDAR 2023 competition on detecting tampered text in images. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), pp. 587-600 (2023)
  18. Martínez Tornés, B.M., Taburet, T., Boros, E., Rouis, K., Doucet, A., Sidère, N., Poulain D'Andecy, V.: Receipt dataset for document forgery detection. In: Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR), pp. 454-469 (2023)
  19. Perera, P., Patel, V.M.: Face-based multiple user active authentication on mobile devices. IEEE Transactions on Information Forensics and Security 14(5), 1240-1250 (2018)
  20. Polevoy, D.V., Sigareva, I.V., Ershova, D.M., Arlazarov, V.V., Nikolaev, D.P., Ming, Z., Luqman, M.M., Burie, J.C.: Document liveness challenge dataset (DLC-2021). Journal of Imaging 8(7), 181 (2022). https://doi.org/10.3390/jimaging8070181
  21. Shi, Y., Jain, A.K.: DocFace+: ID document to selfie matching. IEEE Transactions on Biometrics, Behavior, and Identity Science 1(1), 56-67 (2019)
  22. Tapia, J.E., Damer, N., Busch, C., Espin, J.M., Barrachina, J., Rocamora, A.S., Ocvirk, K., Alessio, L., Batagelj, B., Patwardhan, S., et al.: First competition on presentation attack detection on ID card. In: 2024 IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10. IEEE (2024)
  23. Triaridis, K., Mezaris, V.: Exploring multi-modal fusion for image manipulation detection and localization. In: Proceedings of the 30th International Conference on MultiMedia Modeling (MMM), Lecture Notes in Computer Science, vol. 14556, pp. 198-211. Springer (2024). https://doi.org/10.1007/978-3-031-53311-2_15
  24. Wang, H., Li, S., Cao, S., Yang, R., Zeng, J., Qian, Z., Zhang, X.: On physically occluded fake identity document detection. In: Proceedings of the 31st ACM International Conference on Multimedia (MM '23), pp. 1556-1564. ACM (2023). https://doi.org/10.1145/3581783.3612075
  25. Wang, L., Li, Z., Zhao, W.: Research on identity document image tampering detection based on texture understanding and multistream networks. Journal of Electronic Imaging 34(4), 043018 (2025). https://doi.org/10.1117/1.JEI.34.4.043018
  26. Xu, J., Jia, D., Lin, Z., Zhou, T.: PSFNet: A deep learning network for fake passport detection. IEEE Access 10, 123337-123348 (2022). https://doi.org/10.1109/ACCESS.2022.3224235
  27. Yu, Z., Ni, J., Lin, Y., Deng, H., Li, B.: DiffForensics: Leveraging diffusion prior to image forgery detection and localization. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12765-12774 (2024). https://doi.org/10.1109/CVPR52733.2024.01213
  28. Zhang, Y., Li, Q., Yu, Z., Shen, L.: Distilled transformers with locally enhanced global representations for face forgery detection. Pattern Recognition 161, 111253 (2025)
  29. Zheng, L., Zhang, Y., Thing, V.L.: A survey on image tampering and its detection in real-world photos. Journal of Visual Communication and Image Representation 58, 380-399 (2019)