pith. machine review for the scientific record.

arxiv: 2604.10132 · v1 · submitted 2026-04-11 · 💻 cs.CV · cs.AI

Recognition: unknown

Semantic Manipulation Localization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:27 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords Semantic Manipulation Localization · Image Manipulation Localization · Semantic Edits · Image Forensics · TRACE Framework · Computer Vision · Benchmark Construction · Frequency Perturbation

The pith

TRACE localizes subtle semantic edits in images by anchoring meaning, sensing frequency perturbations, and jointly reasoning over content and scope.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional image manipulation localization methods rely on detecting low-level visual artifacts, but modern edits often change an object's attributes, state, or relationships while remaining visually consistent with the surroundings. This paper introduces Semantic Manipulation Localization as a distinct task focused on identifying such meaning-altering changes, along with a fine-grained benchmark built from a semantics-driven editing pipeline. It proposes the TRACE framework, which models semantic sensitivity through three coupled stages: first anchoring semantically meaningful regions, then injecting perturbation-sensitive cues to spot edits under strong consistency, and finally verifying them via joint reasoning on semantic content and scope. Experiments demonstrate that this yields more complete, compact, and coherent localization than prior IML approaches. The work shows that shifting from artifact detection to semantic sensitivity is necessary for handling complex editing scenarios.
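To make the "perturbation-sensitive cues" concrete: Figure 3 notes that TRACE uses a wavelet transform ("WT"), and subtle generative edits can disturb local high-frequency statistics even when the result looks visually consistent. Below is a minimal sketch of one plausible cue of this kind; the Haar decomposition, the PyWavelets dependency, and the detail-band aggregation are our assumptions for illustration, not the paper's sensing module.

```python
# Hypothetical illustration of a wavelet-based perturbation cue, NOT the
# paper's implementation: we assume a Haar DWT and a simple aggregation
# of the high-frequency detail bands into one cue map.
import numpy as np
import pywt


def perturbation_cue(gray: np.ndarray) -> np.ndarray:
    """Aggregate high-frequency wavelet detail bands into a cue map."""
    # dwt2 returns the approximation band and three detail bands
    # (horizontal, vertical, diagonal); the approximation is unused here.
    _approx, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float64), "haar")
    # Energy of the detail coefficients at half resolution.
    detail_energy = np.sqrt(cH**2 + cV**2 + cD**2)
    # Upsample back to the input size by 2x pixel repetition.
    cue = np.kron(detail_energy, np.ones((2, 2)))
    cue = cue[: gray.shape[0], : gray.shape[1]]
    # Normalize to [0, 1] so the cue could be injected as an extra channel.
    return (cue - cue.min()) / (np.ptp(cue) + 1e-8)


if __name__ == "__main__":
    img = np.random.rand(256, 256)  # stand-in for a grayscale image
    print(perturbation_cue(img).shape)  # (256, 256)
```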

Core claim

The TRACE framework localizes meaning-altering manipulations that lack obvious low-level artifacts: it identifies semantically meaningful regions, injects frequency-based perturbation cues to capture subtle changes under strong visual consistency, and verifies candidates through joint reasoning over semantic content and scope. On the new SML benchmark it outperforms existing IML methods with more complete and semantically coherent results.

What carries the argument

The TRACE framework, which models semantic sensitivity through the three progressively coupled components of semantic anchoring, semantic perturbation sensing, and semantic-constrained reasoning.
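To fix ideas, here is a structural sketch of how three progressively coupled stages might compose, with each stage's output feeding the next. All module internals are placeholders we invented; only the anchoring, sensing, and reasoning composition comes from the abstract.

```python
# Structural sketch of a three-stage, progressively coupled localizer in
# the spirit of TRACE. Module internals are invented placeholders.
import torch
import torch.nn as nn


class ThreeStageLocalizer(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        # Stage 1: semantic anchoring proposes regions that carry meaning.
        self.anchor = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        # Stage 2: perturbation sensing scores subtle inconsistency,
        # conditioned on the anchored regions.
        self.sense = nn.Sequential(nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        # Stage 3: joint reasoning verifies semantic content and scope
        # together before committing to a localization map.
        self.reason = nn.Sequential(nn.Conv2d(5, ch, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(ch, 1, 1))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        anchors = self.anchor(img)                       # where meaning lives
        cues = self.sense(torch.cat([img, anchors], 1))  # cues gated by anchors
        logits = self.reason(torch.cat([img, anchors, cues], 1))
        return logits                                    # per-pixel edit logits


if __name__ == "__main__":
    x = torch.randn(1, 3, 256, 256)
    print(ThreeStageLocalizer()(x).shape)  # torch.Size([1, 1, 256, 256])
```

The point of the sketch is the coupling: the sensing stage never sees the image alone, and the reasoning stage sees both upstream outputs, mirroring the "progressively coupled" phrasing.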

If this is right

  • Image forensics must move beyond artifact detection to handle modern generative edits that preserve visual consistency.
  • The semantics-driven benchmark provides a standard for evaluating localization of meaning changes rather than pixel inconsistencies (a generation sketch follows this list).
  • TRACE's three-stage coupling produces localization that is more complete, compact, and aligned with human interpretation of the edit.
  • Semantic anchoring first isolates regions critical to image understanding before perturbation analysis is applied.
  • Joint reasoning over content and scope reduces false positives in areas that look edited but do not alter overall meaning.
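As a concrete illustration of how a semantics-driven pair-and-mask pipeline could produce pixel-level annotations, here is a hedged sketch: `edit_image` is a hypothetical stand-in for an instruction-following editor (InstructPix2Pix-style), and the thresholded-difference mask is our simplification; the paper's two-stage region selection (Figure 2) is not reproduced.

```python
# Hedged sketch of one semantics-driven benchmark step, assuming a generic
# instruction-following editor; `edit_image` is a hypothetical stand-in.
import numpy as np


def edit_image(image: np.ndarray, instruction: str) -> np.ndarray:
    """Placeholder for an instruction-guided editor.

    A real pipeline would call a generative model here; for the sketch we
    just perturb a fixed region so the rest of the code is runnable.
    """
    edited = image.copy()
    edited[64:128, 64:128] = 1.0 - edited[64:128, 64:128]  # toy "edit"
    return edited


def annotate(original: np.ndarray, edited: np.ndarray,
             thresh: float = 0.05) -> np.ndarray:
    """Pixel-level ground-truth mask from the original/edited difference."""
    diff = np.abs(edited - original).max(axis=-1)  # per-pixel max over RGB
    return (diff > thresh).astype(np.uint8)


if __name__ == "__main__":
    img = np.random.rand(256, 256, 3)
    edited = edit_image(img, "make the cup cracked instead of intact")
    mask = annotate(img, edited)
    print(mask.sum(), "edited pixels annotated")
```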

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same anchoring-plus-reasoning structure could be tested on video sequences to localize temporal semantic changes across frames.
  • If the perturbation-sensing stage proves robust, it might extend to other modalities like audio or text where subtle meaning shifts occur without surface artifacts.
  • Forensic tools could incorporate TRACE-style semantic scope checks to distinguish intentional edits from natural variations in scene content.

Load-bearing premise

That semantic sensitivity captured by coupling anchoring, frequency perturbation sensing, and joint reasoning can reliably identify edits that alter meaning while staying visually consistent with surrounding content.

What would settle it

A new test set of images containing seamless semantic edits (such as altered object attributes that change scene interpretation) where TRACE produces localization maps that are no more accurate or coherent than those from standard artifact-based IML methods.
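Settling it would come down to mask-level scoring on such a set. A minimal sketch of the standard comparison (pixel IoU and F1 over binary localization maps) follows; the metric choice is our assumption about the protocol, since the paper's exact evaluation is not given here.

```python
# Minimal pixel-level scoring for binary localization maps; IoU and F1
# are conventional IML metrics, not necessarily the paper's exact ones.
import numpy as np


def iou_f1(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # true positives
    fp = np.logical_and(pred, ~gt).sum()  # false positives
    fn = np.logical_and(~pred, gt).sum()  # false negatives
    iou = tp / (tp + fp + fn + 1e-8)
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return float(iou), float(f1)


if __name__ == "__main__":
    gt = np.zeros((256, 256), dtype=np.uint8)
    gt[64:128, 64:128] = 1
    pred = np.roll(gt, 8, axis=1)  # a slightly shifted prediction
    print(iou_f1(pred, gt))
```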

Figures

Figures reproduced from arXiv: 2604.10132 by Chenhan Lu, Tianrun Chen, Xiang Zhang, Xianyi Chen, Yuxiang Huang, Yuzhe Sha, Zhangjie Fu, Zhenshan Tan, Ziwen He.

Figure 1: Distinction between image manipulation localization (IML) and …
Figure 2: The pipeline of two-stage automatic semantically decisive region …
Figure 3: Framework of the proposed TRACE. “WT” denotes the wavelet transform. TRACE decomposes localization into three progressively coupled …
Figure 4: Framework of the semantic-scope coherence reasoning (SSCR). Starting from the mask and edge features, SSCR first constructs four interleaved …
Figure 5: Qualitative results of our TRACE compared with other SoTAs on our proposed dataset. In addition, we provide negative semantics generated by LLMs …
Figure 6: Qualitative results of each proposed module. (a) Original image; (b) Manipulated image; (c) Ground truth; (d) Baseline; (e) Baseline + SPS; (f) …
Figure 7: Visualization results of SPS. (a) Original image; (b) Manipulated …
Figure 9: Visualization results of SSGM. (a) Manipulated image; (b) Ground …
Figure 10: Comparison of accuracy–efficiency trade-offs among different …
read the original abstract

Image Manipulation Localization (IML) aims to identify edited regions in an image. However, with the increasing use of modern image editing and generative models, many manipulations no longer exhibit obvious low-level artifacts. Instead, they often involve subtle but meaning-altering edits to an object's attributes, state, or relationships while remaining highly consistent with the surrounding content. This makes conventional IML methods less effective because they mainly rely on artifact detection rather than semantic sensitivity. To address this issue, we introduce Semantic Manipulation Localization (SML), a new task that focuses on localizing subtle semantic edits that significantly change image interpretation. We further construct a dedicated fine-grained benchmark for SML using a semantics-driven manipulation pipeline with pixel-level annotations. Based on this task, we propose TRACE (Targeted Reasoning of Attributed Cognitive Edits), an end-to-end framework that models semantic sensitivity through three progressively coupled components: semantic anchoring, semantic perturbation sensing, and semantic-constrained reasoning. Specifically, TRACE first identifies semantically meaningful regions that support image understanding, then injects perturbation-sensitive frequency cues to capture subtle edits under strong visual consistency, and finally verifies candidate regions through joint reasoning over semantic content and semantic scope. Extensive experiments show that TRACE consistently outperforms existing IML methods on our benchmark and produces more complete, compact, and semantically coherent localization results. These results demonstrate the necessity of moving beyond artifact-based localization and provide a new direction for image forensics in complex semantic editing scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Semantic Manipulation Localization (SML) as a new task focused on identifying subtle semantic edits (attributes, states, relationships) that alter image meaning while remaining visually consistent with surrounding content. It constructs a fine-grained benchmark via a semantics-driven manipulation pipeline that supplies pixel-level annotations, and proposes the TRACE end-to-end framework whose three progressively coupled components (semantic anchoring, perturbation-sensitive frequency injection, and semantic-constrained reasoning) are intended to model semantic sensitivity. The central claim is that TRACE outperforms prior IML methods on this benchmark and yields more complete, compact, and semantically coherent localization maps.

Significance. If the central claim holds under external validation, the work would be significant for shifting IML research from low-level artifact detection toward semantic understanding, a timely direction given modern generative editing tools. The introduction of a dedicated SML benchmark and the explicit coupling of the three TRACE components constitute a coherent technical contribution. Credit is due for targeting a problem that conventional IML methods are acknowledged to handle poorly and for supplying pixel-level annotations.

major comments (2)
  1. [Benchmark construction and evaluation sections] The benchmark is generated exclusively by the authors' own semantics-driven manipulation pipeline, and all reported performance (including the claim that TRACE 'consistently outperforms existing IML methods') is measured only on this self-generated data. Because the three TRACE components are themselves designed to target precisely the same class of semantic edits, outperformance may reflect alignment between the generation process and the model's inductive biases rather than robustness to arbitrary visually consistent semantic changes. An external test set or cross-pipeline comparison is required to break this potential circularity.
  2. [Experiments section] The abstract asserts outperformance and 'more complete, compact, and semantically coherent' results, yet supplies no quantitative metrics (e.g., IoU, F1, precision-recall), baseline details, ablation studies on the three components, or error analysis. Without these, the central claim that semantic sensitivity modeled via the coupled components reliably localizes edits lacking low-level artifacts cannot be assessed.
minor comments (1)
  1. [Abstract] The description of the second component alternates between 'semantic perturbation sensing' and 'perturbation-sensitive frequency injection'; consistent terminology would improve clarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.

read point-by-point responses
  1. Referee: Benchmark construction and evaluation sections: the benchmark is generated exclusively by the authors' own semantics-driven manipulation pipeline, and all reported performance (including the claim that TRACE 'consistently outperforms existing IML methods') is measured only on this self-generated data. Because the three TRACE components are themselves designed to target precisely the same class of semantic edits, outperformance may reflect alignment between the generation process and the model's inductive biases rather than robustness to arbitrary visually consistent semantic changes. An external test set or cross-pipeline comparison is required to break this potential circularity.

    Authors: We acknowledge the potential issue of circularity raised by the referee. The benchmark was constructed using a semantics-driven pipeline to specifically target subtle semantic manipulations that do not produce obvious artifacts, which is the essence of the new SML task. This allows us to demonstrate that TRACE, with its focus on semantic anchoring, perturbation sensing, and constrained reasoning, can localize such edits more effectively than traditional IML approaches. We agree that this setup could benefit from additional validation. In the revised manuscript, we will include a more thorough discussion of the benchmark's design and its relation to real-world semantic edits, as well as qualitative evaluations on a variety of edited images to support broader applicability. revision: partial

  2. Referee: Experiments section: the abstract asserts outperformance and 'more complete, compact, and semantically coherent' results, yet supplies no quantitative metrics (e.g., IoU, F1, precision-recall), baseline details, ablation studies on the three components, or error analysis. Without these, the central claim that semantic sensitivity modeled via the coupled components reliably localizes edits lacking low-level artifacts cannot be assessed.

    Authors: We appreciate the referee's observation regarding the presentation of experimental results. The current manuscript's experiments section does include comparisons with IML methods and describes the TRACE components, but we agree that more explicit quantitative metrics, ablation studies, and error analysis would strengthen the paper. We will revise the experiments section to include detailed quantitative results (such as IoU and F1 scores), baseline details, ablations on the three components, and error analysis to better substantiate the claims about semantic sensitivity and localization performance. revision: yes

standing simulated objections (unresolved)
  • The need for an external test set or cross-pipeline comparison remains open: the authors currently lack such independent data, and producing it would require substantial new collection effort.

Circularity Check

1 step flagged

Self-generated semantics-driven benchmark creates circular validation for TRACE outperformance

specific steps
  1. fitted input called prediction [Abstract]
    "We further construct a dedicated fine-grained benchmark for SML using a semantics-driven manipulation pipeline with pixel-level annotations. ... TRACE ... models semantic sensitivity through three progressively coupled components: semantic anchoring, semantic perturbation sensing, and semantic-constrained reasoning. ... Extensive experiments show that TRACE consistently outperforms existing IML methods on our benchmark and produces more complete, compact, and semantically coherent localization results."

    The benchmark inputs are produced by the authors' own semantics-driven pipeline targeting subtle meaning-altering edits. TRACE's core is defined to model exactly those semantic sensitivities via its three components. Claiming superior localization on this benchmark therefore reduces to testing alignment between the generation process and the model's inductive biases, with no independent external data to break the loop.

full rationale

The paper introduces SML as a new task focused on subtle semantic edits and constructs its benchmark exclusively via an authors' semantics-driven manipulation pipeline that generates pixel-level annotations for attribute/state/relationship changes. TRACE is then defined with three coupled components explicitly targeting semantic sensitivity to the same edit class. Outperformance is claimed only on this internally generated benchmark with no external test sets or cross-pipeline comparisons referenced, making the superiority result statistically aligned with the construction process rather than independently verified.
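One way to operationalize the audit's proposed fix is to hold out an entire editing pipeline: train (or tune) only on edits from the authors' pipeline and report metrics exclusively on edits produced elsewhere. A hedged sketch of that split logic follows; the pipeline names and record layout are invented for illustration.

```python
# Hedged sketch of a cross-pipeline split to break benchmark circularity;
# pipeline names and the record layout are invented for illustration.
from collections import defaultdict


def cross_pipeline_split(records: list[dict], held_out: str):
    """Group benchmark records by generating pipeline, then reserve one
    pipeline's edits exclusively for testing."""
    by_pipeline = defaultdict(list)
    for rec in records:
        by_pipeline[rec["pipeline"]].append(rec)
    test = by_pipeline.pop(held_out, [])
    train = [r for recs in by_pipeline.values() for r in recs]
    return train, test


if __name__ == "__main__":
    records = [
        {"image": "a.png", "pipeline": "authors_semantic"},
        {"image": "b.png", "pipeline": "authors_semantic"},
        {"image": "c.png", "pipeline": "external_editor"},  # hypothetical
    ]
    train, test = cross_pipeline_split(records, held_out="external_editor")
    print(len(train), "train /", len(test), "held-out test")
```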

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework introduces semantic anchoring, perturbation sensing, and constrained reasoning as new modeling choices whose grounding is not detailed.

pith-pipeline@v0.9.0 · 5575 in / 958 out tokens · 35692 ms · 2026-05-10T15:27:28.629606+00:00 · methodology

discussion (0)

