pith. machine review for the scientific record.

arxiv: 2604.02780 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: no theorem link

A Unified Perspective on Adversarial Membership Manipulation in Vision Models

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords membership inference attacks · adversarial robustness · vision models · gradient norms · privacy · adversarial examples · model extraction

The pith

Adversarial perturbations can fabricate membership for non-training images in vision model attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that small, imperceptible changes to images can make membership inference attacks incorrectly classify non-training data as training data in vision models. It identifies a consistent geometric pattern where the norm of gradients collapses in a specific way for these fabricated cases. This pattern differs from true members even when the images look similar semantically. Using this, the authors create ways to detect the manipulation and make inference more robust. If true, it means current privacy assessments for AI models need to consider adversarial inputs.

Core claim

Adversarial membership fabrication pushes non-member images into the member region of MIAs across models and datasets, and a characteristic gradient-norm collapse trajectory separates these fabricated members from genuine ones despite nearly identical semantics, supporting detection and robust inference methods.
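The paper publishes no code here, but the fabrication step it describes can be sketched. A minimal PyTorch sketch, assuming a standard image classifier with inputs in [0, 1]: a PGD-style loop that lowers the loss (raising confidence) while staying inside an L∞ ball, following the reading of Figure 3 that MFA drives inputs into high-confidence regions. Function name, step size, and step count are illustrative, not the paper's.

```python
# Hedged sketch of a member-fabrication-style perturbation (assumptions:
# PyTorch classifier `model`, batched inputs x in [0,1], integer labels y).
import torch
import torch.nn.functional as F

def fabricate_member(model, x, y, eps=2/255, alpha=0.5/255, steps=20):
    """Push a non-member x toward the high-confidence ('member-like') region
    while staying inside the L-infinity ball B_eps[x]."""
    x0 = x.detach()
    x_adv = x0.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)   # low loss ~ high confidence
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()            # descend, not ascend
            x_adv = x0 + (x_adv - x0).clamp(-eps, eps)     # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```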

What carries the argument

The gradient-norm collapse trajectory that distinguishes fabricated from true members in the attack process.
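One way to make that trajectory concrete is to record the input-gradient norm at every attack step. A sketch under the same assumptions as above, using the ε and initial step size quoted in the Figure 5 caption; the collapse the paper reports would show up as these per-step norms shrinking toward zero for fabricated members.

```python
# Hedged sketch: per-step input-gradient norms along a PGD-style trajectory.
import torch
import torch.nn.functional as F

def grad_norm_trajectory(model, x, y, eps=8/255, alpha=(8/255)/24, steps=20):
    """Record ||dL/dx||_2 at each step; the paper's signature is this norm
    progressively collapsing for fabricated members (cf. Figure 5)."""
    x0 = x.detach()
    x_adv = x0.clone()
    norms = []
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        norms.append(grad.flatten(1).norm(dim=1).detach())   # per-sample L2 norm
        with torch.no_grad():
            x_adv = (x0 + (x_adv - alpha * grad.sign() - x0).clamp(-eps, eps)).clamp(0, 1)
    return torch.stack(norms, dim=1)   # shape [batch, steps]
```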

If this is right

  • Adversarial fabrication works reliably on various vision architectures and datasets.
  • Gradient-geometry signals can detect the manipulation effectively.
  • Robust inference frameworks can reduce the success of such adversarial attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This vulnerability suggests that membership inference tools themselves require adversarial training or hardening.
  • Similar manipulation might apply to other types of privacy attacks beyond membership inference.
  • Defenses based on gradient norms could be integrated into standard model evaluation pipelines.

Load-bearing premise

The gradient-norm collapse trajectory reliably separates fabricated members from true members despite nearly identical semantic representations.
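The detection idea this premise supports can be sketched as a simple score-and-threshold rule: compute each queried input's gradient norm and flag unusually small values as possibly fabricated. The paper describes its detector only as "grounded in gradient-geometry signals," so the specific rule and the threshold τ below are assumptions, not the authors' method.

```python
# Hedged sketch of gradient-norm-based flagging (threshold tau is assumed).
import torch
import torch.nn.functional as F

def input_grad_norm(model, x, y):
    """Per-sample ||dL/dx||_2 for the queried inputs."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y, reduction="sum")  # sum keeps per-sample grads independent
    grad = torch.autograd.grad(loss, x)[0]
    return grad.flatten(1).norm(dim=1)

def flag_fabricated(model, x, y, tau):
    """Flag inputs whose gradient norm falls below tau; tau is the free
    parameter noted in the ledger further down this page."""
    return input_grad_norm(model, x, y) < tau
```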

What would settle it

Finding a set of fabricated and true members where the gradient-norm trajectories are statistically indistinguishable would disprove the separation method.
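Operationally, "statistically indistinguishable" could be checked with a two-sample test on the per-step norms. A sketch using SciPy's Kolmogorov-Smirnov test; the significance level and the per-step testing scheme are chosen here for illustration only.

```python
# Hedged sketch: are fabricated vs. true gradient-norm trajectories separable?
import numpy as np
from scipy.stats import ks_2samp

def trajectories_distinguishable(fab_norms, true_norms, alpha=0.01):
    """fab_norms, true_norms: arrays of shape [n_samples, steps] of per-step
    gradient norms. Run a KS test per step; if no step separates the groups,
    the premise above would be in trouble. (A real analysis would also
    correct for multiple comparisons.)"""
    fab_norms, true_norms = np.asarray(fab_norms), np.asarray(true_norms)
    pvals = [ks_2samp(fab_norms[:, t], true_norms[:, t]).pvalue
             for t in range(fab_norms.shape[1])]
    return any(p < alpha for p in pvals), pvals
```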

Figures

Figures reproduced from arXiv: 2604.02780 by Feng Liu, Kaiwen Zhou, Ruize Gao, Yongqiang Chen.

Figure 1
Figure 1: Overview of the Background and Our Proposed Research Problems. MEMGUARD defends the model, whereas we show that the auditing mechanism itself can be attacked. A unified geometric perspective. Through theoretical analysis and empirical validation, we find that these perturbed non-members, fabricated members, share a distinct optimization geometry: their input-gradient norms progressively collapse as pertu…
Figure 2
Figure 2: Imperceptible Adversarial Perturbations on ImageNet-100. The first row are the original non-members, and the second row are the corresponding perturbed fabricated members. We used ϵ = 2/255 for Bϵ[x] here. The perturbations are extremely imperceptible to the human eye, which demonstrates that the Member Fabrication Attack (MFA) can be successful with only the addition of very small perturbations. 2.2. Memb…
Figure 3
Figure 3: Objective of adversarial attacks (left) vs. MFA (right). The black and red dots denote the original input and the perturbed sample within the ϵ-ball (gray region). Adversarial attacks push inputs into the misclassification region (orange), where max_{i≠y} p_i > p_y. In contrast, MFA drives inputs into high-confidence regions. that enables reliable detection (§3.2), and finally incorporate this geometric insigh…
Figure 4
Figure 4: Visualization of the Distribution of Fabricated and True Members in Different Semantic Feature Spaces Using t-SNE [38]. The two subfigures represent the semantic features at the penultimate and antepenultimate layers, with perturbation constrained to ∥δ∥∞ ≤ 4/255. Red and blue dots denote true and fabricated members, respectively. The high degree of overlap of the red dots and blue dots suggests that seman…
Figure 5
Figure 5: Decay of Gradient Norm with Respect to Input Across Steps. As the steps increase, the gradient norm with respect to the input progressively diminishes. For clarity, a large epsilon ball (∥δ∥∞ ≤ 8/255) is selected, alongside a small initial step size of 1/24 × (8/255), across 20 steps. Empirical observation and geometric interpretation. Although fabricated and true members are visually and seman…
Figure 6
Figure 6: Distribution of fabricated and true members across different detection strategies. The first plot (left) shows the Mahalanobis distance, the second plot (middle) shows the Local Intrinsic Dimensionality (LID) values, and the third plot (right) presents the gradient norm with respect to the input. We use the relative frequency within each membership class as the vertical axis. The gradient norm in the third…
Figure 7
Figure 7: Representative Experimental Results. Subfigures (a-c) demonstrate the superior performance of MFA compared to baselines, highlighting its effectiveness across different MIAs and datasets. Subfigures (d-f) show the effectiveness of MFD across varying perturbation levels ϵ and datasets. Subfigures (g-i) illustrate the effectiveness of our Adversarially Robust Strategies in combination with different stro…
Figure 8
Figure 8: Visualization of the Distribution of Fabricated and True Members in Different Semantic Feature Spaces Using t-SNE [38]. The first row displays the semantic features at the penultimate layer, with perturbation constrained from ∥δ∥∞ ≤ 2.0/255 to ∥δ∥∞ ≤ 6.0/255, left to right. The second row shows the semantic features at the antepenultimate layer, with the same range of perturbations. Red dots represen…
Figure 9
Figure 9: Imperceptible Adversarial Perturbations on ImageNet-100. For each pair, the top image is the original non-member, and the bottom image is the corresponding perturbed fabricated member, demonstrating that the perturbations are imperceptible to the human eye. We used ϵ = 2/255 for Bϵ[x] here.
Figure 10
Figure 10: Imperceptible Adversarial Perturbations on ImageNet-100. For each pair, the top image is the original non-member, and the bottom image is the corresponding perturbed fabricated member, demonstrating that the perturbations are imperceptible to the human eye. We used ϵ = 2/255 for Bϵ[x] here.
Figure 11
Figure 11: Imperceptible Adversarial Perturbations on ImageNet-100. For each pair, the top image is the original non-member, and the bottom image is the corresponding perturbed fabricated member, demonstrating that the perturbations are imperceptible to the human eye. We used ϵ = 2/255 for Bϵ[x] here.
Figure 12
Figure 12: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse MIAs (∥δ∥∞ ≤ 4.0/255).
Figure 13
Figure 13: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse Perturbation Bounds on CIFAR-10.
Figure 14
Figure 14: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse Perturbation Bounds on CIFAR-100.
Figure 15
Figure 15: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse Perturbation Bounds on CINIC-10.
Figure 16
Figure 16: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse Perturbation Bounds on SVHN.
Figure 17
Figure 17: Comparison of the Error Area Between Our Member Fabrication Attack and Baselines Across Diverse Perturbation Bounds on ImageNet-100.
Figure 18
Figure 18: Comparison of the ROC Curve for Our Member Fabrication Detection Across Diverse Perturbation Bounds on CIFAR-10.
Figure 19
Figure 19: Comparison of the ROC Curve for Our Member Fabrication Detection Across Diverse Perturbation Bounds on CIFAR-100.
Figure 20
Figure 20: Comparison of the ROC Curve for Our Member Fabrication Detection Across Diverse Perturbation Bounds on CINIC-10.
Figure 21
Figure 21: Comparison of the ROC Curve for Our Member Fabrication Detection Across Diverse Perturbation Bounds on SVHN.
Figure 22
Figure 22: Comparison of the ROC Curve for Our Member Fabrication Detection Across Diverse Perturbation Bounds on ImageNet-100.
Figure 23
Figure 23: Comparison of ROC Curves for Our Adversarially Robust MIAs and Baselines on CIFAR-10.
Figure 24
Figure 24: Comparison of ROC Curves for Our Adversarially Robust MIAs and Baselines on CIFAR-100.
Figure 25
Figure 25: Comparison of ROC Curves for Our Adversarially Robust MIAs and Baselines on SVHN.
Figure 26
Figure 26: Comparison of ROC Curves for Our Adversarially Robust MIAs and Baselines on CINIC-10.
read the original abstract

Membership inference attacks (MIAs) aim to determine whether a specific data point was part of a model's training set, serving as effective tools for evaluating privacy leakage of vision models. However, existing MIAs implicitly assume honest query inputs, and their adversarial robustness remains unexplored. We show that MIAs for vision models expose a previously overlooked adversarial surface: adversarial membership manipulation, where imperceptible perturbations can reliably push non-member images into the "member" region of state-of-the-art MIAs. In this paper, we provide the first unified perspective on this phenomenon by analyzing its mechanism and implications. We begin by demonstrating that adversarial membership fabrication is consistently effective across diverse architectures and datasets. We then reveal a distinctive geometric signature - a characteristic gradient-norm collapse trajectory - that reliably separates fabricated from true members despite their nearly identical semantic representations. Building on this insight, we introduce a principled detection strategy grounded in gradient-geometry signals and develop a robust inference framework that substantially mitigates adversarial manipulation. Extensive experiments show that fabrication is broadly effective, while our detection and robust inference strategies significantly enhance resilience. This work establishes the first comprehensive framework for adversarial membership manipulation in vision models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that membership inference attacks (MIAs) on vision models have an overlooked adversarial surface: imperceptible perturbations can fabricate membership for non-member images with high success across architectures and datasets. It identifies a characteristic gradient-norm collapse trajectory as a geometric signature that separates fabricated members from true members despite similar semantics. Building on this, the authors propose a gradient-geometry-based detection strategy and a robust inference framework that substantially mitigates fabrication success. Extensive experiments are reported to show consistent fabrication effectiveness and improved resilience under the proposed defenses.

Significance. If the gradient-norm separation and detection hold under realistic conditions, the work provides the first unified framework for adversarial membership manipulation in vision models. It highlights a new privacy leakage vector beyond standard MIAs and offers concrete detection and robust inference tools. Strengths include the broad experimental coverage across architectures/datasets and the insight that geometric signatures can distinguish fabricated from true members even when semantic representations are nearly identical.

major comments (2)
  1. [§5] §5 and abstract: The central claim that the robust inference framework 'substantially mitigates' adversarial fabrication rests on the gradient-norm collapse trajectory reliably separating classes under the paper's threat model. However, all reported experiments (§4) use standard PGD-style objectives without knowledge of the detector. No evaluation is provided for adaptive adversaries that explicitly optimize perturbations to match the gradient-norm trajectory of true members (e.g., by adding a matching term to the loss; a sketch of such an adaptive objective follows these comments). This is load-bearing for the resilience conclusion.
  2. [§4] §4 (experimental setup): The soundness of the fabrication effectiveness and detection claims cannot be fully verified without reported data splits, exact MIA implementations, statistical significance tests (e.g., confidence intervals on success rates), or ablation on the gradient-norm threshold choice. The abstract notes consistent effectiveness, but missing details prevent independent reproduction of the separation results.
minor comments (2)
  1. [§3] Notation for the gradient-norm trajectory and detection threshold should be formalized with an equation in §3 to improve clarity and reproducibility.
  2. [Figures in §4] Figure captions and axis labels in the geometric signature plots could be expanded to explicitly state what 'collapse trajectory' quantifies (e.g., norm over perturbation steps).
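For concreteness, the adaptive objective raised in major comment 1 might look like the following sketch: the usual confidence term plus a penalty that pulls the current input-gradient norm toward a reference trajectory measured on true members. The weight lam and the reference target_traj are illustrative; nothing of this kind appears in the paper as reviewed.

```python
# Hedged sketch of a detector-aware (adaptive) fabrication objective.
import torch
import torch.nn.functional as F

def adaptive_fabricate(model, x, y, target_traj, eps=4/255, alpha=0.5/255,
                       steps=20, lam=1.0):
    """PGD-style fabrication whose loss also penalizes deviation of the current
    input-gradient norm from target_traj[t], a per-step reference measured on
    true members (lam and target_traj are assumptions, not from the paper)."""
    x0 = x.detach()
    x_adv = x0.clone()
    for t in range(steps):
        x_adv.requires_grad_(True)
        loss_conf = F.cross_entropy(model(x_adv), y)
        # create_graph=True so the gradient norm itself stays differentiable
        g = torch.autograd.grad(loss_conf, x_adv, create_graph=True)[0]
        gnorm = g.flatten(1).norm(dim=1).mean()
        loss = loss_conf + lam * (gnorm - target_traj[t]) ** 2
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = (x0 + (x_adv - alpha * grad.sign() - x0).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()
```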

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will incorporate revisions to strengthen the reproducibility and robustness claims of the manuscript.

read point-by-point responses
  1. Referee: [§5] §5 and abstract: The central claim that the robust inference framework 'substantially mitigates' adversarial fabrication rests on the gradient-norm collapse trajectory reliably separating classes under the paper's threat model. However, all reported experiments (§4) use standard PGD-style objectives without knowledge of the detector. No evaluation is provided for adaptive adversaries that explicitly optimize perturbations to match the gradient-norm trajectory of true members (e.g., by adding a matching term to the loss). This is load-bearing for the resilience conclusion.

    Authors: We agree that evaluating resilience against adaptive adversaries who are aware of the detector is necessary to fully support the claim. In the revised manuscript, we will add experiments in which the adversary augments the PGD objective with an explicit term that penalizes deviation from the gradient-norm trajectory of true members. We will report fabrication success rates under this stronger threat model and discuss any degradation in the separation provided by the gradient-norm signal. revision: yes

  2. Referee: [§4] §4 (experimental setup): The soundness of the fabrication effectiveness and detection claims cannot be fully verified without reported data splits, exact MIA implementations, statistical significance tests (e.g., confidence intervals on success rates), or ablation on the gradient-norm threshold choice. The abstract notes consistent effectiveness, but missing details prevent independent reproduction of the separation results.

    Authors: We acknowledge the need for greater experimental transparency. In the revision we will (i) explicitly state the train/test splits and random seeds used for all datasets, (ii) provide precise pseudocode and hyper-parameter settings for each MIA implementation, (iii) report 95% confidence intervals on all success-rate figures, and (iv) include an ablation study varying the gradient-norm threshold to demonstrate the stability of the detection performance. revision: yes
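A minimal sketch of how the promised 95% confidence intervals on success rates could be produced with a percentile bootstrap; the resampling scheme is an assumption, since the rebuttal does not specify one.

```python
# Hedged sketch: percentile-bootstrap CI for an attack success rate.
import numpy as np

def bootstrap_ci(successes, n_boot=10_000, level=0.95, seed=0):
    """successes: 0/1 array, one entry per attacked sample.
    Returns (rate, lower, upper) from a simple percentile bootstrap."""
    rng = np.random.default_rng(seed)
    successes = np.asarray(successes)
    rates = rng.choice(successes, size=(n_boot, successes.size), replace=True).mean(axis=1)
    lo, hi = np.percentile(rates, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return successes.mean(), lo, hi
```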

Circularity Check

0 steps flagged

No circularity; empirical observations and strategies are independent of fitted inputs or self-citations.

full rationale

The paper presents no mathematical derivations, equations, or parameter-fitting steps that reduce to self-definition or prior results by construction. Claims of fabrication effectiveness and the gradient-norm collapse trajectory as a separating signature are introduced as empirical findings from experiments across architectures and datasets. The detection strategy and robust inference framework are built directly on these observed geometric signals without invoking self-citations, uniqueness theorems, or ansatzes that loop back to the paper's own inputs. No load-bearing self-citation chains or renamed known results appear in the abstract or described content. The work is self-contained via external experimental validation, consistent with a score of 0.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard assumptions from membership inference literature plus an empirically observed geometric signature; no new free parameters or invented entities are explicitly introduced beyond experimental thresholds.

free parameters (1)
  • gradient-norm threshold
    Likely used to separate collapse trajectories in the detection strategy; value not stated in abstract (a selection sketch follows this ledger).
axioms (1)
  • domain assumption: MIAs can be evaluated via output or gradient signals from the target model
    Standard premise in membership inference research invoked throughout the abstract.
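A sketch of how that lone free parameter, the gradient-norm threshold, could be set on a held-out validation split: sweep an ROC curve over (negated) gradient norms and take the point maximizing Youden's J. This illustrates the parameter's role; it is not the paper's stated procedure.

```python
# Hedged sketch: picking the gradient-norm detection threshold on validation data.
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(grad_norms, is_fabricated):
    """grad_norms: per-sample input-gradient norms on a validation split.
    is_fabricated: 1 for fabricated members, 0 for true members.
    Fabricated members are expected to have *smaller* norms, so score = -norm."""
    fpr, tpr, thresholds = roc_curve(is_fabricated, -np.asarray(grad_norms))
    j = tpr - fpr                      # Youden's J statistic
    # thresholds[0] is a sentinel in scikit-learn; fine for a sketch.
    return -thresholds[np.argmax(j)]   # flag samples with norm at or below this value
```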

pith-pipeline@v0.9.0 · 5501 in / 1110 out tokens · 39199 ms · 2026-05-13T20:01:40.294456+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 1 internal anchor

  1. [1]

    Quantifying membership inference vulnerability via generalization gap and other model metrics. arXiv preprint arXiv:2009.05669, 2020

    Jason W Bentley, Daniel Gibney, Gary Hoppenworth, and Sumit Kumar Jha. Quantifying membership inference vulnerability via generalization gap and other model metrics. arXiv preprint arXiv:2009.05669, 2020. 13

  2. [2]

    A survey of black-box adversarial attacks on computer vision models.arXiv preprint arXiv:1912.01667,

    Siddhant Bhambri, Sumanyu Muku, Avinash Tulasi, and Arun Balaji Buduru. A survey of black-box adversarial attacks on computer vision models.arXiv preprint arXiv:1912.01667,

  3. [3]

    Pattern recognition.Machine learning, 128(9), 2006

    Christopher M Bishop. Pattern recognition.Machine learning, 128(9), 2006. 13

  4. [4]

    Adversarial examples are not easily detected: Bypassing ten detection methods

    Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017. 13

  5. [5]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. InCVPR, 2017. 3, 14, 19, 20

  6. [6]

    The secret sharer: Evaluating and testing unintended memorization in neural networks

    Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In USENIX Security, 2019. 13

  7. [7]

    Membership inference attacks from first principles

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. InIEEE Symposium on Security and Privacy, 2022. 1, 2, 3, 4, 8, 12, 13, 18, 19, 20

  8. [8]

    Deepdriving: Learning affordance for direct perception in autonomous driving

    Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. Deepdriving: Learning affordance for direct perception in autonomous driving. InICCV, 2015. 13

  9. [9]

    GAN-leaks: A taxonomy of membership inference attacks against generative models

    Dingfan Chen, Ning Yu, Yang Zhang, and Mario Fritz. GAN-leaks: A taxonomy of membership inference attacks against generative models. In ACM SIGSAC Conference on Computer and Communications Security, 2020. 13

  10. [10]

    When machine unlearning jeopardizes privacy

    Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. When machine unlearning jeopardizes privacy. In ACM SIGSAC Conference on Computer and Communications Security, 2021. 12

  11. [11]

    Adversarial robustness: From self-supervised pre-training to fine-tuning

    Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, and Zhangyang Wang. Adversarial robustness: From self-supervised pre-training to fine-tuning. InCVPR, 2020. 14

  12. [12]

    Label-only membership inference attacks

    Christopher A Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot. Label-only membership inference attacks. In ICML, 2021. 12

  13. [13]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML, 2020. 3, 4, 8, 14, 19, 20

  14. [14]

    Cinic-10 is not imagenet or cifar-10.arXiv preprint arXiv:1810.03505, 2018

    Luke N Darlow, Elliot J Crowley, Antreas Antoniou, and Amos J Storkey. Cinic-10 is not imagenet or cifar-10.arXiv preprint arXiv:1810.03505, 2018. 8, 19, 20

  15. [15]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, 2009. 8, 19, 20

  16. [16]

    Model inversion attacks that exploit confidence information and basic countermeasures

    Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. InACM SIGSAC Conference on Computer and Communications Security, 2015. 12

  17. [17]

    Property inference attacks on fully connected neural networks using permutation invariant representations

    Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In ACM SIGSAC Conference on Computer and Communications Security, 2018. 12

  18. [18]

    Maximum mean discrepancy test is aware of adversarial attacks

    Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, and Masashi Sugiyama. Maximum mean discrepancy test is aware of adversarial attacks. InICML,

  19. [19]

    Fast and reliable evaluation of adversarial robustness with minimum- margin attack

    Ruize Gao, Jiongxiao Wang, Kaiwen Zhou, Feng Liu, Binghui Xie, Gang Niu, Bo Han, and James Cheng. Fast and reliable evaluation of adversarial robustness with minimum- margin attack. InICML, 2022. 3, 14, 19

  20. [20]

    Explaining and harnessing adversarial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. InICLR,

  21. [21]

    On the (Statistical) Detection of Adversarial Examples

    Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv:1702.06280, 2017. 14

  22. [22]

    Simple black-box adversarial attacks

    Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. Simple black-box adversarial attacks. InICML, 2019. 4

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InCVPR, 2016. 8, 12, 19

  24. [24]

    Sorami Hisamoto, Matt Post, and Kevin Duh. Membership inference attacks on sequence-to-sequence models: Is my data in your machine translation system?Transactions of the Association for Computational Linguistics, 8:49–63, 2020. 12

  25. [25]

    Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLOS Genetics, 4:1–9, 2008

    Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V Pearson, Dietrich A Stephan, Stanley F Nelson, and David W Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLOS Genetics, 4:1–9, 2008. 1, 3

  26. [26]

    Scalable continuous-time diffusion framework for network inference and influence estimation

    Keke Huang, Ruize Gao, Bogdan Cautis, and Xiaokui Xiao. Scalable continuous-time diffusion framework for network inference and influence estimation. InWWW, 2024. 12

  27. [27]

    Black-box adversarial attacks with limited queries and information

    Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. InICML, 2018. 4

  28. [28]

    Memguard: Defending against black- box membership inference attacks via adversarial examples

    Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, and Neil Zhenqiang Gong. Memguard: Defending against black- box membership inference attacks via adversarial examples. InProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019. 1

  29. [29]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 8, 18, 19, 20

  30. [30]

    Adversarial examples in the physical world

    Alexey Kurakin, Ian Goodfellow, Samy Bengio, et al. Adversarial examples in the physical world. In ICLR, 2017. 13, 19, 20

  31. [31]

    A simple unified framework for detecting out-of-distribution samples and adversarial attacks

    Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InNeurIPS, 2018. 5, 14

  32. [32]

    Stolen memories: Leveraging model memorization for calibrated white-box membership inference

    Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated white-box membership inference. In USENIX Security, 2020. 12, 13, 18

  33. [33]

    Membership inference attacks and defenses in classification models

    Jiacheng Li, Ninghui Li, and Bruno Ribeiro. Membership inference attacks and defenses in classification models. In ACM Conference on Data and Application Security and Privacy,

  34. [34]

    Adversarial examples detection in deep networks with convolutional filter statistics

    Xin Li and Fuxin Li. Adversarial examples detection in deep networks with convolutional filter statistics. InICCV, 2017. 14

  35. [35]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983,

  36. [36]

    Characterizing adversarial subspaces using local intrinsic dimensionality

    Xingjun Ma, Bo Li, Yisen Wang, Sarah M Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E Houle, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. In ICLR, 2018. 5, 14

  37. [37]

    Understanding adversarial attacks on deep learning based medical image analysis systems

    Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognition, 2021. 13

  38. [38]

    Visualizing data using t-sne.Journal of machine learning research, 9 (Nov):2579–2605, 2008

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of machine learning research, 9 (Nov):2579–2605, 2008. 5, 17, 19

  39. [39]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. InICLR, 2018. 3, 4, 8, 13, 19, 20

  40. [40]

    Exploiting unintended feature leakage in collaborative learning

    Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. InIEEE Symposium on Security and Privacy, 2019. 18

  41. [41]

    On detecting adversarial perturbations

    Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. arXiv:1702.04267, 2017. 14

  42. [42]

    Deep learning for healthcare: review, opportunities and challenges.Briefings in bioinformatics, 19 (6):1236–1246, 2018

    Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T Dudley. Deep learning for healthcare: review, opportunities and challenges.Briefings in bioinformatics, 19 (6):1236–1246, 2018. 12

  43. [43]

    Ml privacy meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning.arXiv preprint arXiv:2007.09339, 2020

    Sasi Kumar Murakonda and Reza Shokri. Ml privacy meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning.arXiv preprint arXiv:2007.09339, 2020. 13, 14, 18

  44. [44]

    Machine learning with membership privacy using adversarial regularization

    Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization. In ACM SIGSAC Conference on Computer and Communications Security, 2018. 12

  45. [45]

    Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning

    Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In IEEE Symposium on Security and Privacy, 2019. 7, 12

  46. [46]

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. 19

  47. [47]

    Jerzy Neyman and Egon Sharpe Pearson. IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694-706):289–337, 1933. 12

  48. [48]

    Deep neural networks are easily fooled: High confidence predictions for unrecognizable images

    Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. InCVPR, 2015. 13

  49. [49]

    Curtail: Characterizing and thwarting adversarial deep learning.arXiv:1709.02538, 2017

    Bita Darvish Rouhani, Mohammad Samragh, Tara Javidi, and Farinaz Koushanfar. Curtail: Characterizing and thwarting adversarial deep learning.arXiv:1709.02538, 2017. 14

  50. [50]

    White-box vs black-box: Bayes optimal strategies for membership inference

    Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, and Hervé Jégou. White-box vs black-box: Bayes optimal strategies for membership inference. In ICML, 2019. 12

  51. [51]

    ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models

    Ahmed Salem, Yang Zhang, Mathias Humbert, Mario Fritz, and Michael Backes. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In Annual Network and Distributed System Security Symposium, 2019. 12, 13, 18

  52. [52]

    Hats: Hardness-aware trajectory synthesis for GUI agents

    Rui Shao, Ruize Gao, Bin Xie, Yixing Li, Kaiwen Zhou, Shuai Wang, Weili Guan, and Gongwei Chen. Hats: Hardness-aware trajectory synthesis for GUI agents. In CVPR, 2026. 12

  53. [53]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. InIEEE Symposium on Security and Privacy,

  54. [54]

    Machine learning models that remember too much

    Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. Machine learning models that remember too much. In ACM SIGSAC Conference on Computer and Communications Security, 2017. 13

  55. [55]

    Systematic evaluation of privacy risks of machine learning models

    Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models. In USENIX Security,

  56. [56]

    Introducing a new privacy testing library in tensorflow, 2020

    Shuang Song and David Marn. Introducing a new privacy testing library in tensorflow, 2020. 14

  57. [57]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. InICLR, 2014. 3, 13

  58. [58]

    Stealing machine learning models via prediction apis

    Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016. 12

  59. [59]

    The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline

    Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, and Kenji Kawaguchi. The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline. InICML, 2024. 14

  60. [60]

    On the convergence and robustness of adversarial training

    Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, and Quanquan Gu. On the convergence and robustness of adversarial training. InICML, 2019. 13, 14

  61. [61]

    On the importance of difficulty calibration in membership inference attacks. arXiv preprint arXiv:2111.08440, 2021

    Lauren Watson, Chuan Guo, Graham Cormode, and Alex Sablayrolles. On the importance of difficulty calibration in membership inference attacks. arXiv preprint arXiv:2111.08440, 2021. 1, 3, 12, 18

  62. [62]

    Adversarial weight perturbation helps robust generalization

    Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. InNeurIPS,

  63. [63]

    The human splicing code reveals new insights into the genetic determinants of disease.Science, 347(6218), 2015

    Hui Y Xiong, Babak Alipanahi, Leo J Lee, Hannes Bretschneider, Daniele Merico, Ryan KC Yuen, Yimin Hua, Serge Gueroussov, Hamed S Najafabadi, Timothy R Hughes, et al. The human splicing code reveals new insights into the genetic determinants of disease.Science, 347(6218), 2015. 12

  64. [64]

    Enhanced membership inference attacks against machine learning models

    Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri. Enhanced membership inference attacks against machine learning models. In ACM SIGSAC Conference on Computer and Communications Security, 2022. 1, 2, 3, 4, 8, 12, 18, 19

  65. [65]

    Privacy risk in machine learning: Analyzing the connection to overfitting

    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. InIEEE computer security foundations symposium, 2018. 1, 3, 4, 8, 12, 13, 19

  66. [66]

    Wide residual networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. InBMVC, 2016. 8, 19

  67. [67]

    Low-cost high-power membership inference attacks

    Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-cost high-power membership inference attacks. InICML, 2024. 1, 3, 4, 8, 12, 19, 20

  68. [68]

    Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64:107–115, 2021

    Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64:107–115, 2021. 13

  69. [69]

    Geometry-aware instance-reweighted adversarial training

    Jingfeng Zhang, Jianing Zhu, Gang Niu, Bo Han, Masashi Sugiyama, and Mohan Kankanhalli. Geometry-aware instance-reweighted adversarial training. InICLR, 2021. 14

  70. [70]

    Dual-path distillation: A unified framework to improve black-box attacks

    Yonggang Zhang, Ya Li, Tongliang Liu, and Xinmei Tian. Dual-path distillation: A unified framework to improve black-box attacks. In ICML, 2020. 13

  71. [71]

    Instead of taking a single step of size ϵ in the direction of the gradient sign, multiple smaller steps are taken in PGD (the result is clipped by the same ϵ)

    introduced a simple refinement of FGSM, which is the projected gradient descent (PGD) attack. Instead of taking a single step of size ϵ in the direction of the gradient sign, multiple smaller steps are taken in PGD (the result is clipped by the same ϵ). Specifically, we start with setting x(0) = x, and then in each iteration: x′(t+1) = Π_{Bϵ[x(0)]}(x′(t) + αs…

  72. [72]

    Motivated by this, [5] replaced the CE loss with several possible choices

    observed the phenomenon of gradient vanishing in the widely used CE loss for potential failure. Motivated by this, [5] replaced the CE loss with several possible choices. Among these choices, the widely used one for the untargeted attack is CW(x, y) = −z_y(x′) + max_{i≠y} z_i(x′) (21), where z is the logits of the model outputs. AutoAttack and Minimum Margin at…

  73. [73]

    gradient-norm collapse

    argued that the high computational cost of AutoAttack is unnecessary for identifying the most adversarial example: Definition 6 (The most adversarial example). Given a natural example x with its true label y, the most adversarial example x∗ within Bϵ[x] is defined as: ∀x′ ∈ Bϵ[x], x∗ = arg max_{x′} −(z_y(x′) − max_{i≠y} z_i(x′)), (22) where Bϵ[x] = {x′ | d_∞(x,…

  74. [74]

    Shadow Models. For Attack R [64], we train 100 reference models (OUT-Models)

    The learning rate is initialized to τ = 0.1 and follows a cosine annealing schedule, gradually decaying to zero over 100 epochs. Shadow Models. For Attack R [64], we train 100 reference models (OUT-Models). For LiRA [7] and RMIA [67], we train 100 IN-Models and 100 OUT-Models for modeling Q̃_in/out, and ensure that the same shadow models are used across differe…