Semantic Membership Inference Attack against Large Language Models
Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. Our results show that SMIA significantly outperforms existing MIAs; for instance, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% by the second-best attack.
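To make the reported metric concrete: AUC-ROC for a membership inference attack can be read as the probability that a randomly chosen training member receives a higher attack score than a randomly chosen non-member, so 50% corresponds to random guessing. The sketch below computes it directly from that definition. It is a minimal illustration and not the paper's evaluation code; the function name `auc_roc` and the score values are hypothetical.

```python
from typing import Sequence


def auc_roc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Fraction of (member, non-member) pairs ranked correctly; ties count as 0.5."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0   # member correctly scored above the non-member
            elif m == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical attack scores (higher means "more likely a training member").
members = [0.9, 0.7, 0.6, 0.4]
nonmembers = [0.8, 0.5, 0.3, 0.2]
print(auc_roc(members, nonmembers))  # 12 of 16 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable for large evaluation sets.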
Forward citations
Cited by 2 Pith papers
- Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from LLMs
  Poisoning training data reshapes the loss landscape to enable targeted extraction of unseen data from LLMs with high success rates in language and vision-language models.
- Data Compressibility Quantifies LLM Memorization
  Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.