Membership Inference Attacks From First Principles

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramèr · 2022 · arXiv 2112.03570

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MRMMIA: Membership Inference Attacks on Memory in Chat Agents

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

MRMMIA is a multi-recall-probe membership inference attack that extracts signals from chat agent memory and outperforms baselines in black-, gray-, and white-box settings.

Robust Statistical Estimators with Bounded Empirical Sensitivity

math.ST · 2026-05-21 · conditional · novelty 7.0

Defines empirical sensitivity and proves an Ω(η + √(η d/n)) lower bound (tight up to logarithmic factors) for any Gaussian mean estimator achieving the optimal O(√(d/n)) ℓ₂ error.

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

idSCD uses semantic correlation descriptors to perform dataset membership inference by comparing learned semantic structures, outperforming baselines in NLI, emotion, and medical text experiments.

Extracting memorized pieces of (copyrighted) books from open-weight language models

cs.CL · 2025-05-18 · conditional · novelty 6.0

A new extraction technique applied to 200 books and 14 LLMs finds that memorization of full books is rare except in specific high-capacity models where entire texts can be recovered verbatim.

citing papers explorer

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

cs.LG · 2026-06-22 · unverdicted · none · ref 63

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

cs.LG · 2026-05-28 · unverdicted · none · ref 5
