Hey, That's My Data! Token-Only Dataset Inference in Large Language Models

Chen Xiong; Haixu Tang; Jingwei Xiong; Pin-Yu Chen; Rui Zhu; Tsung-Yi Ho; Zihao Wang

arxiv: 2506.06057 · v2 · submitted 2025-06-06 · 💻 cs.CL · cs.AI

Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang

Pith reviewed 2026-05-19 10:58 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords dataset inference · large language models · catastrophic forgetting · token-only access · membership inference · proprietary data · LLM output shifts · training data detection

The pith

CatShift infers whether a dataset was used to train an LLM by checking whether fine-tuning on it shifts the model's outputs more than fine-tuning on unseen data does, using only generated tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CatShift as a token-only method to infer if a dataset was included in an LLM's training. It builds on the observation that fine-tuning on already-seen data produces bigger changes in the model's token outputs due to overwriting earlier knowledge. These changes are measured against shifts from fine-tuning on a known non-member set to decide membership. A reader would care because the approach works even when models hide log probabilities, letting data owners check for unauthorized use in both open and closed systems.

Core claim

CatShift is a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training.

What carries the argument

CatShift framework, which measures output shifts after fine-tuning on a suspect dataset and compares them to shifts from a known non-member validation set to detect membership via catastrophic forgetting.
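
The comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the shift score (one minus a `difflib` similarity ratio between a completion before and after fine-tuning) and the one-sided permutation test are choices made for this sketch, and the names `output_shift` and `permutation_p_value` are hypothetical. The page itself only states that shifts are compared against a known non-member set and that the figures report p-values.

```python
import random
from difflib import SequenceMatcher
from statistics import mean


def output_shift(before: str, after: str) -> float:
    """Shift between two completions of the same prompt, in [0, 1].

    Assumption: 1 - difflib similarity ratio over whitespace tokens.
    The paper's actual similarity measure is not given on this page.
    """
    a, b = before.split(), after.split()
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def permutation_p_value(suspect, non_member, n_perm=10_000, seed=0):
    """One-sided permutation test: are suspect shifts larger on average?"""
    rng = random.Random(seed)
    observed = mean(suspect) - mean(non_member)
    pooled = list(suspect) + list(non_member)
    k = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count relabelings at least as extreme as the observed gap.
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

In this sketch, `suspect` and `non_member` would each hold one shift score per prompt, collected after fine-tuning on the suspect set and on the known non-member set respectively. A small p-value would be read as evidence of membership; the decision threshold is left to the user.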

Load-bearing premise

Catastrophic forgetting produces reliably larger and distinguishable output shifts specifically for training data subsets rather than for other factors such as data similarity or fine-tuning details.

What would settle it

Experiments that find output shifts of similar size after fine-tuning on training subsets and on unrelated new data would show the method cannot distinguish membership.

Figures

Figures reproduced from arXiv: 2506.06057 by Chen Xiong, Haixu Tang, Jingwei Xiong, Pin-Yu Chen, Rui Zhu, Tsung-Yi Ho, Zihao Wang.

**Figure 1.** An overview of the CatShift framework for dataset inference on label-only large language models (LLMs). The ...

**Figure 2.** p-values of dataset inference by applying dataset inference to the Pythia-410M model with 1000 data points. We ...

**Figure 3.** p-values of dataset inference by applying dataset inference to Pythia [...

Abstract

Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling practical protection of proprietary datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CatShift offers a practical token-only way to probe dataset membership in LLMs by measuring output shifts after fine-tuning, but the shifts may reflect data similarity more than true membership.

The core contribution is a method called CatShift that detects whether a dataset was part of an LLM's training data using only token outputs. It fine-tunes the model on a suspect subset and compares the resulting output changes to those from fine-tuning on known non-member data, relying on the idea that original training data triggers more forgetting and thus larger shifts. This works for both open models and API-only ones where logits are unavailable. That addresses a clear gap, since most prior dataset inference needs internal probabilities that many deployed systems now hide. The experiments reportedly show the approach holds up across settings, which is a concrete step forward for practical auditing of proprietary data use. Credit to the authors for grounding the idea in catastrophic forgetting rather than just extending logit-based baselines.

The main limitation is the risk that output shifts are driven by how close the fine-tuning data is to the model's pre-training distribution, not by whether the exact tokens were seen. If the non-member validation set is not tightly matched on topic, style, or embedding similarity, comparable shifts could appear for unrelated reasons. The abstract gives no sign of such matching or ablations, so the central claim needs tighter controls to rule out that confound. Without those, the method's specificity is harder to trust.

This paper is for people working on data provenance, copyright enforcement, and model auditing in production LLMs. Readers who need black-box tools for checking training data will find the most direct value.

It deserves a serious referee because the problem is timely, the token-only framing is new enough to matter, and the experiments provide a starting point even if revisions are required on the similarity controls. I would send it to review with a request to add those checks and report the quantitative gaps more clearly.

Referee Report

2 major / 2 minor

Summary. The paper proposes CatShift, a token-only dataset inference framework for LLMs that exploits catastrophic forgetting. The central claim is that fine-tuning on a subset of the model's original training data produces reliably larger output shifts (measured via token-level changes) than fine-tuning on unseen data; these shifts are compared against those induced by a known non-member validation set to infer membership. The method is evaluated on both open-source models and API-based LLMs without requiring logit access.

Significance. If the empirical distinction holds after proper controls, CatShift would be a meaningful advance for practical dataset inference in black-box settings, addressing copyright and data-provenance concerns where log-probability access is unavailable. The token-only design and applicability to closed models are practical strengths.

major comments (2)

[§3 and §4] §3 (Method) and §4 (Experiments): The central claim requires that output shifts are attributable to membership via catastrophic forgetting rather than distributional similarity between the fine-tuning subset and the pre-training distribution. No ablation or matching procedure (e.g., embedding cosine similarity, topic overlap, or lexical n-gram controls) between the member subset and the non-member validation set is described; without this, shifts could arise from any data that is distributionally close to the original training corpus independently of exact token membership.
[§4.1] §4.1 (Experimental setup): The paper reports effectiveness on open-source and API models but provides no quantitative results, error bars, or baseline comparisons in the available description. This makes it impossible to assess whether the observed shift differences are statistically reliable or practically distinguishable from noise or hyperparameter effects.
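
One of the lexical controls requested in the first major comment could look like the sketch below. It computes the Jaccard overlap of word n-grams between two text collections, so that a suspect set and a non-member validation set can each be checked for how lexically close they are to some reference sample. The function names and the choice of Jaccard overlap over word trigrams are assumptions made for illustration; the paper is not reported to use this procedure.

```python
def word_ngrams(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams in a single document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(docs_a, docs_b, n: int = 3) -> float:
    """Jaccard overlap between the n-gram sets of two document collections."""
    grams_a = set().union(*(word_ngrams(d, n) for d in docs_a))
    grams_b = set().union(*(word_ngrams(d, n) for d in docs_b))
    union = grams_a | grams_b
    # Two empty collections are treated as having no overlap.
    return len(grams_a & grams_b) / len(union) if union else 0.0
```

If the suspect set and the validation set show similar overlap with the same reference sample, a remaining gap in output shift is harder to attribute to lexical proximity alone.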

minor comments (2)

[§3] Notation for output shift (e.g., how token-level differences are aggregated across sequences) should be defined more explicitly with an equation or pseudocode to allow reproduction.
[Abstract and §1] The abstract and introduction would benefit from a clearer statement of the threat model (e.g., adversary capabilities and access assumptions) to situate CatShift relative to prior logit-based inference methods.
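
As an illustration of the kind of definition the first minor comment asks for, one possible aggregation is written out below. The symbols are assumptions introduced here and are not the paper's notation: $M$ is the model before fine-tuning, $M'$ the model after fine-tuning, $x_i$ a prompt drawn from the evaluated set $D$, and $\mathrm{sim}$ any text-similarity score with values in $[0, 1]$.

```latex
\Delta(D) = \frac{1}{|D|} \sum_{x_i \in D} \Bigl( 1 - \mathrm{sim}\bigl( M(x_i),\, M'(x_i) \bigr) \Bigr)
```

Under this hypothetical definition, $\Delta(D)$ is the mean per-prompt shift, and membership inference would compare $\Delta$ for the suspect set against $\Delta$ for the known non-member set.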

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on our paper. We have carefully considered the feedback and provide point-by-point responses below. We are committed to addressing the concerns to enhance the manuscript's robustness.

Referee: [§3 and §4] §3 (Method) and §4 (Experiments): The central claim requires that output shifts are attributable to membership via catastrophic forgetting rather than distributional similarity between the fine-tuning subset and the pre-training distribution. No ablation or matching procedure (e.g., embedding cosine similarity, topic overlap, or lexical n-gram controls) between the member subset and the non-member validation set is described; without this, shifts could arise from any data that is distributionally close to the original training corpus independently of exact token membership.

Authors: We thank the referee for highlighting this important distinction. Our non-member validation set is selected from a held-out subset of the pre-training data to ensure it is distributionally similar yet not overlapping with the training set. However, to explicitly demonstrate that the larger shifts are due to membership and catastrophic forgetting rather than mere distributional proximity, we will incorporate matching procedures and ablations in the revised manuscript. This will include computing embedding similarities, topic overlaps, and n-gram statistics between the sets, and showing that the effect persists under these controls. These additions will be detailed in an expanded Section 4.

revision: yes

Referee: [§4.1] §4.1 (Experimental setup): The paper reports effectiveness on open-source and API models but provides no quantitative results, error bars, or baseline comparisons in the available description. This makes it impossible to assess whether the observed shift differences are statistically reliable or practically distinguishable from noise or hyperparameter effects.

Authors: We apologize for any lack of clarity in the presentation. The full manuscript in Section 4.1 does include quantitative results on output shift differences for both open-source and API-based models, along with comparisons to baselines. To address the concern about statistical reliability, we will add explicit error bars, report standard deviations from repeated experiments, and include statistical tests in the revised version. This will make the distinguishability from noise and hyperparameter sensitivity clearer.

revision: yes
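
The error bars promised in this response could be produced in several ways. The sketch below shows one of them, a percentile bootstrap confidence interval for the difference in mean output shift between the suspect set and the non-member set. The function name `bootstrap_ci`, the resampling count, and the use of a percentile interval are assumptions for illustration and are not taken from the paper.

```python
import random
from statistics import mean


def bootstrap_ci(suspect, non_member, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(suspect) - mean(non_member)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement at its original size.
        s = rng.choices(suspect, k=len(suspect))
        m = rng.choices(non_member, k=len(non_member))
        diffs.append(mean(s) - mean(m))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

An interval that excludes zero would suggest the shift gap is not explained by resampling noise alone, although it says nothing about confounds such as distributional similarity.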

Circularity Check

0 steps flagged

No significant circularity; empirical comparison is self-contained

The paper presents CatShift as an empirical token-only inference method that measures and compares output shifts induced by fine-tuning on a candidate dataset versus shifts from a known non-member validation set. No derivation chain, equations, or first-principles results are described that reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The approach relies on observable differences in catastrophic forgetting effects, which are tested experimentally on open-source and API models rather than derived tautologically from the inputs themselves. This is a standard empirical procedure that remains falsifiable through independent replication and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that catastrophic forgetting produces measurably larger output shifts for member data; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: Fine-tuning on training data produces larger output shifts, due to catastrophic forgetting, than fine-tuning on unseen data.
This premise is invoked to justify the inference procedure described in the abstract.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
cs.LG 2026-05 unverdicted novelty 5.0

Shadow Mask Distillation enables KV cache compression in RL post-training of LLMs by mitigating amplified off-policy bias that defeats standard importance reweighting.

[1] [1]

The claude 3 model family: Opus, sonnet, haiku

Anthropic. The claude 3 model family: Opus, sonnet, haiku. https://www-cdn.anthropic.com/ de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model Card Claude 3.pdf, March 2024

work page 2024

[2] [2]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Her- bie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling. In Andreas Krause, Emma Brunskill, Kyu...

work page 2023

[3] [3]

GPT-Neo: Large-Scale Autoregressive Language Modeling with Mesh-TensorFlow, 2021

Sid Black, Leo Gao, Phil Wang, Connor Leahy, and Stella Bider- man. GPT-Neo: Large-Scale Autoregressive Language Modeling with Mesh-TensorFlow, 2021. Software available from Zenodo

work page 2021

[4] [4]

Language models are few- shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few- shot learners. Advances in neural information processing systems , 33:1877–1901, 2020

work page 1901

[5] [5]

Brown, Dawn Song, ´Ulfar Erlingsson, Alina Oprea, and Colin Raffel

Nicholas Carlini, Florian Tram `er, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Song, ´Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Ex- tracting training data from large language models. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021 , pages 2633–2650, 2021

work page 2021

[6] [6]

Gan-leaks: A taxonomy of membership inference attacks against generative models

Dingfan Chen, Ning Yu, Yang Zhang, and Mario Fritz. Gan-leaks: A taxonomy of membership inference attacks against generative models. In CCS ’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020 , pages 343–362, 2020

work page 2020

[7] [7]

Amplifying membership exposure via data poisoning

Yufei Chen, Chao Shen, Yun Shen, Cong Wang, and Yang Zhang. Amplifying membership exposure via data poisoning. In NeurIPS, 2022

work page 2022

[8] [8]

Christopher A. Choquette-Choo, Florian Tramèr, Nicholas Carlini, and Nicolas Papernot. Label-only membership inference attacks. In International Conference on Machine Learning, pages 1964–1974. PMLR, 2021

[9] [9]

Debeshee Das, Jie Zhang, and Florian Tramèr. Blind baselines beat membership inference attacks for foundation models. arXiv preprint arXiv:2406.16201, 2024

[10] [10]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018

[11] [11]

Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, and Hannaneh Hajishirzi. Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841, 2024

[12] [12]

Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, and Shu-Tao Xia. Privacy leakage on DNNs: A survey of model inversion attacks and defenses. CoRR, abs/2402.04013, 2024

[13] [13]

Filippo Galli, Luca Melis, and Tommaso Cucinotta. Noisy neighbors: Efficient membership inference attacks against LLMs. arXiv preprint arXiv:2406.16565, 2024

[14] [14]

Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. The Pile: An 800GB dataset of diverse text for language modeling. CoRR, abs/2101.00027, 2021

[15] [15]

Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: evaluating privacy leakage of generative models using generative adversarial networks. CoRR, abs/1705.07663, 2017

[16] [16]

Benjamin Hilprecht, Martin Härterich, and Daniel Bernau. Monte Carlo and reconstruction membership inference attacks against generative models. Proc. Priv. Enhancing Technol., 2019(4):232–249, 2019

[17] [17]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022

[18] [18]

Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019, pages 1895–1912, 2019

[19] [19]

Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, and Neil Zhenqiang Gong. MemGuard: Defending against black-box membership inference attacks via adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 259–274, 2019

[20] [20]

Yigitcan Kaya and Tudor Dumitras. When does data augmentation help with membership inference attacks? In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, pages 5345–5355, 2021

[21] [21]

Zheng Li and Yang Zhang. Membership leakage in label-only exposures. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 880–895, 2021

[22] [22]

Zheng Li and Yang Zhang. Membership leakage in label-only exposures. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15-19, 2021, pages 880–895, 2021

[23] [23]

Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, and Yang Liu. Rethinking machine unlearning for large language models. CoRR, abs/2402.08787, 2024

[24] [24]

Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, and Yue Zhang. An empirical study of catastrophic forgetting in large language models during continual fine-tuning. CoRR, abs/2308.08747, 2023

[25] [26]

Pratyush Maini, Hengrui Jia, Nicolas Papernot, and Adam Dziedzic. LLM dataset inference: Did you train on my dataset? arXiv preprint arXiv:2406.06443, 2024

[26] [27]

Pratyush Maini, Mohammad Yaghini, and Nicolas Papernot. Dataset inference: Ownership resolution in machine learning. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

[27] [28]

Matthieu Meeus, Shubham Jain, Marek Rei, and Yves-Alexandre de Montjoye. Inherent challenges of post-hoc membership inference for large language models. arXiv preprint arXiv:2406.17975, 2024

[28] [29]

Matthieu Meeus, Igor Shilov, Manuel Faysse, and Yves-Alexandre de Montjoye. Copyright traps for large language models. In 41st International Conference on Machine Learning, 2024

[29] [30]

Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pages 691–706, 2019

[30] [31]

Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 634–646, 2018

[31] [32]

Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pages 739–753, 2019

[32] [33]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21:140:1–140:67, 2020

[33] [34]

Jie Ren, Han Xu, Pengfei He, Yingqian Cui, Shenglai Zeng, Jiankun Zhang, Hongzhi Wen, Jiayuan Ding, Hui Liu, Yi Chang, and Jiliang Tang. Copyright protection in generative AI: A technical perspective. CoRR, abs/2402.02333, 2024

[34] [35]

Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019, 2019

[35] [36]

Virat Shejwalkar and Amir Houmansadr. Membership privacy for machine learning models through knowledge transfer. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 202...

[36] [37]

Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models, 2023

[37] [38]

Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. In The Twelfth International Conference on Learning Representations, 2024

[38] [39]

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 3–18, 2017

[39] [40]

Congzheng Song and Vitaly Shmatikov. Auditing data provenance in text-generation models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, pages 196–206, 2019

[40] [41]

Liwei Song, Reza Shokri, and Prateek Mittal. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 241–257, 2019

[41] [42]

The New York Times. The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work. https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html, December 2023

[42] [43]

Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, and Nicholas Carlini. Truth serum: Poisoning machine learning models to reveal their secrets. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, pages 2779–2792, 2022

[43] [44]

Johnny Tian-Zheng Wei, Ryan Yixiang Wang, and Robin Jia. Proving membership in LLM pretraining data via data watermarks. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, pages 13306–13320. Association for Computation...

[44] [45]

Yutong Wu, Han Qiu, Shangwei Guo, Jiwei Li, and Tianwei Zhang. You only query once: an efficient label-only membership inference attack. In The Twelfth International Conference on Learning Representations, 2024

[45] [46]

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018, pages 268–282, 2018

[46] [47]

Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, and Wanlei Zhou. Label-only membership inference attacks and defenses in semantic segmentation models. IEEE Transactions on Dependable and Secure Computing, 20(2):1435–1449, 2022

[47] [48]

Zhihao Zhu, Chenwang Wu, Rui Fan, Defu Lian, and Enhong Chen. Membership inference attacks against sequential recommender systems. In Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, pages 1208–1219, 2023