Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
Pith reviewed 2026-05-19 10:58 UTC · model grok-4.3
The pith
CatShift detects whether a dataset was used to train an LLM by checking whether fine-tuning on that dataset shifts the model's outputs more than fine-tuning on unseen data does, using only generated tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CatShift is a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training.
What carries the argument
The CatShift framework, which measures output shifts after fine-tuning on a suspect dataset and compares them with the shifts obtained from a known non-member validation set, detecting membership through catastrophic forgetting.
Load-bearing premise
Catastrophic forgetting produces output shifts that are reliably larger, and distinguishable from noise, when the fine-tuning data are a subset of the training set, and this gap reflects membership itself rather than other factors such as data similarity or fine-tuning details.
What would settle it
Experiments that find output shifts of similar size after fine-tuning on training subsets and on unrelated new data would show the method cannot distinguish membership.
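One hedged way to operationalize this test: treat the per-prompt shifts from the suspect set and from the non-member validation set as two samples and run a one-sided permutation test on the difference in mean shift. Similar-sized shifts give a large p-value, which is exactly the failure mode described above. The permutation test, the 0.05 threshold, and the function names are illustrative assumptions, not the paper's stated decision rule.

```python
import random
from statistics import fmean

# Hypothetical decision rule: one-sided permutation test on mean shift.


def permutation_p_value(suspect: list[float], validation: list[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for H1: suspect-set shifts are larger on average than validation-set shifts."""
    rng = random.Random(seed)
    observed = fmean(suspect) - fmean(validation)
    pooled = suspect + validation  # new list, so shuffling leaves the inputs untouched
    k = len(suspect)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling under H0 (no membership effect)
        if fmean(pooled[:k]) - fmean(pooled[k:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction counts the observed labeling, so p is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def infer_membership(suspect: list[float], validation: list[float],
                     alpha: float = 0.05) -> bool:
    """Flag the suspect dataset as a training member if its shifts are significantly larger."""
    return permutation_p_value(suspect, validation) < alpha
```

A rank-free test like this makes no normality assumption about the shift values, at the cost of more computation than a t-test.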
Original abstract
Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling practical protection of proprietary datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CatShift, a token-only dataset inference framework for LLMs that exploits catastrophic forgetting. The central claim is that fine-tuning on a subset of the model's original training data produces reliably larger output shifts (measured via token-level changes) than fine-tuning on unseen data; these shifts are compared against those induced by a known non-member validation set to infer membership. The method is evaluated on both open-source models and API-based LLMs without requiring logit access.
Significance. If the empirical distinction holds after proper controls, CatShift would be a meaningful advance for practical dataset inference in black-box settings, addressing copyright and data-provenance concerns where log-probability access is unavailable. The token-only design and applicability to closed models are practical strengths.
major comments (2)
- [§3 and §4] §3 (Method) and §4 (Experiments): The central claim requires that output shifts are attributable to membership via catastrophic forgetting rather than distributional similarity between the fine-tuning subset and the pre-training distribution. No ablation or matching procedure (e.g., embedding cosine similarity, topic overlap, or lexical n-gram controls) between the member subset and the non-member validation set is described; without this, shifts could arise from any data that is distributionally close to the original training corpus independently of exact token membership.
- [§4.1] §4.1 (Experimental setup): The paper reports effectiveness on open-source and API models but provides no quantitative results, error bars, or baseline comparisons in the available description. This makes it impossible to assess whether the observed shift differences are statistically reliable or practically distinguishable from noise or hyperparameter effects.
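A lexical control of the kind the referee requests could check that the member subset and the validation set are about equally close to a reference sample of in-domain text that is disjoint from both. The sketch below assumes pre-tokenized documents and uses n-gram containment (the fraction of a set's n-grams found in the reference) as the proximity measure; embedding or topic similarity would slot in the same way. The names and the default n = 3 are hypothetical.

```python
# Hypothetical lexical-proximity control between the suspect and validation sets.


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence (empty if shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def corpus_ngrams(docs: list[list[str]], n: int) -> set[tuple[str, ...]]:
    """Union of n-gram sets over a collection of tokenized documents."""
    grams: set[tuple[str, ...]] = set()
    for doc in docs:
        grams |= ngrams(doc, n)
    return grams


def containment(grams: set[tuple[str, ...]], reference: set[tuple[str, ...]]) -> float:
    """Fraction of `grams` that also occur in `reference` (0.0 if `grams` is empty)."""
    return len(grams & reference) / len(grams) if grams else 0.0


def overlap_gap(suspect_docs: list[list[str]], validation_docs: list[list[str]],
                reference_docs: list[list[str]], n: int = 3) -> float:
    """Absolute difference in n-gram containment against a reference sample.

    A small gap suggests both sets are comparably close to the reference text,
    so a shift difference is less likely to be explained by lexical proximity alone.
    """
    ref = corpus_ngrams(reference_docs, n)
    suspect_c = containment(corpus_ngrams(suspect_docs, n), ref)
    validation_c = containment(corpus_ngrams(validation_docs, n), ref)
    return abs(suspect_c - validation_c)
```

In practice one would resample or reweight the validation set until the gap falls below a chosen tolerance before comparing output shifts.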
minor comments (2)
- [§3] Notation for output shift (e.g., how token-level differences are aggregated across sequences) should be defined more explicitly with an equation or pseudocode to allow reproduction.
- [Abstract and §1] The abstract and introduction would benefit from a clearer statement of the threat model (e.g., adversary capabilities and access assumptions) to situate CatShift relative to prior logit-based inference methods.
Simulated Author's Rebuttal
We thank the referee for the insightful comments on our paper. We have carefully considered the feedback and provide point-by-point responses below. We are committed to addressing the concerns to strengthen the manuscript's robustness.
Point-by-point responses
-
Referee: [§3 and §4] §3 (Method) and §4 (Experiments): The central claim requires that output shifts are attributable to membership via catastrophic forgetting rather than distributional similarity between the fine-tuning subset and the pre-training distribution. No ablation or matching procedure (e.g., embedding cosine similarity, topic overlap, or lexical n-gram controls) between the member subset and the non-member validation set is described; without this, shifts could arise from any data that is distributionally close to the original training corpus independently of exact token membership.
Authors: We thank the referee for highlighting this important distinction. Our non-member validation set is drawn from a held-out split of the same source as the pre-training data, so that it is distributionally similar to, but disjoint from, the training set. However, to demonstrate explicitly that the larger shifts are due to membership and catastrophic forgetting rather than mere distributional proximity, we will incorporate matching procedures and ablations in the revised manuscript. These will include computing embedding similarities, topic overlap, and n-gram statistics between the sets, and showing that the effect persists under these controls. These additions will be detailed in an expanded Section 4. revision: yes
-
Referee: [§4.1] §4.1 (Experimental setup): The paper reports effectiveness on open-source and API models but provides no quantitative results, error bars, or baseline comparisons in the available description. This makes it impossible to assess whether the observed shift differences are statistically reliable or practically distinguishable from noise or hyperparameter effects.
Authors: We apologize for any lack of clarity in the presentation. The full manuscript in Section 4.1 does include quantitative results on output shift differences for both open-source and API-based models, along with comparisons to baselines. To address the concern about statistical reliability, we will add explicit error bars, report standard deviations from repeated experiments, and include statistical tests in the revised version. This will make the distinguishability from noise and hyperparameter sensitivity clearer. revision: yes
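For the promised error bars, a percentile bootstrap over probe prompts is one standard option. This minimal sketch assumes per-prompt shifts are independent and captures only prompt-sampling variance; variance across fine-tuning seeds or hyperparameters would additionally require repeating the fine-tuning runs. The function name and defaults are illustrative.

```python
import random
from statistics import fmean

# Hypothetical error-bar computation for the mean shift difference.


def bootstrap_ci(suspect: list[float], validation: list[float],
                 n_boot: int = 5_000, level: float = 0.95,
                 seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for fmean(suspect) - fmean(validation)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample prompts with replacement, independently within each set.
        s = rng.choices(suspect, k=len(suspect))
        v = rng.choices(validation, k=len(validation))
        diffs.append(fmean(s) - fmean(v))
    diffs.sort()
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

An interval whose lower end stays above zero indicates that the suspect set's mean shift exceeds the validation set's by more than prompt-level noise.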
Circularity Check
No significant circularity; empirical comparison is self-contained
The paper presents CatShift as an empirical token-only inference method that measures and compares output shifts induced by fine-tuning on a candidate dataset versus shifts from a known non-member validation set. No derivation chain, equations, or first-principles results are described that reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The approach relies on observable differences in catastrophic forgetting effects, which are tested experimentally on open-source and API models rather than derived tautologically from the inputs themselves. This is a standard empirical procedure that remains falsifiable through independent replication and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: fine-tuning on training data produces larger output shifts, through catastrophic forgetting, than fine-tuning on unseen data.
Forward citations
Cited by 1 Pith paper
-
How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
Shadow Mask Distillation enables KV cache compression in RL post-training of LLMs by mitigating amplified off-policy bias that defeats standard importance reweighting.