Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications
Pith reviewed 2026-06-30 17:18 UTC · model grok-4.3
The pith
A survey unifies membership inference and data contamination as pretraining data exposure in large language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that pretraining data exposure (PDE) can be formalized across exposure levels, and that doing so unifies data contamination and membership inference, two conceptually related areas that have mostly been studied in isolation. On that basis the paper reviews attack and defense methods, synthesizes empirical findings, and highlights open challenges for LLMs.
What carries the argument
The PDE framework, which asks whether specific data appeared in an LLM's pretraining corpus and places data contamination and membership inference under that single question.
Load-bearing premise
That membership inference and data contamination are conceptually close enough to be productively unified under a single PDE framework without important distinctions being lost or the survey becoming too broad to be useful.
What would settle it
Empirical evidence that methods and findings from membership inference and data contamination cannot be meaningfully compared or combined because their core mechanisms differ fundamentally.
Original abstract
Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining Data Exposure (PDE) increase due to the scale and opacity of training datasets. PDE refers to determining whether specific data appeared in an LLM's pretraining corpus. It is critical for ensuring evaluation integrity and protecting privacy, intersecting two key areas: data contamination and membership inference. Though conceptually related, these areas have often been studied in isolation. This paper offers the first unified survey of both under the PDE framework. We formalize PDE across exposure levels, review attack and defense methods, synthesize empirical findings, and highlight open challenges and future research directions.
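To make the abstract's definition of PDE concrete, the following is a minimal sketch of a loss-threshold membership score, a generic baseline from the membership inference literature. It is not taken from the survey and is not presented as the authors' method. The function names, the sample log-probabilities, and the threshold are all hypothetical, and the sketch assumes per-token log-probabilities have already been obtained from some model.

```python
from statistics import mean


def exposure_score(token_logprobs):
    """Mean per-token log-probability of a text under a model.

    A higher (less negative) value means the model finds the text more
    predictable, which a loss-threshold attack reads as weak evidence
    that the text was seen during pretraining.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return mean(token_logprobs)


def likely_exposed(token_logprobs, threshold):
    """Flag a text as a probable pretraining member if its score exceeds the threshold."""
    return exposure_score(token_logprobs) > threshold


# Hypothetical per-token log-probabilities, invented for illustration only.
seen_like = [-0.2, -0.1, -0.4, -0.3]
unseen_like = [-2.5, -3.1, -1.8, -2.9]

# Hypothetical threshold; in practice it would be calibrated on text
# known not to be in the pretraining corpus.
THRESHOLD = -1.0

if __name__ == "__main__":
    print(likely_exposed(seen_like, THRESHOLD))
    print(likely_exposed(unseen_like, THRESHOLD))
```

The same scoring idea applies to both settings the abstract names: run on benchmark items it serves as a contamination check, and run on private records it serves as a membership inference attack. A single fixed threshold is the simplest possible design and is sensitive to how intrinsically predictable a text is, which is one reason calibrated variants exist.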
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first unified survey of membership inference and data contamination in LLMs under a new Pretraining Data Exposure (PDE) framework. It formalizes PDE across exposure levels, reviews attack and defense methods from both literatures, synthesizes empirical findings, and identifies open challenges and future directions.
Significance. If the unification holds without losing key distinctions, the survey could serve as a useful reference that connects two previously isolated research threads, aiding work on LLM privacy risks and evaluation integrity. The organizational contribution of the PDE framework and the synthesis of methods are the primary potential strengths.
Minor comments (2)
- The abstract states that the areas 'have often been studied in isolation' but does not cite prior attempts at partial unification; adding 1-2 sentences in the introduction with explicit comparison to any overlapping prior surveys would strengthen the novelty claim.
- Section headings and subsection numbering are not provided in the supplied text; ensuring consistent numbering and a clear table of contents would improve navigability for a survey of this length.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The feedback affirms the value of unifying membership inference and data contamination under the PDE framework. The report lists no major comments, so there is nothing to address point by point at this stage. We will incorporate any additional feedback from the editor or further referee comments in the revision.
Circularity Check
No significant circularity: survey with no derivations
This paper is a literature survey that proposes an organizational PDE framework to unify two previously separate research areas. It contains no new equations, fitted parameters, predictions, or derivations of any kind. The central claim of providing the 'first unified survey' is a statement about coverage and synthesis rather than a mathematical result that could reduce to its inputs by construction. No self-citation load-bearing steps, ansatzes, or renamings of known results appear in the provided text.