
arxiv: 2605.30462 · v1 · pith:BYGWOJXN · submitted 2026-05-28 · 💻 cs.LG · cs.AI

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Pith reviewed 2026-06-29 08:33 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords semantic correlation descriptors · dataset membership inference · spurious correlations · white-box attack · natural language inference · emotion classification · medical text classification · membership inference attack

The pith

Models internalize dataset-specific spurious correlations that semantic correlation descriptors can extract to detect training dataset membership.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that training on a dataset imprints incidental semantic regularities that are predictive within that dataset but not causally required by the task. These traces form a structure that can be compared across dataset mixtures. This allows a white-box comparison between a trained model's semantic correlation descriptors and the standalone descriptors computed on a candidate dataset. In controlled leave-one-dataset-out tests, the descriptors separate matching from non-matching pairs perfectly. Across natural language inference, emotion classification, and medical text tasks, the resulting membership score outperforms black-box and white-box baselines and shows lower variance. The largest reported relative gain in ROC-AUC exceeds 60 percent and occurs when the dataset groups carry distinct semantic particularities.
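The 60 percent figure is a relative gain, not a difference in percentage points. A minimal sketch of the arithmetic follows. The ROC-AUC values are hypothetical and are not taken from the paper:

```python
def relative_gain(auc_new: float, auc_baseline: float) -> float:
    """Relative ROC-AUC improvement of auc_new over auc_baseline, as a fraction."""
    if auc_baseline <= 0:
        raise ValueError("baseline ROC-AUC must be positive")
    return (auc_new - auc_baseline) / auc_baseline


# Hypothetical values for illustration only.
baseline_auc, scd_auc = 0.55, 0.90
print(f"absolute gain: {scd_auc - baseline_auc:.2f}")               # 0.35
print(f"relative gain: {relative_gain(scd_auc, baseline_auc):.1%}")  # 63.6%
```

Because the baseline sits in the denominator, a fixed absolute improvement produces a larger relative gain when the baseline is weak, for example when it is close to chance at 0.5.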

Core claim

Datasets leave dataset-specific traces in a model's learned semantic correlation structure. During training, the model internalizes incidental regularities that are predictive within a dataset but not causal for the underlying task. Semantic correlation descriptors capture this structure and make it comparable across dataset mixtures. A practical membership score tests whether a target dataset belongs to the training mixture by comparing only the model's SCD with the target dataset's standalone SCD. In leave-one-dataset-out diagnostics, the descriptors recover dataset-specific changes and perfectly separate matching from non-matching pairs. On average, the SCD-based classifier achieves the highest performance and the lowest standard deviation. It outperforms the black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline.
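The practical membership test compares two descriptor vectors. The sketch below assumes that SCDs are fixed-length numeric vectors. It uses cosine similarity, which is the separation metric named in the simulated rebuttal further down, together with a hypothetical decision threshold. The function names, the threshold value and the toy vectors are illustrative and do not come from the paper:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def membership_score(model_scd: Sequence[float], dataset_scd: Sequence[float]) -> float:
    """Hypothetical score: how closely the candidate dataset's standalone SCD
    aligns with the SCD read out of the trained model."""
    return cosine_similarity(model_scd, dataset_scd)


def is_member(model_scd: Sequence[float], dataset_scd: Sequence[float],
              threshold: float = 0.5) -> bool:
    # Placeholder threshold; in practice it would be calibrated on
    # datasets whose membership is known.
    return membership_score(model_scd, dataset_scd) >= threshold


# Toy descriptors, invented for illustration.
model_scd = [0.8, 0.1, 0.6]
print(is_member(model_scd, [0.7, 0.0, 0.5]))   # True  (similarity ~0.99)
print(is_member(model_scd, [-0.2, 0.9, 0.0]))  # False (similarity ~-0.08)
```

The score needs only the two vectors. No loss values or query responses enter the computation.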

What carries the argument

Semantic correlation descriptors (SCDs): vectors that encode the semantic correlation structure learned by a model, enabling direct numerical comparison between a trained model and any candidate dataset without requiring leave-one-out retraining.
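This page does not describe the paper's actual SCD construction. As a purely hypothetical illustration, here is one simple way to turn "incidental regularities" into a comparable vector. Take a fixed keyword list and measure how much each keyword shifts the label distribution within a dataset. The sketch covers only the dataset-side, standalone descriptor. It does not show how the corresponding structure would be read out of a model. The function name and the toy data are invented:

```python
from collections import Counter
from typing import List, Sequence


def keyword_label_descriptor(texts: Sequence[str], labels: Sequence[str],
                             keywords: Sequence[str],
                             label_set: Sequence[str]) -> List[float]:
    """Hypothetical standalone descriptor: for each (keyword, label) pair,
    the lift P(label | keyword present) - P(label) within the dataset.

    Positive entries mark keywords that are incidentally predictive of a
    label in this dataset. The flattened vector can be compared across
    datasets, e.g. with cosine similarity.
    """
    n = len(texts)
    label_counts = Counter(labels)
    prior = {y: label_counts[y] / n for y in label_set}
    descriptor: List[float] = []
    for kw in keywords:
        # Labels of the examples whose lowercased tokens contain the keyword.
        hits = [y for t, y in zip(texts, labels) if kw in t.lower().split()]
        hit_counts = Counter(hits)
        for y in label_set:
            if hits:
                descriptor.append(hit_counts[y] / len(hits) - prior[y])
            else:
                descriptor.append(0.0)  # keyword absent: no evidence either way
    return descriptor


# Toy NLI-style data in which "cat" happens to co-occur with entailment.
texts = ["the cat sat", "a cat ran", "the dog sat", "a dog barked"]
labels = ["entailment", "entailment", "contradiction", "contradiction"]
print(keyword_label_descriptor(texts, labels, ["cat", "dog"],
                               ["entailment", "contradiction"]))
# [0.5, -0.5, -0.5, 0.5]
```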

If this is right

  • SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs in leave-one-out diagnostics.
  • An SCD-based membership classifier attains the highest average performance and lowest standard deviation across NLI, emotion, and medical text groups.
  • The approach yields a relative ROC-AUC gain exceeding 60 percent when dataset groups expose distinct semantic particularities.
  • The score operates with only the model's SCD and the target dataset's standalone SCD, without black-box queries or loss access.
  • The score is evaluated under different degrees of semantic separation and keyword support between dataset splits, which exposes both its advantages and its limitations.
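The leave-one-dataset-out diagnostic can be sketched under a few stated assumptions:

- Each "difference descriptor" is the change in a model's SCD when one dataset is removed from the training mixture.
- A matching pair is (difference descriptor for dataset i, standalone SCD of dataset i).
- Similarity is measured by cosine similarity.

Under these assumptions, perfect separation means that every matching similarity exceeds every non-matching one. That is exactly an ROC-AUC of 1.0. All descriptor values below are invented:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive score beats a random
    negative score (ties count 1/2). It equals 1.0 exactly when every
    positive score exceeds every negative score."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def lodo_separation(diff_descriptors: List[Sequence[float]],
                    standalone_descriptors: List[Sequence[float]]) -> float:
    """Pairs (i, i) are matching; pairs (i, j) with j != i are non-matching."""
    matching, non_matching = [], []
    for i, d in enumerate(diff_descriptors):
        for j, s in enumerate(standalone_descriptors):
            (matching if i == j else non_matching).append(cosine(d, s))
    return pairwise_auc(matching, non_matching)


diffs = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.2], [0.1, 0.0, 1.0]]
standalone = [[0.9, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, 0.2, 0.8]]
print(lodo_separation(diffs, standalone))  # 1.0
```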

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If SCDs remain stable under fine-tuning, they could serve as persistent fingerprints for auditing deployed models' training data composition.
  • The method may extend to detecting data poisoning by checking whether injected correlations appear in the model's SCD.
  • Neighboring tasks such as detecting duplicate or near-duplicate training sources could be addressed by clustering SCD vectors across many candidate datasets.
  • If the descriptors prove sensitive to collection artifacts, they might also help quantify unintended dataset leakage in public model releases.
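The clustering idea in the third bullet could be prototyped as follows. This is a hypothetical sketch. It uses single-linkage grouping at a fixed cosine-similarity threshold, which is a swapped-in technique and not something the paper proposes. The dataset names, vectors and threshold are invented:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(descriptors: Dict[str, Sequence[float]],
                        threshold: float = 0.9) -> List[List[str]]:
    """Single-linkage grouping with union-find: two datasets share a group if
    a chain of pairwise cosine similarities >= threshold connects them."""
    names = list(descriptors)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(descriptors[a], descriptors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


scds = {"A": [1.0, 0.0, 0.0], "A_copy": [0.98, 0.05, 0.0],
        "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
print(group_by_similarity(scds))  # [['A', 'A_copy'], ['B'], ['C']]
```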

Load-bearing premise

That incidental semantic correlations learned during training are sufficiently dataset-specific and stable to enable reliable membership inference even across datasets with partial semantic overlap.

What would settle it

Run the SCD membership score on two datasets that share identical semantic regularities yet differ in labels or collection source; if the score cannot distinguish the held-out dataset from the training one, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2605.30462 by Andrada Gobeaja, Elena Burceanu, Ionut Hodoroaga, Marius Leordeanu.

Figure 1. Dataset membership through semantic correlation fingerprints. The pipeline evaluates …

Figure 2. Dataset-specificity of correlation descriptors in the controlled difference-based setting.

Figure 3. SOTA comparison of ROC curves for dataset membership inference. (a-c) show performance …
Original abstract

Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-level membership inference, moving beyond existing methods that rely on behavioral or distributional evidence such as confidence scores, losses, margins, generated samples, or query responses. We introduce a white-box semantic fingerprinting approach based on semantic correlation descriptors (SCDs), which capture the semantic correlation structure learned by a model and make it comparable across dataset mixtures. In a controlled leave-one-dataset-out diagnostic, SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs. We then propose a practical SCD-based membership score that tests whether a target dataset is part of a model's training mixture using only the model's SCD and the target dataset's standalone SCD, without requiring leave-one-dataset-out models. Across three diverse experimental settings, with dataset groups for natural language inference, emotion classification, and medical text classification, we test both the advantages and limitations of SCD-based membership inference with different degrees of semantic separation and keyword support between dataset splits. On average, the classifier based on this score achieves the highest performance and the lowest std, outperforming black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline. These results show that dataset membership can be traced through internal semantic correlations, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups expose distinct semantic particularities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that datasets induce identifiable, dataset-specific incidental semantic correlations during training, which can be captured via semantic correlation descriptors (SCDs) for dataset-level membership inference. In a controlled leave-one-out diagnostic, SCDs recover these changes and achieve perfect separation between matching and non-matching dataset pairs. A practical SCD-based score, using only the model's SCD and the target dataset's standalone SCD, is then shown to outperform black-box baselines (RMIA, Attack-P, LiRA) and the white-box SIF baseline across NLI, emotion classification, and medical text tasks, with the largest relative ROC-AUC gain exceeding 60%.

Significance. If the results hold, the work offers a new white-box fingerprinting method for tracing dataset membership through internalized semantic structures rather than behavioral or distributional signals. The reported outperformance with lowest standard deviation and the explicit testing across varying degrees of semantic separation are strengths; the approach could complement existing membership inference techniques when datasets exhibit distinct semantic particularities.

major comments (2)
  1. [Abstract] The claim that SCDs 'perfectly separate matching from non-matching dataset pairs' in the leave-one-out diagnostic is load-bearing for the central claim. Yet the manuscript gives no quantitative details on the separation metric, the threshold, or potential confounds from semantic patterns shared across dataset groups.
  2. [Abstract and experimental settings] The practical SCD score's reported average outperformance (highest performance, lowest std) is tested 'with different degrees of semantic separation and keyword support'. However, the manuscript reports no per-setting ROC-AUC values and no error bars. It also lacks an explicit ablation that isolates whether incidental correlations (e.g., common NLI patterns) produce false positives when a dataset is absent. This bears directly on whether the gains over RMIA, LiRA, and SIF generalize.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments point-by-point below. We agree that providing additional quantitative details will improve clarity and have planned revisions to the abstract and experimental sections accordingly.

Point-by-point responses
  1. Referee: [Abstract] The claim that SCDs 'perfectly separate matching from non-matching dataset pairs' in the leave-one-out diagnostic is load-bearing for the central claim. Yet the manuscript gives no quantitative details on the separation metric, the threshold, or potential confounds from semantic patterns shared across dataset groups.

    Authors: We agree that the abstract would benefit from quantitative details supporting the 'perfect separation' claim. In the revised manuscript, we will update the abstract to specify the separation metric (cosine similarity between SCDs). We will also note that perfect separation corresponds to an ROC-AUC of 1.0 in the leave-one-out diagnostic. Finally, we will add a brief discussion of potential confounds from shared semantic patterns and explain how the experimental design with varying semantic separation addresses them. Revision: yes.

  2. Referee: [Abstract and experimental settings] The practical SCD score's reported average outperformance (highest performance, lowest std) is tested 'with different degrees of semantic separation and keyword support'. However, the manuscript reports no per-setting ROC-AUC values and no error bars. It also lacks an explicit ablation that isolates whether incidental correlations (e.g., common NLI patterns) produce false positives when a dataset is absent. This bears directly on whether the gains over RMIA, LiRA, and SIF generalize.

    Authors: We acknowledge that the abstract reports average performance and that per-setting details matter for assessing generalization. We will report per-setting ROC-AUC values with error bars, either in the abstract or in a new table referenced from it. The referee also asks for an explicit ablation of false positives caused by incidental correlations when a dataset is absent. The current leave-one-out diagnostic already provides some evidence against false positives by showing clear separation. However, to isolate the effect of common patterns (e.g., those found across NLI datasets) directly, we will add a dedicated ablation experiment in the revision. Revision: yes.
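The false-positive ablation could be summarized by the true-positive rate reachable at a capped false-positive rate, a common way to report membership-inference results. The sketch below is minimal and uses invented scores. One absent dataset, which shares generic NLI patterns with the training mixture, is assumed to score high. The function name is hypothetical:

```python
from typing import Sequence


def tpr_at_fpr(member_scores: Sequence[float],
               nonmember_scores: Sequence[float],
               max_fpr: float = 0.1) -> float:
    """Highest TPR over thresholds t (predict 'member' when score >= t)
    whose FPR on the non-member datasets stays at or below max_fpr."""
    best = 0.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Invented scores: the absent dataset scoring 0.7 stands in for one that
# shares generic NLI patterns with the training mixture.
members = [0.9, 0.8, 0.4]
non_members = [0.7, 0.3, 0.2, 0.1]
print(tpr_at_fpr(members, non_members, max_fpr=0.0))   # 2/3
print(tpr_at_fpr(members, non_members, max_fpr=0.25))  # 1.0
```

If no false positives are tolerated, the single absent dataset that mimics the mixture caps the TPR at 2/3. Raising the cap to 0.25 admits that dataset and recovers all members. An ablation along these lines would show how much such look-alike datasets cost.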

Circularity Check

0 steps flagged

No circularity: empirical method validated against external baselines

full rationale

The paper introduces SCDs as a new descriptor for semantic correlations and validates membership inference empirically via leave-one-out diagnostics and direct comparisons to independent baselines (RMIA, LiRA, Attack-P, SIF). No equations or claims reduce by construction to fitted parameters, self-definitions, or self-citation chains; the separation performance and outperformance are measured outcomes on held-out dataset groups, not tautological renamings or imported uniqueness theorems. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone; the method appears to be empirical.

pith-pipeline@v0.9.1-grok · 5864 in / 818 out tokens · 41236 ms · 2026-06-29T08:33:45.103580+00:00 · methodology


Reference graph

Works this paper leans on

52 extracted references · 40 canonical work pages · 9 internal anchors

  1. [1]

    Invariant Risk Minimization

    Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization, 2020. URL https://arxiv.org/abs/1907.02893

  2. [2]

    Scalable membership inference attacks via quantile regression

    Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, and Zhiwei Steven Wu. Scalable membership inference attacks via quantile regression, 2023. URL https://arxiv.org/abs/2307.03694

  3. [3]

    Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

    Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023. URL https://arxiv.org/abs/2304.01373

  4. [4]

    Yake! keyword extraction from single documents using multiple local features

    Ricardo Campos, Vítor Mangaravite, Arian Pasquali, Alípio Jorge, Célia Nunes, and Adam Jatowt. Yake! keyword extraction from single documents using multiple local features. Information Sciences, 509:257-289, 2020

  5. [5]

    Membership inference attacks from first principles

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles, 2022. URL https://arxiv.org/abs/2112.03570

  6. [6]

    Scalable influence and fact tracing for large language model pretraining

    Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney. Scalable influence and fact tracing for large language model pretraining, 2024. URL https://arxiv.org/abs/2410.17413

  7. [7]

    Label-only membership inference attacks

    Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot. Label-only membership inference attacks, 2021. URL https://arxiv.org/abs/2007.14321

  8. [8]

    Membership inference attack using self influence functions

    Gilad Cohen and Raja Giryes. Membership inference attack using self influence functions, 2022. URL https://arxiv.org/abs/2205.13680

  9. [9]

    GoEmotions: A Dataset of Fine-Grained Emotions

    Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. GoEmotions: A Dataset of Fine-Grained Emotions. In 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

  10. [10]

    PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts

    Franck Dernoncourt and Ji Young Lee. PubMed 200k RCT: a dataset for sequential sentence classification in medical abstracts. In International Joint Conference on Natural Language Processing, 2017. URL https://arxiv.org/abs/1710.06071

  11. [11]

    Shortcut learning of large language models in natural language understanding

    Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, and Xia Hu. Shortcut learning of large language models in natural language understanding, 2023. URL https://arxiv.org/abs/2208.11857

  12. [12]

    Not all splits are equal: Rethinking attribute generalization across unrelated categories

    Liviu Nicolae Firca, Antonio Barbalau, Dan Oneata, and Elena Burceanu. Not all splits are equal: Rethinking attribute generalization across unrelated categories. NeurIPSW, 2025

  13. [13]

    SMED: Social media emotion dataset

    Amrita Ganguly. SMED: Social media emotion dataset. Kaggle, 2023. URL https://www.kaggle.com/datasets/gangulyamrita/social-media-emotion-dataset. Five-class emotion dataset curated from social media

  14. [14]

    The EF-Cambridge Open Language Database (EFCAMDAT)

    Jeroen Geertzen, Theodora Alexopoulou, and Anna Korhonen. The EF-Cambridge Open Language Database (EFCAMDAT). Annual Review of Applied Linguistics, 33:208-231, 2013

  15. [15]

    Shortcut learning in deep neural networks

    Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard S. Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut learning in deep neural networks. CoRR, abs/2004.07780, 2020. URL https://arxiv.org/abs/2004.07780

  16. [16]

    Annotation Artifacts in Natural Language Inference Data

    Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data, 2018. URL https://arxiv.org/abs/1803.02324

  17. [17]

    On the development of a large scale corpus for native language identification

    Thomas G Hudson and Sardar Jaf. On the development of a large scale corpus for native language identification. In Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13-14, 2018, Oslo University, Norway, volume 155 of Linköping Electronic Conference Proceedings, pages 115-129. Linköping Unive...

  18. [18]

    Datamodels: Predicting predictions from training data

    Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data, 2022. URL https://arxiv.org/abs/2202.00622

  19. [19]

    On feature learning in the presence of spurious correlations

    Pavel Izmailov, Polina Kirichenko, Nate Gruver, and Andrew Gordon Wilson. On feature learning in the presence of spurious correlations, 2022. URL https://arxiv.org/abs/2210.11369

  20. [20]

    Understanding black-box predictions via influence functions

    Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions, 2020. URL https://arxiv.org/abs/1703.04730

  21. [21]

    Stolen memories: Leveraging model memorization for calibrated white-box membership inference

    Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated white-box membership inference, 2020. URL https://arxiv.org/abs/1906.11798

  22. [22]

    Membership leakage in label-only exposures

    Zheng Li and Yang Zhang. Membership leakage in label-only exposures, 2021. URL https://arxiv.org/abs/2007.15528

  23. [23]

    Shortcut learning in medical image segmentation

    Manxi Lin, Nina Weng, Kamil Mikolaj, Zahra Bashir, Morten BS Svendsen, Martin G Tolsgaard, Anders N Christensen, and Aasa Feragen. Shortcut learning in medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 623-633. Springer, 2024

  24. [24]

    Membership inference attacks by exploiting loss trajectory

    Yiyong Liu, Zhengyu Zhao, Michael Backes, and Yang Zhang. Membership inference attacks by exploiting loss trajectory, 2022. URL https://arxiv.org/abs/2208.14933

  25. [25]

    LLM dataset inference: Did you train on my dataset?

    Pratyush Maini, Hengrui Jia, Nicolas Papernot, and Adam Dziedzic. Llm dataset inference: Did you train on my dataset?, 2024. URL https://arxiv.org/abs/2406.06443

  26. [26]

    Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

    R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, 2019. URL https://arxiv.org/abs/1902.01007

  27. [27]

    Machine Learning with Membership Privacy using Adversarial Regularization

    Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization, 2018. URL https://arxiv.org/abs/1807.05852

  28. [28]

    Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning

    Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), page 739–753. IEEE, May 2019. doi:10.1109/sp.2019.00065. URL http://dx.doi.org/10.1109/SP.2019.00065

  29. [29]

    Probing neural network comprehension of natural language arguments, 2019

    Timothy Niven and Hung-Yu Kao. Probing neural network comprehension of natural language arguments, 2019. URL https://arxiv.org/abs/1907.07355

  30. [30]

    XED: A multilingual dataset for sentiment analysis and emotion detection

    Emily Öhman, Marc Pàmies, Kaisla Kajava, and Jörg Tiedemann. XED: A multilingual dataset for sentiment analysis and emotion detection. In The 28th International Conference on Computational Linguistics (COLING 2020), 2020

  31. [31]

    Bridging explainability and embeddings: Bee aware of spuriousness

    Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, and Elena Burceanu. Bridging explainability and embeddings: Bee aware of spuriousness. In ICLR, 2026

  32. [32]

    SoK: Security and privacy in machine learning

    Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. SoK: Security and privacy in machine learning. In 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pages 399-414, 2018. doi:10.1109/EuroSP.2018.00035

  33. [33]

    TRAK: Attributing model behavior at scale

    Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale, 2023. URL https://arxiv.org/abs/2303.14186

  34. [34]

    Oslo: One-shot label-only membership inference attacks

    Yuefeng Peng, Jaechul Roh, Subhransu Maji, and Amir Houmansadr. Oslo: One-shot label-only membership inference attacks, 2024. URL https://arxiv.org/abs/2405.16978

  35. [35]

    20-emotion text classification dataset

    Shreyas Pulle. 20-emotion text classification dataset, 2025. URL https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions

  36. [36]

    Native language cognate effects on second language lexical choice

    Ella Rabinovich, Yulia Bogdanova, Dimitra Laparidou, Svitlana Volkova, and Yulia Tsvetkov. Native language cognate effects on second language lexical choice. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2745-2755, Brussels, Belgium, October 2018. Association for Computational Linguistics. doi:10.18653/v...

  37. [37]

    Detecting fine-grained emotions in literature

    Luis Rei and Dunja Mladenić. Detecting fine-grained emotions in literature. Applied Sciences, 13(13), 2023. ISSN 2076-3417. doi:10.3390/app13137502. URL https://www.mdpi.com/2076-3417/13/13/7502

  38. [38]

    A survey of privacy attacks in machine learning

    Maria Rigaki and Sebastian Garcia. A survey of privacy attacks in machine learning. ACM Computing Surveys, 56(4):1-34, November 2023. ISSN 1557-7341. doi:10.1145/3624010. URL http://dx.doi.org/10.1145/3624010

  39. [39]

    White-box vs black-box: Bayes optimal strategies for membership inference

    Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, and Hervé Jégou. White-box vs black-box: Bayes optimal strategies for membership inference, 2019. URL https://arxiv.org/abs/1908.11229

  40. [40]

    ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

    Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models, 2018. URL https://arxiv.org/abs/1806.01246

  41. [41]

    Web-scale k-means clustering

    David Sculley. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, pages 1177-1178. ACM, 2010

  42. [42]

    Membership Inference Attacks against Machine Learning Models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models, 2017. URL https://arxiv.org/abs/1610.05820

  43. [43]

    Systematic evaluation of privacy risks of machine learning models

    Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models, 2020. URL https://arxiv.org/abs/2003.10595

  44. [44]

    A statistical interpretation of term specificity and its application in retrieval

    Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21, 1972

  45. [45]

    Enhancing training data attribution with representational optimization

    Weiwei Sun, Haokun Liu, Nikhil Kandpal, Colin Raffel, and Yiming Yang. Enhancing training data attribution with representational optimization, 2025. URL https://arxiv.org/abs/2505.18513

  46. [46]

    Data shapley in one training run

    Jiachen T. Wang, Prateek Mittal, Dawn Song, and Ruoxi Jia. Data shapley in one training run, 2025. URL https://arxiv.org/abs/2406.11011

  47. [47]

    Causal attention for unbiased visual recognition

    Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. Causal attention for unbiased visual recognition, 2021. URL https://arxiv.org/abs/2108.08782

  48. [48]

    Understanding rare spurious correlations in neural networks

    Yao-Yuan Yang, Chi-Ning Chou, and Kamalika Chaudhuri. Understanding rare spurious correlations in neural networks, 2022. URL https://arxiv.org/abs/2202.05189

  49. [49]

    Enhanced membership inference attacks against machine learning models

    Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri. Enhanced membership inference attacks against machine learning models, 2022. URL https://arxiv.org/abs/2111.09679

  50. [50]

    The clever hans mirage: A comprehensive survey on spurious correlations in machine learning

    Wenqian Ye, Luyang Jiang, Eric Xie, Guangtao Zheng, Yunsheng Ma, Xu Cao, Dongliang Guo, Daiqing Qi, Zeyu He, Yijun Tian, et al. The clever hans mirage: A comprehensive survey on spurious correlations in machine learning. arXiv preprint arXiv:2402.12715, 2024

  51. [51]

    Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting, 2018. URL https://arxiv.org/abs/1709.01604

  52. [52]

    Low-cost high-power membership inference attacks

    Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-cost high-power membership inference attacks, 2024. URL https://arxiv.org/abs/2312.03262