idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Andrada Gobeaja; Elena Burceanu; Ionut Hodoroaga; Marius Leordeanu

arxiv: 2605.30462 · v1 · pith:BYGWOJXNnew · submitted 2026-05-28 · 💻 cs.LG · cs.AI

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Andrada Gobeaja , Ionut Hodoroaga , Elena Burceanu , Marius Leordeanu This is my paper

Pith reviewed 2026-06-29 08:33 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords semantic correlation descriptorsdataset membership inferencespurious correlationswhite-box attacknatural language inferenceemotion classificationmedical text classificationmembership inference attack

0 comments

The pith

Models internalize dataset-specific spurious correlations that semantic correlation descriptors can extract to detect training dataset membership.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that training on a dataset imprints incidental semantic regularities that are predictive within that dataset but not causally required by the task. These traces form a comparable structure across models, allowing a white-box comparison between a trained model's semantic correlation descriptors and the standalone descriptors computed on a candidate dataset. In controlled leave-one-out tests the descriptors separate matching and non-matching pairs perfectly. Across natural language inference, emotion classification, and medical text tasks the resulting membership score outperforms black-box and white-box baselines while showing lower variance. The largest reported gain exceeds 60 percent ROC-AUC when the dataset groups carry distinct semantic particularities.

Core claim

Datasets leave dataset-specific traces in a model's learned semantic correlation structure; incidental regularities that are predictive within a dataset but not causal for the underlying task are internalized during training. Semantic correlation descriptors capture this structure and make it comparable across dataset mixtures. A practical membership score tests whether a target dataset belongs to the training mixture by comparing only the model's SCD with the target dataset's standalone SCD. In leave-one-dataset-out diagnostics the descriptors recover dataset-specific changes and perfectly separate matching from non-matching pairs. The SCD-based classifier achieves the highest performance a

What carries the argument

Semantic correlation descriptors (SCDs): vectors that encode the semantic correlation structure learned by a model, enabling direct numerical comparison between a trained model and any candidate dataset without requiring leave-one-out retraining.

If this is right

SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs in leave-one-out diagnostics.
An SCD-based membership classifier attains the highest average performance and lowest standard deviation across NLI, emotion, and medical text groups.
The approach yields a relative ROC-AUC gain exceeding 60 percent when dataset groups expose distinct semantic particularities.
The score operates with only the model's SCD and the target dataset's standalone SCD, without black-box queries or loss access.
Performance holds across different degrees of semantic separation and keyword support between dataset splits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If SCDs remain stable under fine-tuning, they could serve as persistent fingerprints for auditing deployed models' training data composition.
The method may extend to detecting data poisoning by checking whether injected correlations appear in the model's SCD.
Neighboring tasks such as detecting duplicate or near-duplicate training sources could be addressed by clustering SCD vectors across many candidate datasets.
If the descriptors prove sensitive to collection artifacts, they might also help quantify unintended dataset leakage in public model releases.

Load-bearing premise

That incidental semantic correlations learned during training are sufficiently dataset-specific and stable to enable reliable membership inference even across datasets with partial semantic overlap.

What would settle it

Run the SCD membership score on two datasets that share identical semantic regularities yet differ in labels or collection source; if the score cannot distinguish the held-out dataset from the training one, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2605.30462 by Andrada Gobeaja, Elena Burceanu, Ionut Hodoroaga, Marius Leordeanu.

**Figure 2.** Figure 2: Dataset-specificity of correlation descriptors in the controlled difference-based setting. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: SOTA comparison of ROC curves for dataset membership inference. (a-c) show perfor [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-level membership inference, moving beyond existing methods that rely on behavioral or distributional evidence such as confidence scores, losses, margins, generated samples, or query responses. We introduce a white-box semantic fingerprinting approach based on semantic correlation descriptors (SCDs), which capture the semantic correlation structure learned by a model and make it comparable across dataset mixtures. In a controlled leave-one-dataset-out diagnostic, SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs. We then propose a practical SCD-based membership score that tests whether a target dataset is part of a model's training mixture using only the model's SCD and the target dataset's standalone SCD, without requiring leave-one-dataset-out models. Across three diverse experimental settings, with dataset groups for natural language inference, emotion classification, and medical text classification, we test both the advantages and limitations of SCD-based membership inference with different degrees of semantic separation and keyword support between dataset splits. On average, the classifier based on this score achieves the highest performance and the lowest std, outperforming black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline. These results show that dataset membership can be traced through internal semantic correlations, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups expose distinct semantic particularities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SCDs give a plausible new internal handle on dataset membership but the specificity under semantic overlap still needs tighter controls.

read the letter

The paper's core move is to treat incidental semantic correlations as dataset fingerprints. Instead of looking at losses or outputs, they extract SCDs from the model and compare them to a target dataset's standalone correlations to infer membership.

What stands out is the controlled leave-one-out diagnostic: SCDs achieve perfect separation between matching and non-matching pairs. The practical score then beats RMIA, Attack-P, LiRA, and the SIF baseline on average across NLI, emotion, and medical text settings, with the biggest lift over 60% ROC-AUC when semantic particularities are distinct.

The approach is genuinely different from the behavioral baselines they cite, and the white-box framing is a reasonable next step. The experiments test varying degrees of keyword support and semantic separation, which shows some awareness of the limits.

The soft spot is exactly the one the stress-test flags. The claim that SCDs encode correlations unique to the target dataset rests on the assumption that shared incidental patterns (common NLI structures or medical phrasing) won't produce false positives when a dataset is absent. The leave-one-out result is clean, but the practical score's robustness across partial overlap is only sketched; without explicit isolation of those confounds or error bars on the reported gains, the outperformance looks promising but not yet conclusive.

This is for people working on membership inference and data auditing who want to move past output-only attacks. It has enough of a fresh angle and empirical signal to deserve a serious referee, even if the next round will need tighter controls on semantic overlap and more transparent metrics.

Referee Report

2 major / 0 minor

Summary. The paper claims that datasets induce identifiable, dataset-specific incidental semantic correlations during training, which can be captured via semantic correlation descriptors (SCDs) for dataset-level membership inference. In a controlled leave-one-out diagnostic, SCDs recover these changes and achieve perfect separation between matching and non-matching dataset pairs. A practical SCD-based score, using only the model's SCD and the target dataset's standalone SCD, is then shown to outperform black-box baselines (RMIA, Attack-P, LiRA) and the white-box SIF baseline across NLI, emotion classification, and medical text tasks, with the largest relative ROC-AUC gain exceeding 60%.

Significance. If the results hold, the work offers a new white-box fingerprinting method for tracing dataset membership through internalized semantic structures rather than behavioral or distributional signals. The reported outperformance with lowest standard deviation and the explicit testing across varying degrees of semantic separation are strengths; the approach could complement existing membership inference techniques when datasets exhibit distinct semantic particularities.

major comments (2)

[Abstract] Abstract: the claim that SCDs 'perfectly separate matching from non-matching dataset pairs' in the leave-one-out diagnostic is load-bearing for the central claim, yet the manuscript provides no quantitative details on the separation metric, threshold, or potential confounds from shared semantic patterns across dataset groups.
[Abstract] Abstract and experimental settings: the practical SCD score's reported average outperformance (highest performance, lowest std) is tested 'with different degrees of semantic separation and keyword support,' but lacks reported per-setting ROC-AUC values, error bars, or an explicit ablation isolating whether incidental correlations (e.g., common NLI patterns) produce false positives when a dataset is absent, directly bearing on whether the gains over RMIA/LiRA/SIF generalize.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments point-by-point below. We agree that providing additional quantitative details will improve clarity and have planned revisions to the abstract and experimental sections accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that SCDs 'perfectly separate matching from non-matching dataset pairs' in the leave-one-out diagnostic is load-bearing for the central claim, yet the manuscript provides no quantitative details on the separation metric, threshold, or potential confounds from shared semantic patterns across dataset groups.

Authors: We agree that the abstract would benefit from including quantitative details supporting the 'perfect separation' claim. In the revised manuscript, we will update the abstract to specify the separation metric (cosine similarity between SCDs) and note that perfect separation corresponds to an ROC-AUC of 1.0 in the leave-one-out diagnostic. We will also incorporate a brief discussion of potential confounds from shared semantic patterns, explaining how the experimental design with varying semantic separation addresses this. revision: yes
Referee: [Abstract] Abstract and experimental settings: the practical SCD score's reported average outperformance (highest performance, lowest std) is tested 'with different degrees of semantic separation and keyword support,' but lacks reported per-setting ROC-AUC values, error bars, or an explicit ablation isolating whether incidental correlations (e.g., common NLI patterns) produce false positives when a dataset is absent, directly bearing on whether the gains over RMIA/LiRA/SIF generalize.

Authors: We acknowledge that the abstract summarizes average performance and that per-setting details are important for assessing generalization. We will revise to include per-setting ROC-AUC values with error bars in the abstract or a new table reference. Regarding the explicit ablation for false positives due to incidental correlations in absent datasets, the current leave-one-out diagnostic provides evidence against false positives by showing clear separation. However, to directly isolate the effect of common patterns (e.g., NLI), we will add a dedicated ablation experiment in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method validated against external baselines

full rationale

The paper introduces SCDs as a new descriptor for semantic correlations and validates membership inference empirically via leave-one-out diagnostics and direct comparisons to independent baselines (RMIA, LiRA, Attack-P, SIF). No equations or claims reduce by construction to fitted parameters, self-definitions, or self-citation chains; the separation performance and outperformance are measured outcomes on held-out dataset groups, not tautological renamings or imported uniqueness theorems. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities identifiable from abstract alone; method appears empirical.

pith-pipeline@v0.9.1-grok · 5864 in / 818 out tokens · 41236 ms · 2026-06-29T08:33:45.103580+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 40 canonical work pages · 9 internal anchors

[1]

Invariant Risk Minimization

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization, 2020. URL https://arxiv.org/abs/1907.02893

work page internal anchor Pith review Pith/arXiv arXiv 2020
[2]

Scalable membership inference attacks via quantile regression, 2023

Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, and Zhiwei Steven Wu. Scalable membership inference attacks via quantile regression, 2023. URL https://arxiv.org/abs/2307.03694

work page arXiv 2023
[3]

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023. URL https://arxiv.org/abs/2304.01373

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Yake! keyword extraction from single documents using multiple local features

Ricardo Campos, V \' tor Mangaravite, Arian Pasquali, Al \' pio Jorge, C \'e lia Nunes, and Adam Jatowt. Yake! keyword extraction from single documents using multiple local features. Information Sciences, 509: 0 257--289, 2020

2020
[5]

Membership inference attacks from first principles, 2022

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles, 2022. URL https://arxiv.org/abs/2112.03570

work page arXiv 2022
[6]

Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney

Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney. Scalable influence and fact tracing for large language model pretraining, 2024. URL https://arxiv.org/abs/2410.17413

work page arXiv 2024
[7]

Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot

Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot. Label-only membership inference attacks, 2021. URL https://arxiv.org/abs/2007.14321

work page arXiv 2021
[8]

Membership inference attack using self influence functions, 2022

Gilad Cohen and Raja Giryes. Membership inference attack using self influence functions, 2022. URL https://arxiv.org/abs/2205.13680

work page arXiv 2022
[9]

GoEmotions: A Dataset of Fine-Grained Emotions

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. GoEmotions: A Dataset of Fine-Grained Emotions . In 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

2020
[10]

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts

Franck Dernoncourt and Ji Young Lee. Pubmed 200k rct: a dataset for sequential sentence classification in medical abstracts. In International Joint Conference on Natural Language Processing, 2017. URL https://arxiv.org/abs/1710.06071

work page internal anchor Pith review Pith/arXiv arXiv 2017
[11]

Shortcut learning of large language models in natural language understanding, 2023

Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, and Xia Hu. Shortcut learning of large language models in natural language understanding, 2023. URL https://arxiv.org/abs/2208.11857

work page arXiv 2023
[12]

Not all splits are equal: Rethinking attribute generalization across unrelated categories

Liviu Nicolae Firca, Antonio Barbalau, Dan Oneata, and Elena Burceanu. Not all splits are equal: Rethinking attribute generalization across unrelated categories. NeurIPSW, 2025

2025
[13]

SMED : Social media emotion dataset

Amrita Ganguly. SMED : Social media emotion dataset. Kaggle, 2023. URL https://www.kaggle.com/datasets/gangulyamrita/social-media-emotion-dataset. Five-class emotion dataset curated from social media

2023
[14]

The ef-cambridge open language database (efcamdat)

Jeroen Geertzen, Theodora Alexopoulou, and Anna Korhonen. The ef-cambridge open language database (efcamdat). Annual Review of Applied Linguistics, 33: 0 208--231, 2013

2013
[15]

Zemel, Wieland Brendel, Matthias Bethge, and Felix A

Robert Geirhos, J \" o rn - Henrik Jacobsen, Claudio Michaelis, Richard S. Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut learning in deep neural networks. CoRR, abs/2004.07780, 2020. URL https://arxiv.org/abs/2004.07780

work page arXiv 2004
[16]

Annotation Artifacts in Natural Language Inference Data

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data, 2018. URL https://arxiv.org/abs/1803.02324

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

o ping Electronic Conference Proceedings , pages 115--129. Link \

Thomas G Hudson and Sardar Jaf. On the development of a large scale corpus for native language identification. In Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13--14, 2018, Oslo University, Norway, volume 155 of Link \"o ping Electronic Conference Proceedings , pages 115--129. Link \"o ping Unive...

2018
[18]

Datamodels: Predicting predictions from training data, 2022

Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data, 2022. URL https://arxiv.org/abs/2202.00622

work page arXiv 2022
[19]

On feature learning in the presence of spurious correlations, 2022

Pavel Izmailov, Polina Kirichenko, Nate Gruver, and Andrew Gordon Wilson. On feature learning in the presence of spurious correlations, 2022. URL https://arxiv.org/abs/2210.11369

work page arXiv 2022
[20]

Understanding black-box predictions via influence functions, 2020

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions, 2020. URL https://arxiv.org/abs/1703.04730

work page arXiv 2020
[21]

Stolen memories: Leveraging model memorization for calibrated white-box membership inference, 2020

Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated white-box membership inference, 2020. URL https://arxiv.org/abs/1906.11798

work page arXiv 2020
[22]

Membership leakage in label-only exposures, 2021

Zheng Li and Yang Zhang. Membership leakage in label-only exposures, 2021. URL https://arxiv.org/abs/2007.15528

work page arXiv 2021
[23]

Shortcut learning in medical image segmentation

Manxi Lin, Nina Weng, Kamil Mikolaj, Zahra Bashir, Morten BS Svendsen, Martin G Tolsgaard, Anders N Christensen, and Aasa Feragen. Shortcut learning in medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 623--633. Springer, 2024

2024
[24]

Membership inference attacks by exploiting loss trajectory, 2022

Yiyong Liu, Zhengyu Zhao, Michael Backes, and Yang Zhang. Membership inference attacks by exploiting loss trajectory, 2022. URL https://arxiv.org/abs/2208.14933

work page arXiv 2022
[25]

Llm dataset inference: Did you train on my dataset?, 2024

Pratyush Maini, Hengrui Jia, Nicolas Papernot, and Adam Dziedzic. Llm dataset inference: Did you train on my dataset?, 2024. URL https://arxiv.org/abs/2406.06443

work page arXiv 2024
[26]

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, 2019. URL https://arxiv.org/abs/1902.01007

work page internal anchor Pith review Pith/arXiv arXiv 2019
[27]

Machine Learning with Membership Privacy using Adversarial Regularization

Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization, 2018. URL https://arxiv.org/abs/1807.05852

work page internal anchor Pith review Pith/arXiv arXiv 2018
[28]

Hugo Frezat, Julien Le Sommer, Ronan Fablet, Guillaume Balarac, and Redouane Lguen- sat

Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), page 739–753. IEEE, May 2019. doi:10.1109/sp.2019.00065. URL http://dx.doi.org/10.1109/SP.2019.00065

work page doi:10.1109/sp.2019.00065 2019
[29]

Probing neural network comprehension of natural language arguments, 2019

Timothy Niven and Hung-Yu Kao. Probing neural network comprehension of natural language arguments, 2019. URL https://arxiv.org/abs/1907.07355

work page arXiv 2019
[30]

O hman, Marc P \`a mies, Kaisla Kajava, and J \

Emily \"O hman, Marc P \`a mies, Kaisla Kajava, and J \"o rg Tiedemann. XED : A multilingual dataset for sentiment analysis and emotion detection. In The 28th International Conference on Computational Linguistics (COLING 2020), 2020

2020
[31]

Bridging explainability and embeddings: Bee aware of spuriousness

Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, and Elena Burceanu. Bridging explainability and embeddings: Bee aware of spuriousness. In ICLR , 2026

2026
[32]

Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. Sok: Security and privacy in machine learning. In 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pages 399--414, 2018. doi:10.1109/EuroSP.2018.00035

work page doi:10.1109/eurosp.2018.00035 2018
[33]

Trak: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. Trak: Attributing model behavior at scale, 2023. URL https://arxiv.org/abs/2303.14186

work page arXiv 2023
[34]

Oslo: One-shot label-only membership inference attacks, 2024

Yuefeng Peng, Jaechul Roh, Subhransu Maji, and Amir Houmansadr. Oslo: One-shot label-only membership inference attacks, 2024. URL https://arxiv.org/abs/2405.16978

work page arXiv 2024
[35]

20-emotion text classification dataset, 2025

Shreyas Pulle. 20-emotion text classification dataset, 2025. URL https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions

2025
[36]

Native language cognate effects on second language lexical choice

Ella Rabinovich, Yulia Bogdanova, Dimitra Laparidou, Svitlana Volkova, and Yulia Tsvetkov. Native language cognate effects on second language lexical choice. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2745--2755, Brussels, Belgium, October 2018. Association for Computational Linguistics. doi:10.18653/v...

work page doi:10.18653/v1/d18-1296 2018
[37]

Detecting fine-grained emotions in literature

Luis Rei and Dunja Mladenić. Detecting fine-grained emotions in literature. Applied Sciences, 13 0 (13), 2023. ISSN 2076-3417. doi:10.3390/app13137502. URL https://www.mdpi.com/2076-3417/13/13/7502

work page doi:10.3390/app13137502 2023
[38]

A survey of privacy attacks in machine learning

Maria Rigaki and Sebastian Garcia. A survey of privacy attacks in machine learning. ACM Computing Surveys, 56 0 (4): 0 1–34, November 2023. ISSN 1557-7341. doi:10.1145/3624010. URL http://dx.doi.org/10.1145/3624010

work page doi:10.1145/3624010 2023
[39]

White-box vs black-box: Bayes optimal strategies for membership inference, 2019

Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, and Hervé Jégou. White-box vs black-box: Bayes optimal strategies for membership inference, 2019. URL https://arxiv.org/abs/1908.11229

work page arXiv 2019
[40]

ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models, 2018. URL https://arxiv.org/abs/1806.01246

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

Web-scale k-means clustering

David Sculley. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, pages 1177--1178. ACM, 2010

2010
[42]

Membership Inference Attacks against Machine Learning Models

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models, 2017. URL https://arxiv.org/abs/1610.05820

work page internal anchor Pith review Pith/arXiv arXiv 2017
[43]

Systematic evaluation of privacy risks of machine learning models, 2020

Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models, 2020. URL https://arxiv.org/abs/2003.10595

work page arXiv 2020
[44]

A statistical interpretation of term specificity and its application in retrieval

Karen Sp \"a rck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28 0 (1): 0 11--21, 1972

1972
[45]

Enhancing training data attribution with representational optimization, 2025

Weiwei Sun, Haokun Liu, Nikhil Kandpal, Colin Raffel, and Yiming Yang. Enhancing training data attribution with representational optimization, 2025. URL https://arxiv.org/abs/2505.18513

work page arXiv 2025
[46]

Wang, Prateek Mittal, Dawn Song, and Ruoxi Jia

Jiachen T. Wang, Prateek Mittal, Dawn Song, and Ruoxi Jia. Data shapley in one training run, 2025. URL https://arxiv.org/abs/2406.11011

work page arXiv 2025
[47]

Causal attention for unbiased visual recognition, 2021

Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. Causal attention for unbiased visual recognition, 2021. URL https://arxiv.org/abs/2108.08782

work page arXiv 2021
[48]

Understanding rare spurious correlations in neural networks, 2022

Yao-Yuan Yang, Chi-Ning Chou, and Kamalika Chaudhuri. Understanding rare spurious correlations in neural networks, 2022. URL https://arxiv.org/abs/2202.05189

work page arXiv 2022
[49]

Enhanced membership inference attacks against machine learning models, 2022

Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri. Enhanced membership inference attacks against machine learning models, 2022. URL https://arxiv.org/abs/2111.09679

work page arXiv 2022
[50]

The clever hans mirage: A comprehensive survey on spurious correlations in machine learning

Wenqian Ye, Luyang Jiang, Eric Xie, Guangtao Zheng, Yunsheng Ma, Xu Cao, Dongliang Guo, Daiqing Qi, Zeyu He, Yijun Tian, et al. The clever hans mirage: A comprehensive survey on spurious correlations in machine learning. arXiv preprint arXiv:2402.12715, 2024

work page arXiv 2024
[51]

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting, 2018. URL https://arxiv.org/abs/1709.01604

work page internal anchor Pith review Pith/arXiv arXiv 2018
[52]

Low-cost high-power membership inference attacks, 2024

Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-cost high-power membership inference attacks, 2024. URL https://arxiv.org/abs/2312.03262

work page arXiv 2024

[1] [1]

Invariant Risk Minimization

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization, 2020. URL https://arxiv.org/abs/1907.02893

work page internal anchor Pith review Pith/arXiv arXiv 2020

[2] [2]

Scalable membership inference attacks via quantile regression, 2023

Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, and Zhiwei Steven Wu. Scalable membership inference attacks via quantile regression, 2023. URL https://arxiv.org/abs/2307.03694

work page arXiv 2023

[3] [3]

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023. URL https://arxiv.org/abs/2304.01373

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

Yake! keyword extraction from single documents using multiple local features

Ricardo Campos, V \' tor Mangaravite, Arian Pasquali, Al \' pio Jorge, C \'e lia Nunes, and Adam Jatowt. Yake! keyword extraction from single documents using multiple local features. Information Sciences, 509: 0 257--289, 2020

2020

[5] [5]

Membership inference attacks from first principles, 2022

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles, 2022. URL https://arxiv.org/abs/2112.03570

work page arXiv 2022

[6] [6]

Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney

Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney. Scalable influence and fact tracing for large language model pretraining, 2024. URL https://arxiv.org/abs/2410.17413

work page arXiv 2024

[7] [7]

Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot

Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot. Label-only membership inference attacks, 2021. URL https://arxiv.org/abs/2007.14321

work page arXiv 2021

[8] [8]

Membership inference attack using self influence functions, 2022

Gilad Cohen and Raja Giryes. Membership inference attack using self influence functions, 2022. URL https://arxiv.org/abs/2205.13680

work page arXiv 2022

[9] [9]

GoEmotions: A Dataset of Fine-Grained Emotions

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. GoEmotions: A Dataset of Fine-Grained Emotions . In 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

2020

[10] [10]

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts

Franck Dernoncourt and Ji Young Lee. Pubmed 200k rct: a dataset for sequential sentence classification in medical abstracts. In International Joint Conference on Natural Language Processing, 2017. URL https://arxiv.org/abs/1710.06071

work page internal anchor Pith review Pith/arXiv arXiv 2017

[11] [11]

Shortcut learning of large language models in natural language understanding, 2023

Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, and Xia Hu. Shortcut learning of large language models in natural language understanding, 2023. URL https://arxiv.org/abs/2208.11857

work page arXiv 2023

[12] [12]

Not all splits are equal: Rethinking attribute generalization across unrelated categories

Liviu Nicolae Firca, Antonio Barbalau, Dan Oneata, and Elena Burceanu. Not all splits are equal: Rethinking attribute generalization across unrelated categories. NeurIPSW, 2025

2025

[13] [13]

SMED : Social media emotion dataset

Amrita Ganguly. SMED : Social media emotion dataset. Kaggle, 2023. URL https://www.kaggle.com/datasets/gangulyamrita/social-media-emotion-dataset. Five-class emotion dataset curated from social media

2023

[14] [14]

The ef-cambridge open language database (efcamdat)

Jeroen Geertzen, Theodora Alexopoulou, and Anna Korhonen. The ef-cambridge open language database (efcamdat). Annual Review of Applied Linguistics, 33: 0 208--231, 2013

2013

[15] [15]

Zemel, Wieland Brendel, Matthias Bethge, and Felix A

Robert Geirhos, J \" o rn - Henrik Jacobsen, Claudio Michaelis, Richard S. Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut learning in deep neural networks. CoRR, abs/2004.07780, 2020. URL https://arxiv.org/abs/2004.07780

work page arXiv 2004

[16] [16]

Annotation Artifacts in Natural Language Inference Data

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data, 2018. URL https://arxiv.org/abs/1803.02324

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

o ping Electronic Conference Proceedings , pages 115--129. Link \

Thomas G Hudson and Sardar Jaf. On the development of a large scale corpus for native language identification. In Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13--14, 2018, Oslo University, Norway, volume 155 of Link \"o ping Electronic Conference Proceedings , pages 115--129. Link \"o ping Unive...

2018

[18] [18]

Datamodels: Predicting predictions from training data, 2022

Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data, 2022. URL https://arxiv.org/abs/2202.00622

work page arXiv 2022

[19] [19]

On feature learning in the presence of spurious correlations, 2022

Pavel Izmailov, Polina Kirichenko, Nate Gruver, and Andrew Gordon Wilson. On feature learning in the presence of spurious correlations, 2022. URL https://arxiv.org/abs/2210.11369

work page arXiv 2022

[20] [20]

Understanding black-box predictions via influence functions, 2020

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions, 2020. URL https://arxiv.org/abs/1703.04730

work page arXiv 2020

[21] [21]

Stolen memories: Leveraging model memorization for calibrated white-box membership inference, 2020

Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated white-box membership inference, 2020. URL https://arxiv.org/abs/1906.11798

work page arXiv 2020

[22] [22]

Membership leakage in label-only exposures, 2021

Zheng Li and Yang Zhang. Membership leakage in label-only exposures, 2021. URL https://arxiv.org/abs/2007.15528

work page arXiv 2021

[23] [23]

Shortcut learning in medical image segmentation

Manxi Lin, Nina Weng, Kamil Mikolaj, Zahra Bashir, Morten BS Svendsen, Martin G Tolsgaard, Anders N Christensen, and Aasa Feragen. Shortcut learning in medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 623--633. Springer, 2024

2024

[24] [24]

Membership inference attacks by exploiting loss trajectory, 2022

Yiyong Liu, Zhengyu Zhao, Michael Backes, and Yang Zhang. Membership inference attacks by exploiting loss trajectory, 2022. URL https://arxiv.org/abs/2208.14933

work page arXiv 2022

[25] [25]

Llm dataset inference: Did you train on my dataset?, 2024

Pratyush Maini, Hengrui Jia, Nicolas Papernot, and Adam Dziedzic. Llm dataset inference: Did you train on my dataset?, 2024. URL https://arxiv.org/abs/2406.06443

work page arXiv 2024

[26] [26]

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, 2019. URL https://arxiv.org/abs/1902.01007

work page internal anchor Pith review Pith/arXiv arXiv 2019

[27] [27]

Machine Learning with Membership Privacy using Adversarial Regularization

Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization, 2018. URL https://arxiv.org/abs/1807.05852

work page internal anchor Pith review Pith/arXiv arXiv 2018

[28] [28]

Hugo Frezat, Julien Le Sommer, Ronan Fablet, Guillaume Balarac, and Redouane Lguen- sat

Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), page 739–753. IEEE, May 2019. doi:10.1109/sp.2019.00065. URL http://dx.doi.org/10.1109/SP.2019.00065

work page doi:10.1109/sp.2019.00065 2019

[29] [29]

Probing neural network comprehension of natural language arguments, 2019

Timothy Niven and Hung-Yu Kao. Probing neural network comprehension of natural language arguments, 2019. URL https://arxiv.org/abs/1907.07355

work page arXiv 2019

[30] [30]

O hman, Marc P \`a mies, Kaisla Kajava, and J \

Emily \"O hman, Marc P \`a mies, Kaisla Kajava, and J \"o rg Tiedemann. XED : A multilingual dataset for sentiment analysis and emotion detection. In The 28th International Conference on Computational Linguistics (COLING 2020), 2020

2020

[31] [31]

Bridging explainability and embeddings: Bee aware of spuriousness

Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, and Elena Burceanu. Bridging explainability and embeddings: Bee aware of spuriousness. In ICLR , 2026

2026

[32] [32]

Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. Sok: Security and privacy in machine learning. In 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pages 399--414, 2018. doi:10.1109/EuroSP.2018.00035

work page doi:10.1109/eurosp.2018.00035 2018

[33] [33]

Trak: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. Trak: Attributing model behavior at scale, 2023. URL https://arxiv.org/abs/2303.14186

work page arXiv 2023

[34] [34]

Oslo: One-shot label-only membership inference attacks, 2024

Yuefeng Peng, Jaechul Roh, Subhransu Maji, and Amir Houmansadr. Oslo: One-shot label-only membership inference attacks, 2024. URL https://arxiv.org/abs/2405.16978

work page arXiv 2024

[35] [35]

20-emotion text classification dataset, 2025

Shreyas Pulle. 20-emotion text classification dataset, 2025. URL https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions

2025

[36] [36]

Native language cognate effects on second language lexical choice

Ella Rabinovich, Yulia Bogdanova, Dimitra Laparidou, Svitlana Volkova, and Yulia Tsvetkov. Native language cognate effects on second language lexical choice. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2745--2755, Brussels, Belgium, October 2018. Association for Computational Linguistics. doi:10.18653/v...

work page doi:10.18653/v1/d18-1296 2018

[37] [37]

Detecting fine-grained emotions in literature

Luis Rei and Dunja Mladenić. Detecting fine-grained emotions in literature. Applied Sciences, 13 0 (13), 2023. ISSN 2076-3417. doi:10.3390/app13137502. URL https://www.mdpi.com/2076-3417/13/13/7502

work page doi:10.3390/app13137502 2023

[38] [38]

A survey of privacy attacks in machine learning

Maria Rigaki and Sebastian Garcia. A survey of privacy attacks in machine learning. ACM Computing Surveys, 56 0 (4): 0 1–34, November 2023. ISSN 1557-7341. doi:10.1145/3624010. URL http://dx.doi.org/10.1145/3624010

work page doi:10.1145/3624010 2023

[39] [39]

White-box vs black-box: Bayes optimal strategies for membership inference, 2019

Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, and Hervé Jégou. White-box vs black-box: Bayes optimal strategies for membership inference, 2019. URL https://arxiv.org/abs/1908.11229

work page arXiv 2019

[40] [40]

ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models, 2018. URL https://arxiv.org/abs/1806.01246

work page internal anchor Pith review Pith/arXiv arXiv 2018

[41] [41]

Web-scale k-means clustering

David Sculley. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, pages 1177--1178. ACM, 2010

2010

[42] [42]

Membership Inference Attacks against Machine Learning Models

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models, 2017. URL https://arxiv.org/abs/1610.05820

work page internal anchor Pith review Pith/arXiv arXiv 2017

[43] [43]

Systematic evaluation of privacy risks of machine learning models, 2020

Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models, 2020. URL https://arxiv.org/abs/2003.10595

work page arXiv 2020

[44] [44]

A statistical interpretation of term specificity and its application in retrieval

Karen Sp \"a rck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28 0 (1): 0 11--21, 1972

1972

[45] [45]

Enhancing training data attribution with representational optimization, 2025

Weiwei Sun, Haokun Liu, Nikhil Kandpal, Colin Raffel, and Yiming Yang. Enhancing training data attribution with representational optimization, 2025. URL https://arxiv.org/abs/2505.18513

work page arXiv 2025

[46] [46]

Wang, Prateek Mittal, Dawn Song, and Ruoxi Jia

Jiachen T. Wang, Prateek Mittal, Dawn Song, and Ruoxi Jia. Data shapley in one training run, 2025. URL https://arxiv.org/abs/2406.11011

work page arXiv 2025

[47] [47]

Causal attention for unbiased visual recognition, 2021

Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. Causal attention for unbiased visual recognition, 2021. URL https://arxiv.org/abs/2108.08782

work page arXiv 2021

[48] [48]

Understanding rare spurious correlations in neural networks, 2022

Yao-Yuan Yang, Chi-Ning Chou, and Kamalika Chaudhuri. Understanding rare spurious correlations in neural networks, 2022. URL https://arxiv.org/abs/2202.05189

work page arXiv 2022

[49] [49]

Enhanced membership inference attacks against machine learning models, 2022

Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri. Enhanced membership inference attacks against machine learning models, 2022. URL https://arxiv.org/abs/2111.09679

work page arXiv 2022

[50] [50]

The clever hans mirage: A comprehensive survey on spurious correlations in machine learning

Wenqian Ye, Luyang Jiang, Eric Xie, Guangtao Zheng, Yunsheng Ma, Xu Cao, Dongliang Guo, Daiqing Qi, Zeyu He, Yijun Tian, et al. The clever hans mirage: A comprehensive survey on spurious correlations in machine learning. arXiv preprint arXiv:2402.12715, 2024

work page arXiv 2024

[51] [51]

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting, 2018. URL https://arxiv.org/abs/1709.01604

work page internal anchor Pith review Pith/arXiv arXiv 2018

[52] [52]

Low-cost high-power membership inference attacks, 2024

Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-cost high-power membership inference attacks, 2024. URL https://arxiv.org/abs/2312.03262

work page arXiv 2024