Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
arXiv preprint arXiv:2304.06929
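The scoring idea in the summary above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: it assumes per-token log-probabilities have already been obtained from the model under test, and the function names, the default `k`, and the decision threshold are hypothetical choices made for the example. The score is the average log-probability of the k% least likely tokens; a higher (less negative) score suggests the text contains few surprising tokens and is more likely to have been seen in pretraining.

```python
import math


def min_k_prob_score(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of least likely tokens.

    token_logprobs: per-token log-probabilities from the target model
                    (assumed to be computed elsewhere).
    k:              fraction of tokens to keep, in (0, 1]; 0.2 is an
                    illustrative default, not a recommended setting.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so short inputs still produce a score.
    n = max(1, math.ceil(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the n smallest log-probabilities
    return sum(lowest) / n


def looks_like_member(token_logprobs, k=0.2, threshold=-5.0):
    """Flag text as likely pretraining data when the score exceeds a threshold.

    The threshold here is a placeholder; in practice it would be tuned on
    held-out examples with known membership.
    """
    return min_k_prob_score(token_logprobs, k) > threshold


if __name__ == "__main__":
    seen = [-0.1, -0.3, -0.2, -1.0]      # no strongly surprising tokens
    unseen = [-0.1, -0.3, -9.0, -12.0]   # a few outlier low-probability tokens
    print(min_k_prob_score(seen, k=0.5), looks_like_member(seen, k=0.5))
    print(min_k_prob_score(unseen, k=0.5), looks_like_member(unseen, k=0.5))
```

Averaging only the lowest-probability tokens, rather than all tokens, is what makes the score sensitive to outliers: a handful of very unlikely tokens pulls the score down even when the rest of the text is easy to predict.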
3 Pith papers cite this work.
citing papers explorer
- Detecting Pretraining Data from Large Language Models
  Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
- MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models
  A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.
- TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems
  TADP-RME adapts the privacy budget via inverse trust scores in [0,1] and uses reverse manifold embedding to reduce inference attack success rates by up to 3.1% while preserving formal differential privacy guarantees.