Recognition: 2 theorem links
· Lean Theorem · Membership Inference Attacks for Retrieval-Based In-Context Learning for Document Question Answering
Pith reviewed 2026-05-08 18:39 UTC · model grok-4.3
The pith
Retrieval-based in-context learning systems leak whether specific documents are in the retrieval database through simple prefix queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Black-box membership-inference attacks on retrieval-augmented in-context learning for document question answering can be carried out by exploiting statistics on prefixes of the user query; a novel weighted-averaging scheme produces a membership score without requiring a reference model and maintains effectiveness against paraphrased member text.
What carries the argument
Prefix-based membership statistic that measures how retrieval similarity changes when successive prefixes of the query are supplied, either via reference-model loss or direct weighted averaging.
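A minimal sketch of that statistic, with all names and the per-prefix signal assumed rather than taken from the paper: issue successive prefixes of a candidate query, observe a black-box score for each, and aggregate.

```python
def prefixes(query: str, k: int):
    """Yield the first k whitespace-delimited prefixes of a query."""
    words = query.split()
    for i in range(1, min(k, len(words)) + 1):
        yield " ".join(words[:i])

def membership_score(query: str, similarity, k: int = 5) -> float:
    """Average a black-box per-prefix signal over k query prefixes.

    `similarity` is a stand-in for whatever signal the attacker can
    observe from the remote service; illustrative only.
    """
    scores = [similarity(p) for p in prefixes(query, k)]
    return sum(scores) / len(scores)
```

The paper's actual statistic may aggregate differently (its second attack uses a weighted rather than uniform average), but the shape of the computation is the same: one query per prefix, one scalar per query.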
If this is right
- Remote services that combine retrieval with in-context learning expose private membership information about their document collections.
- The attacks succeed with only a small number of prefixes and against paraphrased inputs.
- A simple ensemble-prompting defense substantially lowers leakage from the weighted-average attack.
- The new attacks outperform three prior membership-inference methods on this task in many evaluated cases.
Where Pith is reading between the lines
- The same prefix-scoring idea could be tested on other retrieval-augmented generation tasks beyond question answering.
- Randomizing or adding noise to the retrieval ranking might be a practical countermeasure worth measuring.
- The reference-model-free weighted-average statistic may apply to other black-box settings where loss values are unavailable.
Load-bearing premise
The retrieval function picks examples by similarity to the query such that prefix statistics can still separate member documents from non-members even after the text has been paraphrased.
What would settle it
Running the attacks on a retrieval system whose similarity function has been replaced by uniform random selection and finding that accuracy falls to chance level.
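That ablation can be sketched end to end; everything below is a toy simulation (document ids, score function, and threshold are assumptions), meant only to show that a query-blind retriever leaves nothing for a prefix statistic to separate.

```python
import random

rng = random.Random(0)
corpus = list(range(200))  # toy document ids; ids 0-99 play the "member" role

def random_retriever(query, k=4):
    """Uniform random selection: ignores the query entirely,
    which is exactly the ablation proposed above."""
    return set(rng.sample(corpus, k))

def prefix_attack_score(doc_id, n_prefixes=8):
    """Toy stand-in for the prefix attack: fraction of prefix queries
    whose retrieved set contains the target document."""
    hits = sum(doc_id in random_retriever((doc_id, i))
               for i in range(n_prefixes))
    return hits / n_prefixes

member_scores = [prefix_attack_score(d) for d in range(100)]
nonmember_scores = [prefix_attack_score(d) for d in range(100, 200)]

# Thresholding cannot beat chance when retrieval ignores the query:
correct = (sum(s > 0 for s in member_scores)
           + sum(s <= 0 for s in nonmember_scores))
accuracy = correct / 200
```

Under random retrieval the member and non-member score distributions coincide, so accuracy hovers near 0.5; a real similarity-based retriever is what would pull them apart.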
read the original abstract
We show that remotely hosted applications employing in-context learning when augmented with a retrieval function to select in-context examples can be vulnerable to membership-inference attacks even when the service provider and users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves upon it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter case in which the adversary has a paraphrased version of the text in the queries and show that our attacks can exhibit stronger resilience to paraphrasing and outperform three prior attacks in many cases with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting, demonstrating that it substantially mitigates the privacy leakage caused by our second attack.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two black-box membership inference attacks on retrieval-augmented in-context learning systems for document question answering. The attacks exploit statistics computed over query text prefixes to distinguish member from non-member documents. The first attack employs a reference model to estimate an unavailable loss; the second replaces the reference model with a novel weighted-averaging scheme over prefix statistics. Comprehensive experiments on paraphrased queries demonstrate that both attacks remain effective, outperform three prior attacks in many settings even with few prefixes, and that an adapted ensemble-prompting defense substantially reduces leakage from the second attack.
Significance. If the empirical results hold, the work identifies a concrete privacy risk in practical RAG-ICL deployments where the service provider and end users are distinct parties. The stricter paraphrased-query threat model and the demonstration that attacks succeed with small numbers of prefixes are practically relevant. The reference-model-free weighted-averaging attack and the adapted defense are constructive contributions. The manuscript supplies reproducible empirical evaluations and falsifiable attack definitions that can be tested on other retrievers and corpora.
major comments (2)
- [§4 and abstract] §4 (Empirical Evaluations) and the abstract: the central claim that the attacks exhibit 'stronger resilience to paraphrasing' and outperform prior attacks rests on the retrieval function continuing to surface member documents preferentially on the basis of prefix statistics even after semantic rewriting. The manuscript does not report whether the retriever is lexical or embedding-based, nor does it include an ablation that replaces the retriever with a standard semantic embedding model while keeping the same paraphrases. If embedding-based retrieval is used, paraphrases can preserve similarity scores while scrambling prefix distributions, which would make the observed outperformance an artifact of the specific retriever rather than a general property of prefix-based attacks.
- [§3] §3 (Attack 2, weighted-averaging scheme): the membership statistic is defined via a simple weighted average over prefixes, yet the manuscript does not specify how the weights are computed or whether they depend on any statistics of the query distribution. If the weights are derived from the same corpus that the adversary is trying to attack, the scheme is no longer strictly black-box with respect to the target retrieval corpus; this must be clarified because it directly affects the attack's claimed practicality.
minor comments (2)
- All tables reporting AUC or accuracy should include the exact number of prefixes used, the paraphrasing method, and the retrieval model (including embedding dimension or lexical metric) so that the 'small number of prefixes' claim can be reproduced.
- The description of the adapted ensemble-prompting defense should include the exact prompt templates and the number of ensemble members so that the mitigation results can be verified independently.
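The adapted defense is not specified in this summary; a generic ensemble-prompting sketch (the function names, the majority-vote aggregation, and the parameters m and k are all assumptions, not the paper's exact construction) would look like:

```python
import random
from collections import Counter

def ensemble_answer(query, example_pool, answer_fn, m=5, k=4, seed=0):
    """Generic ensemble-prompting sketch: build m prompts, each with a
    different random subset of in-context examples, and majority-vote
    the answers so no single retrieved document dominates the output."""
    local_rng = random.Random(seed)
    votes = Counter()
    for _ in range(m):
        examples = local_rng.sample(example_pool, k)
        votes[answer_fn(query, examples)] += 1
    return votes.most_common(1)[0][0]
```

The intuition for why this blunts the weighted-average attack: each prefix query now sees a randomized mixture of examples, which dilutes the per-document retrieval signal the attack aggregates.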
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will incorporate clarifications and revisions into the next version of the manuscript.
read point-by-point responses
-
Referee: [§4 and abstract] §4 (Empirical Evaluations) and the abstract: the central claim that the attacks exhibit 'stronger resilience to paraphrasing' and outperform prior attacks rests on the retrieval function continuing to surface member documents preferentially on the basis of prefix statistics even after semantic rewriting. The manuscript does not report whether the retriever is lexical or embedding-based, nor does it include an ablation that replaces the retriever with a standard semantic embedding model while keeping the same paraphrases. If embedding-based retrieval is used, paraphrases can preserve similarity scores while scrambling prefix distributions, which would make the observed outperformance an artifact of the specific retriever rather than a general property of prefix-based attacks.
Authors: We agree that the retriever type must be explicitly stated to allow proper interpretation of the paraphrasing results. We will revise §4 and the abstract to clearly report the retrieval function used in all experiments. We will also add a discussion of the implications for lexical versus embedding-based retrievers and note that our empirical claims are tied to the evaluated retrieval setup. An ablation replacing the retriever with a standard semantic embedding model while reusing the same paraphrases would strengthen generality; we will include this as a new experiment in the revision if feasible, or otherwise expand the limitations section to address the concern directly. revision: partial
-
Referee: [§3] §3 (Attack 2, weighted-averaging scheme): the membership statistic is defined via a simple weighted average over prefixes, yet the manuscript does not specify how the weights are computed or whether they depend on any statistics of the query distribution. If the weights are derived from the same corpus that the adversary is trying to attack, the scheme is no longer strictly black-box with respect to the target retrieval corpus; this must be clarified because it directly affects the attack's claimed practicality.
Authors: We thank the referee for highlighting this ambiguity. The weights are computed using only the lengths of the available query prefixes (longer prefixes receive proportionally higher weight, normalized to sum to one) and do not depend on any statistics from the target retrieval corpus or the query distribution of the attacked system. No corpus-specific information is required or used. We will revise §3 to include the precise weighting formula and an explicit statement that the attack remains strictly black-box with respect to the target corpus. revision: yes
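As a hedged illustration of the rebuttal's description (weights proportional to prefix length, normalized to sum to one, with no corpus statistics), the scheme could be written as:

```python
def prefix_weights(prefix_lengths):
    """Weights proportional to prefix length, normalized to sum to one,
    as the (simulated) rebuttal describes; uses no corpus statistics."""
    total = sum(prefix_lengths)
    return [length / total for length in prefix_lengths]

def weighted_membership_score(signals, prefix_lengths):
    """Weighted average of per-prefix black-box signals."""
    weights = prefix_weights(prefix_lengths)
    return sum(w * s for w, s in zip(weights, signals))
```

Note the tension with the paper passage quoted in the theorem-link section below, which describes a decaying weight; the rebuttal here is simulated, so the sketch tracks its wording, not a confirmed formula.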
Circularity Check
No significant circularity in attack definitions or empirical claims
full rationale
The paper defines its two black-box membership inference attacks via direct computations: one using a reference model to estimate loss on query prefixes, and the second via a weighted-averaging scheme on model outputs. These are algorithmic procedures, not mathematical derivations. The central claims rest on empirical evaluations (including paraphrased-query settings and comparisons to three prior attacks) rather than any equations, fitted parameters renamed as predictions, or self-citation chains that bear the load. No steps exhibit self-definitional loops, ansatz smuggling, or uniqueness theorems imported from the authors' prior work. The work is self-contained against external benchmarks and matches the reader's assessment of non-circular empirical construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost (J(x) = ½(x + x⁻¹) − 1) · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
The function ϕ is the most interesting part of the attack... We would like our score function to amplify early signals and provide diminishing returns for the subsequent answers. This can be achieved by using a decaying function such as ϕ(i) = 1/i or 1/log(i).
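A minimal rendering of the quoted decaying score, assuming ϕ(i) = 1/i and a normalized weighted sum (the paper's exact normalization is not shown in this excerpt):

```python
def phi(i: int) -> float:
    """Decaying weight phi(i) = 1/i: amplify early prefixes, give
    diminishing returns to later ones (1/log(i) is the quoted alternative)."""
    return 1.0 / i

def decayed_score(signals) -> float:
    """phi-weighted sum of per-prefix signals, normalized by total weight."""
    weights = [phi(i) for i in range(1, len(signals) + 1)]
    return sum(w * s for w, s in zip(weights, signals)) / sum(weights)
```

With this decay, an early-prefix hit moves the score more than a late one, which is the "amplify early signals" behavior the passage describes.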
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
In-context examples selection for machine translation
Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, and Marjan Ghazvininejad. In-context examples selection for machine translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8857–8873. Association for Computational Linguistics, 2023
2023
-
[2]
The llama 3 herd of models, 2024
AI@Meta. The llama 3 herd of models, 2024
2024
-
[3]
Private prediction for large-scale synthetic text generation
Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, and Sergei Vassilvitskii. Private prediction for large-scale synthetic text generation. InFindings of the Associa- tion for Computational Linguistics: EMNLP 2024, pages 7244–7262, 2024
2024
-
[4]
Datasheet for the pile
Stella Biderman, Kieran Bicheno, and Leo Gao. Datasheet for the pile. arXiv preprint arXiv:2201.07311, 2022
2022
-
[5]
Pythia: A suite for analyzing large language models across training and scaling
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Singh, et al. Pythia: A suite for analyzing large language models across training and scaling. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 2399–2415. PMLR, 2023
2023
-
[6]
Impact of sample selection on in-context learning for entity extraction from scientific writing
Necva Bölücü, Maciej Rybinski, and Stephen Wan. Impact of sample selection on in-context learning for entity extraction from scientific writing. In Findings of the Association for Computational Linguistics: EMNLP 2023, December 2023
2023
-
[7]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Dario Amodei, Alec Radford, Ilya Sutskever, and Jack Cla...
-
[8]
Membership inference attacks from first principles
Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramèr. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914, 2022
2022
-
[9]
Extracting training data from large language models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Xiaodong Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In USENIX Security Symposium, 2020
2020
-
[10]
Privacy side channels in machine learning systems
Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, and Florian Tramèr. Privacy side channels in machine learning systems. In Proceedings of the 33rd USENIX Conference on Security Symposium, USA, 2024. USENIX Association
2024
-
[11]
Gemma: Open Models Based on Gemini Research and Technology
Google DeepMind and Google. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024
2024
-
[12]
A survey on in-context learning
Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, November 2024
2024
-
[13]
The faiss library
Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. 2024
2024
-
[14]
Flocks of stochastic parrots: Differentially private prompt learning for large language models
Haonan Duan, Adam Dziedzic, Nicolas Papernot, and Franziska Boenisch. Flocks of stochastic parrots: Differentially private prompt learning for large language models. In Advances in Neural Information Processing Systems, 2023
2023
-
[15]
On the privacy risk of in-context learning, 2024
Haonan Duan, Adam Dziedzic, Mohammad Yaghini, Nicolas Papernot, and Franziska Boenisch. On the privacy risk of in-context learning, 2024
2024
-
[16]
Do membership inference attacks work on large language models?
Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, and Hannaneh Hajishirzi. Do membership inference attacks work on large language models? In Conference on Language Modeling (COLM), 2024
2024
-
[17]
Data-adaptive differentially private prompt synthesis for in-context learning
Fengyu Gao, Ruida Zhou, Tianhao Wang, Cong Shen, and Jing Yang. Data-adaptive differentially private prompt synthesis for in-context learning. In The Thirteenth International Conference on Learning Representations, 2025
2025
-
[18]
Demystifying prompts in language models via perplexity estimation
Hila Gonen, Srini Iyer, Terra Blevins, Noah Smith, and Luke Zettlemoyer. Demystifying prompts in language models via perplexity estimation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10136–10148. Association for Computational Linguistics, December 2023
2023
-
[19]
A survey on llm-as-a-judge
Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Yuanzhuo Wang, and Jian Guo. A survey on llm-as-a-judge. CoRR, abs/2411.15594, 2024
2024
-
[20]
User inference attacks on large language models
Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, and Zheng Xu. User inference attacks on large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2024
2024
-
[21]
Differentially private in-context learning with nearest neighbor search
Antti Koskela, Tejas Kulkarni, and Laith Zumot. Differentially private in-context learning with nearest neighbor search, 2025
2025
-
[22]
What makes good in-context examples for GPT-3?
Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, May 2022
2022
-
[23]
Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8086–8098, 2022
2022
-
[24]
SummQA at MEDIQA-chat 2023: In-context learning with GPT-4 for medical summarization
Yash Mathur, Sanketh Rangreji, Raghav Kapoor, Medha Palavalli, Amanda Bertsch, and Matthew Gormley. SummQA at MEDIQA-chat 2023: In-context learning with GPT-4 for medical summarization. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 490–502. Association for Computational Linguistics, July 2023
2023
-
[25]
Membership inference attacks against language models via neighbourhood comparison
Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schoelkopf, Mrinmaya Sachan, and Taylor Berg-Kirkpatrick. Membership inference attacks against language models via neighbourhood comparison. In Findings of the Association for Computational Linguistics: ACL 2023. Association for Computational Linguistics, 2023
2023
-
[26]
The effect of natural distribution shift on question answering models
John Miller, Karl Krauth, Benjamin Recht, and Ludwig Schmidt. The effect of natural distribution shift on question answering models. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6905–6916. PMLR, 2020
2020
-
[27]
Quantifying privacy risks of masked language models using membership inference attacks
Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, and Reza Shokri. Quantifying privacy risks of masked language models using membership inference attacks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8332–8347, December 2022
2022
-
[28]
Med-flamingo: a multimodal medical few-shot learner
Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Yash Dalmia, Jure Leskovec, Cyril Zakka, Eduardo Pontes Reis, and Pranav Rajpurkar. Med-flamingo: a multimodal medical few-shot learner. In Proceedings of the 3rd Machine Learning for Health Symposium, Proceedings of Machine Learning Research, 2023
2023
-
[29]
Know what you don’t know: Unanswerable questions for SQuAD
Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, July 2018. Association for Computational Linguistics
2018
-
[30]
Sentence-bert: Sentence embeddings using siamese bert-networks
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992. Association for Computational Linguistics, 2019
2019
-
[31]
Membership inference attacks against machine learning models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18, 2017
2017
-
[32]
Privacy-preserving in-context learning with differentially private few-shot generation
Xinyu Tang, Richard Shin, Huseyin A Inan, Andre Manoel, Fatemehsadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, and Robert Sim. Privacy-preserving in-context learning with differentially private few-shot generation. In The Twelfth International Conference on Learning Representations, 2024
2024
-
[33]
NewsQA: A machine comprehension dataset
Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. NewsQA: A machine comprehension dataset. In Proceedings of the 1st Workshop on Representation Learning for NLP, Vancouver, Canada, 2017. Association for Computational Linguistics
2017
-
[34]
Membership inference attacks against in-context learning
Rui Wen, Zheng Li, Michael Backes, and Yang Zhang. Membership inference attacks against in-context learning. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, CCS 2024, pages 3481–3495. ACM, 2024
2024
-
[35]
Privacy-preserving in-context learning for large language models
Tong Wu, Ashwinee Panda, Jiachen T Wang, and Prateek Mittal. Privacy-preserving in-context learning for large language models. In The Twelfth International Conference on Learning Representations, 2024
2024
-
[36]
Machine unlearning: A survey
Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S. Yu. Machine unlearning: A survey. ACM Comput. Surv., 56(1)
-
[37]
Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282, Los Alamitos, CA, USA, July 2018. IEEE Computer Society
2018
-
[38]
Counterfactual memorization in neural language models
Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini. Counterfactual memorization in neural language models. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NeurIPS ’23, 2023
2023
-
[39]
Active example selection for in-context learning
Yiming Zhang, Shi Feng, and Chenhao Tan. Active example selection for in-context learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9134–9148. Association for Computational Linguistics, December 2022
2022