arxiv: 2604.23711 · v1 · submitted 2026-04-26 · 💻 cs.CR

Recognition: unknown

Spore: Efficient and Training-Free Privacy Extraction Attack on LLMs via Inference-Time Hybrid Probing

Baohan Huang, Cong Zuo, Haibin Zhang, Hang Fu, Licheng Wang, Ruiqing Yue, Sicheng Pan, Yu Cui, Zhuoyu Sun

Pith reviewed 2026-05-08 06:05 UTC · model grok-4.3

classification 💻 cs.CR

keywords privacy extractionLLM agentsinference-time attackblack-box attacktraining-free attackcontextual privacyhybrid probingmemory leakage

0 comments

The pith

Spore extracts private user context from LLM agent memory with one black-box query and higher success than prior attacks while bypassing defenses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Spore, a training-free attack that probes LLM agents at inference time to recover private information stored in their context memory. In black-box mode it recovers the target data from a small candidate set generated by one query; in gray-box mode it uses ranked token outputs for faster and more precise extraction. The method is supported by an information-theoretic argument showing high leakage per query and is shown empirically to exceed existing attacks on frontier models across multiple settings while remaining effective against detection and safety alignments.

Core claim

Spore is a hybrid probing technique that, without any training or white-box access, extracts private information held in LLM agent memory by issuing targeted inference-time queries whose token outputs contain the sensitive content in a recoverable form. In the black-box setting a single query suffices to produce a short candidate list containing the original private data; in the gray-box setting multi-ranked tokens further accelerate and improve recovery. Information-theoretic analysis establishes that each query leaks substantial entropy, and experiments confirm higher attack success rates, lower query cost, and resilience to existing defenses compared with prior schemes.

What carries the argument

Inference-time hybrid probing: a single crafted query (black-box) or ranked-token response (gray-box) that surfaces private memory tokens in the model's output distribution without prior training or model internals.

If this is right

LLM agent deployments expose user context to extraction by a single well-crafted query.
Existing detection and safety-alignment methods do not reliably block this form of inference-time probing.
Privacy risk scales with the amount of personal data retained in agent memory rather than with model size or training regime.
Attack cost remains low even when the target model changes, because the method relies only on output tokens.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agent designers may need to avoid storing raw user context or to add per-query memory isolation that prevents probe leakage.
Monitoring for atypical query patterns could serve as a practical countermeasure, though the paper does not test such detection.
The same probing principle might apply to other contextual leakage vectors such as tool-use histories or multi-turn conversation summaries.

Load-bearing premise

Private information stored in an LLM agent's memory will appear among the tokens generated by a carefully chosen probe query even when the attacker has no training data or white-box access.

What would settle it

An experiment in which Spore's candidate set consistently excludes the true private string or in which its success rate falls below that of baseline attacks when the model applies stronger safety fine-tuning or output filtering.

Figures

Figures reproduced from arXiv: 2604.23711 by Baohan Huang, Cong Zuo, Haibin Zhang, Hang Fu, Licheng Wang, Ruiqing Yue, Sicheng Pan, Yu Cui, Zhuoyu Sun.

**Figure 1.** Figure 1: Overview of the proposed SPORE and comparison with existing methods. SPORE operates under both black-box and gray-box settings, depending on attacker capabilities. Compared with prior methods, SPORE extracts PII from agent context memory with a high success rate and low cost. 3.2 Threat Model In this paper, we consider two practical settings in real-world deployments: • Black-Box Setting. The adversary can… view at source ↗

**Figure 2.** Figure 2: Overview of our shadow encryption paradigm, view at source ↗

**Figure 3.** Figure 3: Comparison of attack cost between SPORE and prior methods. • Token Consumption. We measure token consumption as the total number of tokens used during one complete attack execution. This includes all prompt tokens and all output tokens. 4 Experiments 4.1 Experimental Setup Models. We select the frontier LLMs that are widely recognized as the most robust in terms of safety, based on existing SOTA research… view at source ↗

**Figure 4.** Figure 4: Experimental results on GPT-5.4 under different temperature and top- view at source ↗

**Figure 5.** Figure 5: Comparison of query complexity between our S view at source ↗

**Figure 6.** Figure 6: Expected number of queries required for SPORE-B across multiple LLMs. performance. ENQ also shows no clear fluctuation across settings. The attack achieves the best performance when top-k is small and temperature is moderate. These results indicate that our attack is robust across different model settings. Finding 2: The attack performance remains stable under low sampling temperatures. 5.3 Theoretical Qu… view at source ↗

read the original abstract

With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM) agents has become a critical issue. Existing privacy attacks against LLMs primarily target training data, while research on inference-time contextual privacy risks in LLM agent memory remains limited. Moreover, prior methods often incur high attack costs, requiring multiple queries or relying on white-box assumptions, which limits their practicality in real-world deployments. To address these issues, we propose a training-free privacy extraction attack targeting LLM agent memory, which we name \textsc{Spore}. \textsc{Spore} is compatible with both black-box and gray-box settings. In the black-box setting, \textsc{Spore} can efficiently extract a small candidate set via a single query to recover the original private information. In the gray-box setting, \textsc{Spore} allows the attacker to leverage multi-ranked tokens for more accurate and faster privacy extraction. We provide an information-theoretic analysis of \textsc{Spore} and show that it achieves high query efficiency with substantial per query information leakage. Experiments on multiple frontier LLMs show that \textsc{Spore} outperforms attack success rate over existing state-of-the-art (SOTA) schemes. It also maintains low attack cost and remains stable across different model parameter settings. We further evaluate the robustness of \textsc{Spore} against existing defense mechanisms. Our results show that \textsc{Spore} consistently bypasses both detection and strong safety alignment, demonstrating resilient performance in diverse defensive settings and real-world safety threats.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Spore introduces a training-free hybrid probing attack that extracts private LLM agent memory in one black-box query or with ranked tokens in gray-box, backed by info theory and SOTA-beating experiments, but success rests on the probe reliably surfacing hidden data.

read the letter

The main point for you is that this paper gives a practical inference-time attack called Spore on privacy in LLM agent memory. It runs without training, uses a single query in black-box mode to return a small candidate set that recovers the private info, and improves accuracy in gray-box by using multi-ranked tokens. An information-theoretic analysis supports the efficiency and leakage claims, and experiments on frontier models report higher attack success rates than prior schemes at low cost, plus consistent bypass of detection and safety alignment defenses.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces Spore, a training-free inference-time hybrid probing attack for extracting private information stored in LLM agent memory. It supports black-box settings via a single query that returns a small candidate set and gray-box settings that exploit ranked tokens for higher accuracy. The work includes an information-theoretic analysis claiming high query efficiency and substantial per-query leakage, plus experiments on frontier LLMs asserting higher attack success rates than prior SOTA methods, low cost, stability across model sizes, and consistent bypass of detection and safety-alignment defenses.

Significance. If the central claims hold under rigorous validation, the result would be significant for LLM security and privacy research. It identifies a practical, low-cost inference-time vector against contextual agent memory that existing training-data-focused attacks do not address. The training-free design and reported robustness to defenses could directly inform defense design for deployed personal AI assistants. The information-theoretic component supplies independent grounding beyond pure empirics, which is a strength.

major comments (3)

[Method] Method section (hybrid probing description): The core claim that a single inference-time query reliably surfaces private memory content in a small recoverable candidate set (black-box) or top-ranked tokens (gray-box) rests on the untested assumption that the model's token distribution encodes the private datum without prior knowledge of its format or storage. This assumption is load-bearing for both the efficiency claims and the SOTA outperformance; if private data is indirectly stored, multi-turn, or suppressed by alignment, the candidate set misses the target and leakage drops sharply.
[Experiments] Experiments section: The reported outperformance in attack success rate and robustness to defenses lacks sufficient controls and reporting. No details are given on how private information is injected into agent memory (direct statements vs. complex multi-turn), the exact composition of test cases, number of trials per setting, or statistical tests for the claimed superiority. Without these, it is impossible to determine whether results generalize or whether the hybrid probe itself activates suppression mechanisms.
[Analysis] Information-theoretic analysis: The analysis asserts high per-query information leakage and efficiency, yet provides no explicit derivation linking the hybrid probing strategy to concrete mutual-information or entropy-reduction bounds. The claims therefore risk being general statements rather than tight characterizations of the specific attack, weakening the grounding for the efficiency and leakage assertions.

minor comments (3)

[Introduction] The distinction between black-box and gray-box threat models could be stated more crisply in the introduction and abstract, including the precise attacker capabilities assumed in each.
[Experiments] Tables reporting attack success rates should include variance or confidence intervals across runs and models to support the stability claim.
[Related Work] A few sentences on related inference-time privacy work (e.g., prompt-injection or memory-extraction baselines) would help situate the novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline planned revisions to improve the manuscript's clarity, rigor, and completeness.

read point-by-point responses

Referee: [Method] Method section (hybrid probing description): The core claim that a single inference-time query reliably surfaces private memory content in a small recoverable candidate set (black-box) or top-ranked tokens (gray-box) rests on the untested assumption that the model's token distribution encodes the private datum without prior knowledge of its format or storage. This assumption is load-bearing for both the efficiency claims and the SOTA outperformance; if private data is indirectly stored, multi-turn, or suppressed by alignment, the candidate set misses the target and leakage drops sharply.

Authors: We acknowledge that the effectiveness of hybrid probing depends on the model's token distribution reflecting stored private data. The attack is motivated by empirical observations that LLMs often surface contextual memory in inference-time probabilities, and our experiments on frontier models with direct memory injection achieve high success rates, supporting practical validity. To address the concern, we will add a new subsection in the Method section explicitly stating the assumptions, discussing limitations for indirect/multi-turn storage or strong suppression, and outlining adaptation strategies. We will also incorporate additional experiments with varied injection methods in the revised version. revision: partial
Referee: [Experiments] Experiments section: The reported outperformance in attack success rate and robustness to defenses lacks sufficient controls and reporting. No details are given on how private information is injected into agent memory (direct statements vs. complex multi-turn), the exact composition of test cases, number of trials per setting, or statistical tests for the claimed superiority. Without these, it is impossible to determine whether results generalize or whether the hybrid probe itself activates suppression mechanisms.

Authors: We agree that expanded reporting and controls are essential. In the revised Experiments section we will detail: injection methods (primarily direct statements in the agent context, with new multi-turn examples added); test case composition (50 private facts spanning categories such as identifiers and sensitive attributes); trial counts (100 independent runs per model/setting); and statistical tests (means with standard deviations plus t-tests with p < 0.05 confirming superiority over baselines). These additions will also report failure modes to evaluate potential suppression activation by the probe. revision: yes
Referee: [Analysis] Information-theoretic analysis: The analysis asserts high per-query information leakage and efficiency, yet provides no explicit derivation linking the hybrid probing strategy to concrete mutual-information or entropy-reduction bounds. The claims therefore risk being general statements rather than tight characterizations of the specific attack, weakening the grounding for the efficiency and leakage assertions.

Authors: We appreciate the call for tighter formalization. The existing analysis offers high-level entropy-reduction arguments; we will add an explicit derivation in a new appendix (or expanded main-text subsection) that defines mutual information I(Private Data; Probe Response) and shows how the hybrid strategy produces greater per-query entropy reduction than prior attacks. This will directly connect the bounds to the probing mechanism and strengthen the efficiency claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical attack with independent analysis

full rationale

The paper presents Spore as a training-free inference-time probing method evaluated empirically on frontier LLMs, with success rates compared to external SOTA baselines and robustness tested against existing defenses. The information-theoretic analysis of query efficiency and per-query leakage is derived from standard entropy measures applied to the observed token distributions, without reducing to fitted parameters from the attack results themselves or self-referential definitions. No load-bearing steps invoke self-citations for uniqueness theorems, smuggle ansatzes, or rename known results as novel derivations. The central claims rest on experimental measurements and external comparisons rather than tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard assumptions about LLM behavior and introduces Spore as a new attack technique supported by information theory.

axioms (1)

domain assumption LLM inference reveals information about stored context through output tokens and probabilities
Core to the probing method described.

pith-pipeline@v0.9.0 · 5615 in / 1267 out tokens · 78434 ms · 2026-05-08T06:05:28.628353+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

8 extracted references · 6 canonical work pages · 2 internal anchors

[1]

Wassim Bouaziz, Mathurin VIDEAU, Nicolas Usunier, and El-Mahdi El-Mhamdi

On the impossibility of separating intelli- gence from judgment: The computational intractabil- ity of filtering for ai alignment.arXiv preprint arXiv:2507.07341. Wassim Bouaziz, Mathurin VIDEAU, Nicolas Usunier, and El-Mahdi El-Mhamdi. 2026. Winter soldier: Backdooring language models at pre-training with indirect data poisoning. InThe Fourteenth Interna...

work page arXiv 2026
[2]

Evaluating Large Language Models Trained on Code

Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374. Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. 2025a. Can indirect prompt injection attacks be detected and removed? InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18...

work page internal anchor Pith review arXiv 2025
[3]

In2023 IEEE Symposium on Security and Privacy (SP), pages 1401–1418

Rethinking searchable symmetric encryption. In2023 IEEE Symposium on Security and Privacy (SP), pages 1401–1418. Changzhou Han, Zehang Deng, Wanlun Ma, Xiaogang Zhu, Minhui Xue, Tianqing Zhu, Sheng Wen, and Yang Xiang. 2025. Codebreaker: Dynamic extraction attacks on code language models. In2025 IEEE Symposium on Security and Privacy (SP), pages 559– 575....

2025
[4]

Defending against indirect prompt injection attacks with spotlighting

Towards label-only membership inference at- tack against pre-trained large language models. In Proceedings of the 34th USENIX Conference on Se- curity Symposium, SEC ’25, USA. USENIX Associ- ation. Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. 2024. Defending against indirect prompt injection attacks with spot...

work page arXiv 2024
[5]

A safety report on gpt-5.2, gemini 3 pro, qwen3-vl, grok 4.1 fast, nano banana pro, and seedream 4.5, 2026

Analyzing leakage of personally identifiable information in language models. In2023 IEEE Sym- posium on Security and Privacy (SP), pages 346–363. Xingjun Ma, Yixu Wang, Hengyuan Xu, Yutao Wu, Yi- fan Ding, Yunhan Zhao, Zilong Wang, Jiabin Hua, Ming Wen, Jianan Liu, Ranjie Duan, Yifeng Gao, Yingshui Tan, Yunhao Chen, Hui Xue, Xin Wang, Wei Cheng, Jingjing ...

work page arXiv 2026
[6]

OpenAI GPT-5 System Card

CIMemories: A compositional benchmark for contextual integrity in LLMs. InThe Fourteenth In- ternational Conference on Learning Representations. Milad Nasr, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Florian Tramèr, and Katherine Lee. 2025. Scalable extraction of tra...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Internal safety collapse in frontier large language models.arXiv preprint arXiv:2603.23509, 2026

Membership inference attacks against in- context learning. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communica- tions Security, CCS ’24, page 3481–3495, New York, NY , USA. Association for Computing Machinery. Yutao Wu, Xiao Liu, Yifeng Gao, Xiang Zheng, Hanxun Huang, Yige Li, Cong Wang, Bo Li, Xingjun Ma, and Yu-Gang Jiang. 2026....

work page arXiv 2024
[8]

C Response from Detector Detection Results of an Attack Case

Bag of tricks for training data extraction from language models. InProceedings of the 40th Interna- tional Conference on Machine Learning, ICML’23. JMLR.org. Xiao Zhan, Juan Carlos Carrillo, William Seymour, and Jose Such. 2025. Malicious llm-based conversational 11 ai makes users reveal personal information. InPro- ceedings of the 34th USENIX Conference ...

2025