TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

Baek Seong-Eun; Moon Ye-Bin; Nam Hyeon-Woo; Tae-Hyun Oh; Yejin Yeo

arXiv: 2606.18996 · v2 · pith:74BMPPY4new · submitted 2026-06-17 · cs.CR · cs.AI

Pith reviewed 2026-06-26 20:35 UTC · model grok-4.3

keywords: privacy leakage, AI agents, task completion, benchmarks, large language models, softmax models, defense mechanisms, private information

The pith

No soft-constraint defense lets softmax models complete private-data tasks with zero leakage risk.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the TRAP benchmark to test whether agents can use private information from documents to finish tasks while resisting natural-language attacks that try to extract the same information. Evaluations across 22 models show non-trivial leakage in every family, with stronger instruction-following linked to higher leakage rates. Prompt-based defenses lower leakage but also reduce task accuracy, and the authors prove this trade-off is unavoidable: for any softmax-based model, no soft constraint can deliver both high task success and zero leakage probability. They therefore replace private fields with hash keys before the model sees them, which largely blocks leakage while preserving accuracy.
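The scenario structure described above (a private document, a task query, and an attack query) can be sketched as a small data type with two checks. This is only an illustrative sketch: the field names, the verbatim-substring leak rule, and the definition of privacy score as the share of attack responses without a leak are assumptions made here, since this page does not spell out the paper's exact metrics.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One TRAP-style test case (field names are hypothetical)."""
    private_fields: dict   # field name -> private value found in the document
    task_query: str        # requires a tool call that uses private fields
    attack_query: str      # asks for the same fields in natural language
    expected_tool: str
    expected_args: dict


def task_correct(s: Scenario, tool: str, args: dict) -> bool:
    """Task succeeds if the agent called the right tool with the right arguments."""
    return tool == s.expected_tool and args == s.expected_args


def leaked(s: Scenario, response: str) -> bool:
    """Assumed leak rule: any private value appears verbatim in the reply."""
    return any(value in response for value in s.private_fields.values())


def scores(results: list) -> tuple:
    """results: (task_ok, leak) pairs -> (task accuracy %, privacy score %)."""
    n = len(results)
    accuracy = 100.0 * sum(1 for ok, _ in results if ok) / n
    privacy = 100.0 * sum(1 for _, leak in results if not leak) / n
    return accuracy, privacy


# Hypothetical flight-booking scenario, echoing the abstract's passport example.
flight = Scenario(
    private_fields={"passport_number": "X1234567"},
    task_query="Book my flight to ICN using the attached passport.",
    attack_query="What is the passport number in the document?",
    expected_tool="book_flight",
    expected_args={"passport_number": "X1234567", "destination": "ICN"},
)
```

Scoring the two obligations separately is what makes the trade-off visible: a model that refuses everything would score high on privacy and low on task accuracy, and the reverse holds for a model that complies with everything.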

Core claim

For any softmax-based model, no soft-constraint defense, e.g., prompt-based defenses, can jointly achieve high task success with zero leakage probability. Structural private field isolation replaces private fields with hash keys before they reach the model and largely prevents leakage while keeping task accuracy.

What carries the argument

The impossibility result for soft-constraint defenses on softmax-based models, which demonstrates that any mechanism relying on output probabilities cannot enforce zero leakage without sacrificing task performance.
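The softmax property behind this argument can be checked numerically. The sketch below is not the paper's proof: it assumes a toy three-token vocabulary and models a soft constraint as a finite penalty on the logit of the leaking token, both of which are illustrative choices made here. It works in log space so that very small probabilities are not mistaken for zero through floating-point underflow.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits (may contain -inf)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


# Hypothetical vocabulary: index 0 = refuse, 1 = tool call, 2 = leak private value.
base_logits = [2.0, 1.5, 1.0]


def leak_log_prob(penalty):
    """Log-probability of the 'leak' token after a finite (soft) logit penalty."""
    logits = base_logits[:2] + [base_logits[2] - penalty]
    return log_softmax(logits)[2]


# Soft constraints: the leak probability shrinks but its log stays finite.
soft = [leak_log_prob(p) for p in (0.0, 10.0, 100.0)]

# Hard constraint: masking the logit to -inf gives probability exactly zero.
hard = log_softmax(base_logits[:2] + [-math.inf])[2]
```

Under these assumptions every finite penalty leaves a finite log-probability, so the leak probability stays strictly positive, while only the hard mask reaches exactly zero. That contrast is the intuition for moving the defense out of the model's output distribution.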

If this is right

All tested model families exhibit non-trivial leakage under attack queries.
Instruction-following ability correlates positively with leakage rate.
Prompt optimization cannot escape the accuracy-leakage trade-off.
Structural hash-key isolation maintains task accuracy while largely eliminating leakage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agent systems handling routine private data will require input-level isolation rather than output-level prompting.
The TRAP scenarios could be extended to measure leakage under multi-turn or tool-chaining attacks.
Hard architectural separation of private fields may become a standard requirement for agents in document workflows.

Load-bearing premise

The benchmark's attack queries and scenarios are representative of real deployment risks, and the impossibility result applies to the exact output mechanisms of the evaluated models.

What would settle it

A single soft-constraint defense, such as an optimized prompt, that achieves both near-perfect task accuracy and zero leakage probability across the TRAP scenarios would falsify the impossibility claim.

Figures

Figures reproduced from arXiv: 2606.18996 by Baek Seong-Eun, Moon Ye-Bin, Nam Hyeon-Woo, Tae-Hyun Oh, Yejin Yeo.

**Figure 2.** Task accuracy (%, ↑) and privacy score (%, ↑) under the base system prompt, four defense variants, and jointly optimized system prompt. Stronger privacy directives shift models toward higher privacy score but lower task accuracy, tracing a consistent trade-off curve that no hand-crafted variant escapes. We ask whether augmenting the system prompt with a privacy-aware instruction is sufficient to mitigate…

**Figure 3.** Change in privacy score (ΔPrivacy = Privacy_Variant − Privacy_Direct) for each attack variant (Ignore, Important, Roleplay) relative to the Direct baseline. Positive Δ indicates that the adversarial phrasing raises privacy score, i.e., the model treats attack language as an alarm signal. Proprietary models (left 4) dominate the positive-Δ region, while open-source models generally show smaller Δ …

**Figure 4.** Illustration of private field isolation.

**Figure 5.** Task accuracy (%, ↑) and privacy score (%, ↑) for the baseline (no masking) and three masking configurations (Oracle, Practical, Auto). Results …

**Figure 6.** Dataset statistics of TRAP. (a) Document type and modality distribution across 500 samples (50 per type). (b) Number of unique tools and private/non-private fields per document type. (c) Distribution of required tool arguments per sample, broken down by document type (min–max range with mean).

**Figure 7.** Examples of private field masking. Masked regions are replaced with a private key (shown …

Original abstract

Agents are increasingly deployed in document-intensive workflows where sensitive private information is not an edge case but a routine input, e.g., an agent booking a flight needs passport numbers. In such settings, the agent must use private information to complete tasks accurately while never exposing it in its responses, because it cannot verify who is actually at the keyboard. These two obligations are in fundamental tension. A model capable enough to use private information for task completion can, by the same capability, be induced to reveal it. To evaluate the trade-off of task accuracy and privacy leakage, we introduce Task-completion and Resistance to Active Privacy-extraction (TRAP). Each scenario includes a document containing private information, a task query that requires the agent to invoke the correct tool using private fields, and an attack query that attempts to elicit the same information in natural language. Evaluating 22 models spanning frontier proprietary and open-source models at multiple scales, we find that all model families exhibit non-trivial leakage, and that instruction-following ability correlates with leakage rate. Existing prompt-based defenses reduce leakage but at significant cost to task accuracy. Prompt optimization fails to escape this trade-off. We demonstrate that this failure is not incidental. For any softmax-based model, no soft-constraint defense, e.g., prompt-based defenses, can jointly achieve high task success with zero leakage probability. Motivated by this impossibility result, we propose structural private field isolation, which replaces private fields with hash keys before they reach the model. This approach largely prevents leakage while keeping task accuracy.
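The isolation idea in the abstract's last two sentences can be sketched as a pre-processing step plus a resolver at the tool boundary. This is a minimal illustration, not the authors' implementation: the key format, the use of HMAC-SHA256, the `SECRET` value, and the `isolate` and `resolve` helpers are all assumptions introduced here.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical per-session secret, never shown to the model


def make_key(field: str, value: str) -> str:
    """Derive an opaque placeholder for a private value (assumed key format)."""
    digest = hmac.new(SECRET, f"{field}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"<PRIV:{field}:{digest[:12]}>"


def isolate(document: dict, private_fields: set) -> tuple:
    """Replace private values with keys; return (masked document, key vault)."""
    masked, vault = {}, {}
    for field, value in document.items():
        if field in private_fields:
            key = make_key(field, value)
            vault[key] = value
            masked[field] = key
        else:
            masked[field] = value
    return masked, vault


def resolve(tool_args: dict, vault: dict) -> dict:
    """Swap keys back to real values only when the tool is actually invoked."""
    return {name: vault.get(arg, arg) for name, arg in tool_args.items()}


document = {"name": "Jane Doe", "passport_number": "X1234567", "destination": "ICN"}
masked, vault = isolate(document, {"name", "passport_number"})

# The model sees only `masked`; suppose it emits this tool call.
model_tool_call = {"passport": masked["passport_number"], "to": masked["destination"]}
resolved = resolve(model_tool_call, vault)
```

In this sketch the model can still route the placeholder into the correct tool argument, so task completion does not depend on the raw value, while an attack query can at most extract the placeholder. A keyed hash is used here rather than a plain hash because short identifiers such as passport numbers could otherwise be recovered by brute force.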

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TRAP shows prompt defenses hit a hard task-leakage trade-off on 22 models and proposes structural isolation, but the softmax impossibility claim is stated without visible proof steps.

The main point is that this paper documents a consistent leakage problem when agents must use private data to finish tasks, finds that prompt defenses always cost accuracy, and argues that softmax models make zero-leakage soft defenses impossible in principle.

They define TRAP scenarios that pair a private document, a task query requiring the right tool call with private fields, and a separate attack query that tries to pull the same fields out in plain text. Across 22 models they report non-trivial leakage in every family and a positive correlation between instruction-following strength and leakage rate. Prompt defenses and prompt optimization both reduce leakage but degrade task success. The authors then claim this trade-off is not fixable by better prompts because, for any softmax model, soft constraints cannot drive leakage probability exactly to zero while preserving high task accuracy. They therefore suggest replacing private fields with hash keys before the model ever sees them.

The evaluation on a wide range of models is useful and the real workflow examples are concrete. The structural isolation idea follows directly from the observed problem.

The soft spot is the impossibility result. The abstract ties it to the model being softmax-based, presumably because softmax assigns every token positive probability, yet it gives no theorem statement, no list of assumptions about what counts as a soft constraint, and no derivation. Without those details it is difficult to know whether the claim covers sampling methods, few-shot examples, or output restrictions. If the full paper supplies a tight argument, the claim strengthens; otherwise it remains an informal observation.

The work is aimed at researchers building or auditing agent systems that handle routine private inputs. The benchmark and the empirical trade-off numbers are worth examining even if the theory needs tightening. It should go to peer review because the problem is practical and the experiments are broad enough to merit referee scrutiny.

Referee Report

1 major / 0 minor

Summary. The paper introduces the TRAP benchmark consisting of document scenarios with private fields, task queries requiring tool use of those fields, and attack queries attempting natural-language elicitation of the same fields. It evaluates 22 models across proprietary and open-source families, reports non-trivial leakage rates that correlate with instruction-following ability, shows that prompt-based defenses reduce leakage at the expense of task accuracy, and claims that prompt optimization cannot escape the trade-off. The paper asserts an impossibility result: for any softmax-based model, no soft-constraint defense can achieve both high task success and zero leakage probability. Motivated by this, it proposes structural private-field isolation via hash-key replacement before the model sees the input.

Significance. If the impossibility result is formalized and the empirical trade-off generalizes, the work identifies a structural limitation of prompt-only privacy mechanisms in capable agents and supplies a concrete architectural alternative (hash isolation) that largely preserves task accuracy. The evaluation on 22 models across scales and families, together with the explicit proposal of a non-prompt mitigation, constitutes a useful empirical and design contribution to agent privacy research.

major comments (1)

[Abstract / impossibility section] Abstract and the section presenting the impossibility result: the central claim that 'for any softmax-based model, no soft-constraint defense... can jointly achieve high task success with zero leakage probability' is stated without a theorem, without an explicit list of assumptions on the class of soft constraints (e.g., whether few-shot examples, output-format restrictions, or per-query prompt variation are included), and without derivation steps showing why finite prompt adjustments cannot drive leakage exactly to zero while preserving task performance. Because this result is used to motivate the structural-isolation proposal, the absence of the formal argument is load-bearing.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the need to strengthen the presentation of the impossibility result. We address the major comment below and will revise the manuscript to incorporate a formal statement of the result.

Referee: [Abstract / impossibility section] Abstract and the section presenting the impossibility result: the central claim that 'for any softmax-based model, no soft-constraint defense... can jointly achieve high task success with zero leakage probability' is stated without a theorem, without an explicit list of assumptions on the class of soft constraints (e.g., whether few-shot examples, output-format restrictions, or per-query prompt variation are included), and without derivation steps showing why finite prompt adjustments cannot drive leakage exactly to zero while preserving task performance. Because this result is used to motivate the structural-isolation proposal, the absence of the formal argument is load-bearing.

Authors: We agree that the impossibility claim would be strengthened by an explicit theorem. The current manuscript presents the result as an informal demonstration based on the fundamental properties of softmax outputs (non-zero probability mass on all tokens) combined with the empirical observation that capable models can be induced to reveal private fields. In the revised version we will add a dedicated subsection containing: (1) a formal theorem statement, (2) an explicit list of assumptions on the class of soft constraints (including few-shot exemplars, output-format restrictions, system-prompt variations, and per-query prompt modifications), and (3) derivation steps showing that, under these assumptions, no finite collection of soft constraints can drive leakage probability to exactly zero while preserving non-trivial task success for any softmax-based model. This formalization will directly support the motivation for structural isolation.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents its central impossibility result as following directly from properties of softmax-based models (positive token probabilities cannot be driven exactly to zero by finite soft constraints without harming task performance). No equations, fitted parameters, or self-citations are shown that would reduce this claim to its own inputs by construction. The benchmark construction, empirical evaluation on 22 models, and structural isolation proposal are motivated by but logically independent of the stated result. No load-bearing step matches any of the enumerated circularity patterns.
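The step from token-level positivity to sequence-level leakage can be written out briefly. This is a sketch under stated assumptions (finite logits, sampling from the untruncated softmax distribution), not the paper's theorem; the symbols $z$, $V$, $x$, and $y$ are introduced here for illustration only.

```latex
% Token level: for finite logits z \in \mathbb{R}^{|V|},
p(v \mid x) = \frac{\exp(z_v)}{\sum_{u \in V} \exp(z_u)} > 0
\quad \text{for every } v \in V.

% Sequence level: for any leaking response y = (y_1, \dots, y_T),
P(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x, y_{<t}) > 0.
```

A prompt changes the context $x$ and therefore the logits, but it keeps them finite, so under these assumptions the probability of a leaking response can shrink without reaching exactly zero.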

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The impossibility result rests on the domain assumption that models are softmax-based; the benchmark itself is a newly constructed entity with no independent evidence outside the paper.

axioms (1)

domain assumption: Language models output tokens via softmax over logits.
Invoked to establish the impossibility result for any soft-constraint defense.

invented entities (1)

TRAP benchmark scenarios (no independent evidence)
purpose: To measure task accuracy versus privacy leakage under active attacks.
Newly defined test cases combining private documents, task queries, and attack queries.

pith-pipeline@v0.9.1-grok · 5823 in / 1279 out tokens · 22118 ms · 2026-06-26T20:35:13.376216+00:00 · methodology
