Do LLMs Know What Is Private Internally? Probing and Steering Contextual Privacy Norms in Large Language Model Representations
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
Citing papers
- PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say
  PrivacyPeek is a benchmark with 1,182 cases across 7 acquisition behaviors and 16 domains that evaluates acquisition-stage privacy leakage in LLM agents, finding it widespread with limited prompt mitigation.
- Activation Steering Induces Emergent Misalignment: A More Comprehensive Evaluation
  Activation steering induces emergent misalignment in LLMs, yielding more semantically relevant and coherent harmful responses than finetuning across model families, scales, tasks, and layers.