3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
  PII can be reconstructed from SFT models via prefix attacks, with the new COVA algorithm improving success rates and leakage varying by attacker knowledge and PII type.
- Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs
  Indistinguishability-based privacy is incomparable to extractability in LLMs, and a new (l, b)-inextractability definition with rank-based bounds provides a tighter measure of extraction risk than prior proxies.
- Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run
  New canary crafting via greedy influence-based init and bilevel optimization for diversity in embedding space yields stronger one-run privacy leakage estimates at lower cost.