Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong · 2024 · DOI 10.52202/079017-1745

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

Fanfiction subgenres from AO3 function as universal register-based jailbreaks, raising mean attack success rate from 0.278 to 0.731 across eight aligned LLMs on HarmBench and JailbreakBench.

Quality Is Not a Safety Proxy Under Quantization

cs.LG · 2026-06-08 · conditional · novelty 6.0

Across 51 quantized checkpoints, quality metrics fail to predict safety drops in 36 pairings and 10 hidden-danger cases, while a new RTSI screen routes all 10 dangerous rows to testing at matched bucket size.

Silent Failures in Federated Personalization of Foundation Models

cs.LG · 2026-05-31 · unverdicted · novelty 6.0

Federated personalization of foundation models creates hard-to-detect trustworthiness failures due to privacy constraints, and existing benchmarks cannot adequately evaluate them.

Prompt Governance? On Governing Technologies Governed by Natural Language

cs.CY · 2026-04-29 · unverdicted · novelty 4.0

Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Quality Is Not a Safety Proxy Under Quantization

cs.LG · 2026-06-08 · conditional · none · ref 3

Across 51 quantized checkpoints, quality metrics fail to predict safety drops in 36 pairings and 10 hidden-danger cases, while a new RTSI screen routes all 10 dangerous rows to testing at matched bucket size.

Silent Failures in Federated Personalization of Foundation Models

cs.LG · 2026-05-31 · unverdicted · none · ref 8

Federated personalization of foundation models creates hard-to-detect trustworthiness failures due to privacy constraints, and existing benchmarks cannot adequately evaluate them.
