pith. sign in

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

fields

cs.CL 5 cs.CR 1

years

2026 6

verdicts

UNVERDICTED 6

representative citing papers

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

cs.CL · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

Boiling the Frog is a new stateful multi-turn benchmark that finds an aggregate 44.4% strict attack success rate for incremental safety violations across nine AI models, with rates ranging from 20.5% to 92.9%.

A Theoretical Game of Attacks via Compositional Skills

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

A theoretical attacker-defender game in LLM adversarial prompting yields a best-response attack related to existing methods, reveals attacker advantages at equilibrium, and derives a provably optimal defense with stronger empirical performance.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.

citing papers explorer

Showing 6 of 6 citing papers.