Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1

years

2026: 5 · 2025: 2

representative citing papers

Activation-Guided Local Editing for Jailbreaking Attacks

cs.CR · 2025-08-01 · unverdicted · novelty 5.0

AGILE is a two-stage jailbreak attack that combines scenario-based rephrasing with activation-guided local editing to reach state-of-the-art attack success rates and strong black-box transferability.
