arXiv preprint arXiv:2506.12880

Universal Jailbreak Suffixes Are Strong Attention Hijackers · 2025 · arXiv 2506.12880

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

cs.CR · 2026-06-04 · unverdicted · novelty 7.0

SlotGCG uses a Vulnerable Slot Score (VSS) to identify and target the most vulnerable prompt positions for adversarial token insertion, delivering a 14% higher attack success rate (ASR) than standard GCG and 42% higher against defenses.

HARC: Coupling Harmfulness and Refusal Directions for Robust Safety Alignment

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

HARC couples harmfulness and refusal directions across prompt and response positions via subspace fine-tuning, achieving a better robustness-capability-usability trade-off than six baselines while transferring across model families.

citing papers explorer

Showing 2 of 2 citing papers.

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks cs.CR · 2026-06-04 · unverdicted · none · ref 9
HARC: Coupling Harmfulness and Refusal Directions for Robust Safety Alignment cs.AI · 2026-07-01 · unverdicted · none · ref 62
