Sandwich attack: Multi-language mixture adaptive attack on LLMs. arXiv preprint arXiv:2404.07242

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Safety Targeted Embedding Exploit via Refinement

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

STEER is a gradient-guided attack that iteratively translates refusal-triggering words into low-resource languages to jailbreak LLMs, reaching 93-96.7% success on open models and 35.5% transfer to GPT-4o-mini.

citing papers explorer

Showing 1 of 1 citing paper.

  • Safety Targeted Embedding Exploit via Refinement · cs.AI · 2026-07-02 · unverdicted · none · ref 12