
A cross-language investigation into jailbreak attacks in large language models

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 1 · method 1

citation-polarity summary: background 2

representative citing papers

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

cs.CL · 2026-05-31 · unverdicted · novelty 7.0

TukaBench extends JailbreakBench to African languages via human translation, cultural adaptation, curated prompts, and code-switching, finding lower refusal rates for culturally grounded prompts and surfacing comprehension and judging limitations.

Multilingual Safety Alignment via Self-Distillation

cs.LG · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

MSD enables cross-lingual safety transfer in LLMs via self-distillation with Dual-Perspective Safety Weighting, improving safety in low-resource languages without target response data.

Cross-Lingual Jailbreak Detection via Semantic Codebooks

cs.CL · 2026-04-28 · unverdicted · novelty 5.0

Semantic similarity to an English jailbreak codebook detects cross-lingual attacks with high accuracy on curated benchmarks but shows poor separability on diverse unsafe prompts.
