The Twelfth International Conference on Learning Representations
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Representative citing papers
Jupiter-N is a post-trained version of Nemotron 3 Super that reports gains on Welsh benchmarks, terminal agent tasks, and instruction following while retaining base capabilities, released openly as a template for sovereign cultural AI adaptation.
Citing papers explorer
- Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks
Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.
- Jupiter-N Technical Report
Jupiter-N is a post-trained version of Nemotron 3 Super that reports gains on Welsh benchmarks, terminal agent tasks, and instruction following while retaining base capabilities, released openly as a template for sovereign cultural AI adaptation.
- Language-Switching Triggers Take a Latent Detour Through Language Models