The Twelfth International Conference on Learning Representations

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.

Jupiter-N Technical Report

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

Jupiter-N is a post-trained version of Nemotron 3 Super that reports gains on Welsh benchmarks, terminal agent tasks, and instruction following while retaining base capabilities, released openly as a template for sovereign cultural AI adaptation.

Language-Switching Triggers Take a Latent Detour Through Language Models

cs.CL · 2026-05-18

citing papers explorer

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

cs.CR · 2026-04-20 · unverdicted · none · ref 19

Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.

Jupiter-N Technical Report

cs.CL · 2026-04-19 · unverdicted · none · ref 30

Jupiter-N is a post-trained version of Nemotron 3 Super that reports gains on Welsh benchmarks, terminal agent tasks, and instruction following while retaining base capabilities, released openly as a template for sovereign cultural AI adaptation.

Language-Switching Triggers Take a Latent Detour Through Language Models

cs.CL · 2026-05-18 · unreviewed · ref 32
