BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Mmlu-pro: A more robust and challenging multi-task language understanding benchmark.Advances in Neural Information Processing Systems, 37:95266–95290
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CR 2years
2026 2representative citing papers
A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.
citing papers explorer
-
BadDLM: Backdooring Diffusion Language Models with Diverse Targets
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
-
Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.