A study of backdoors in instruction fine-tuned language models
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Citing papers
- Backdooring Masked Diffusion Language Models
  SHADOWMASK backdoors MDLMs by modifying the forward corruption process with a trigger-mask mixture, achieving near-100% attack success while preserving clean utility on DiT-based and LLaDA models.
- BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
  BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.
- Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
  Succinct Model Difference Proofs certify that a neural-network update stays inside a policy-defined drift class using zero-knowledge proofs whose cost depends only on the drift structure.