The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
An experimental framework and annotated dataset show LLM-generated PowerShell malware triggers OS events with median 84.5% Jaccard overlap to real malware and 48.4% complete matches.
citing papers explorer
- RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.
- AI-Generated PowerShell Malware: An Experimental Framework and Dataset
An experimental framework and annotated dataset show LLM-generated PowerShell malware triggers OS events with median 84.5% Jaccard overlap to real malware and 48.4% complete matches.