Visual contextual attack: Jailbreaking mllms with image-driven context injection

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection , author= · 2025 · arXiv 2507.02844

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

SafeSteer restricts reverse KL penalty to safety tokens selected via activation steering, achieving strong safety on seven benchmarks with minimal degradation on five capability benchmarks using only 100 harmful samples and no general data.

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

cs.CR · 2026-04-13 · unverdicted · novelty 6.0

Salami Attack chains low-risk inputs to cumulatively trigger high-risk LLM behaviors, achieving over 90% success on GPT-4o and Gemini while resisting some defenses.

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.

citing papers explorer

Showing 3 of 3 citing papers.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment cs.AI · 2026-06-01 · unverdicted · none · ref 68
SafeSteer restricts reverse KL penalty to safety tokens selected via activation steering, achieving strong safety on seven benchmarks with minimal degradation on five capability benchmarks using only 100 harmful samples and no general data.
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems cs.CR · 2026-04-13 · unverdicted · none · ref 26
Salami Attack chains low-risk inputs to cumulatively trigger high-risk LLM behaviors, achieving over 90% success on GPT-4o and Gemini while resisting some defenses.
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs cs.LG · 2026-04-10 · unverdicted · none · ref 65
DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.

Visual contextual attack: Jailbreaking mllms with image-driven context injection

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer