Exploiting Vision Encoder Vulnerabilities for Universal Adversarial Perturbations on Large Vision-Language Models

Changick Kim; Hee-Seon Kim; Minbeom Kim; Seokil Ham

arxiv: 2412.08108 · v3 · pith:2NSWDSLF · submitted 2024-12-11 · 💻 cs.CV · cs.CL · cs.CR

Hee-Seon Kim, Minbeom Kim, Seokil Ham, Changick Kim

classification 💻 cs.CV · cs.CL · cs.CR

keywords encoder · vision · adversarial · vev-uap · across · analysis · attack · components

Large Vision-Language Models (LVLMs) have achieved remarkable performance on multimodal tasks but remain highly vulnerable to small adversarial perturbations in input images. Existing attacks typically target the vision encoder's final output embeddings, implicitly treating the encoder as a uniform attack surface, while a systematic analysis of which internal components are most vulnerable has remained largely unexplored. We show such analysis is essential, as adversarial vulnerability in LVLM vision encoders is structurally concentrated rather than uniformly distributed. Building on this, we propose Vision Encoder Vulnerable-Component-Targeted Universal Adversarial Perturbation (VEV-UAP), a task-agnostic and cost-efficient attack framework. Through a component- and layer-wise analysis of attention mechanisms, we identify the value components in middle layers as critical vulnerabilities that strongly influence downstream language model behavior. VEV-UAP selectively targets these components to generate a single universal perturbation shared across images, without involving textual inputs or the language model during optimization. Experiments across multiple LVLMs and tasks show VEV-UAP achieves state-of-the-art attack success rates with reduced computational overhead. Moreover, a single VEV-UAP transfers across LVLMs sharing the same vision encoder, even when paired with different language models, making it a practical framework for scalable robustness evaluation.
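
The core idea in the abstract, one perturbation shared across images and optimized only against value features of the vision encoder, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the "encoder" is a single tanh layer followed by a linear value projection, the objective is the mean squared deviation of value features from their clean counterparts, and the names `value_features`, `craft_universal_perturbation`, and `value_deviation` are hypothetical. The abstract does not state the actual loss, layer selection, or perturbation budget.

```python
import numpy as np


def value_features(images, w_in, w_value):
    """Toy stand-in for a mid-layer value projection (assumption, not a real ViT).

    Returns the value features and the hidden activations they were built from.
    """
    hidden = np.tanh(images @ w_in)
    return hidden @ w_value, hidden


def craft_universal_perturbation(images, w_in, w_value,
                                 eps=8 / 255, step=1 / 255, n_steps=50, seed=0):
    """Sign-gradient ascent on one perturbation shared by all images.

    No text input or language model is involved; only the toy value
    features are used. Pixel-range clipping is omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    clean_values, _ = value_features(images, w_in, w_value)

    # Random start inside the L-infinity ball: a zero start would give a
    # zero gradient for this particular deviation objective.
    delta = rng.uniform(-eps, eps, size=images.shape[1])

    for _ in range(n_steps):
        adv_values, adv_hidden = value_features(images + delta, w_in, w_value)
        diff = adv_values - clean_values

        # Analytic gradient of mean_i ||adv_i - clean_i||^2 w.r.t. delta.
        grad_hidden = 2.0 * diff @ w_value.T
        grad_pre = grad_hidden * (1.0 - adv_hidden ** 2)
        grad = (grad_pre @ w_in.T).mean(axis=0)

        # Ascent step, then projection back onto the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)

    return delta


def value_deviation(images, delta, w_in, w_value):
    """Mean squared distance between perturbed and clean value features."""
    clean, _ = value_features(images, w_in, w_value)
    adv, _ = value_features(images + delta, w_in, w_value)
    return float(np.mean(np.sum((adv - clean) ** 2, axis=1)))
```

In a realistic setting the analytic gradient would be replaced by automatic differentiation through the actual encoder, and the value features would be read from the chosen middle layers; the sketch only shows the shape of the optimization loop.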

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
cs.CV 2026-04 unverdicted novelty 6.0

Attention-Guided Visual Jailbreaking blinds LVLMs to safety instructions by suppressing attention to alignment prefixes and anchoring generation on adversarial image features, reaching 94.4% attack success rate on Qwen-VL.
ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models
cs.CV 2025-09 unverdicted novelty 6.0

ORCA is an inference-time agentic framework that runs an Observe-Reason-Critique-Act loop with small vision models, boosting LVLM accuracy on hallucination benchmarks by 3.64-40.67% and adding adversarial robustness via cross-model validation.
Structure-Guided Visual Perturbation Neutralization for LVLMs
cs.CV 2026-05 unverdicted novelty 5.0

SIGN is a new defense framework for LVLMs that neutralizes adversarial perturbations with over 87% success rate using 0.5% pixel modification and 0.16 seconds per image while preserving model performance.