DiffCAP: Diffusion-based cumulative adversarial purification for vision language models

Jia Fu, Yongtao Wu, Yihang Chen, Kunyu Peng, Xiao Zhang, Volkan Cevher, Sepideh Pashami, Anders Holst · 2025 · cs.CV · arXiv 2506.03933

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Vision Language Models (VLMs) have shown remarkable capabilities in multimodal understanding, yet their susceptibility to adversarial perturbations poses a significant threat to their reliability in real-world applications. Although these perturbations are often imperceptible to humans, they can drastically alter model outputs, leading to erroneous interpretations and decisions. This paper introduces DiffCAP, a novel diffusion-based purification strategy that can effectively neutralize adversarial corruptions in VLMs. We theoretically establish a provable recovery region in the forward diffusion process and quantify the convergence rate of semantic variation with respect to VLMs. These findings show that adversarial effects fade monotonically as diffusion unfolds. Guided by this principle, DiffCAP injects noise and uses a similarity threshold on VLM embeddings as an adaptive stopping criterion, after which reverse diffusion restores a clean and reliable representation for VLM inference. Through extensive experiments across six datasets with three VLMs under varying attack strengths in three task scenarios, we show that DiffCAP outperforms existing defense techniques by a substantial margin. Notably, DiffCAP significantly reduces both hyperparameter tuning complexity and the required diffusion time, thereby accelerating the denoising process. Supported by theorems and empirical evidence, DiffCAP provides a robust and practical solution for securely deploying VLMs in adversarial environments. The source code is available at https://github.com/JasonFu1998/DiffCAP.
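
The abstract describes the procedure only at a high level: noise is injected until a similarity threshold on VLM embeddings is met, and reverse diffusion then restores the image. The sketch below is one possible reading of that loop, not the authors' implementation. It assumes that the embeddings of consecutive noised images are compared by cosine similarity; the abstract does not state this. The names `adaptive_purify`, `embed`, `denoise`, `threshold`, `noise_std`, and `max_steps` are all hypothetical, and `embed` and `denoise` are caller-supplied stand-ins for a VLM image encoder and a diffusion denoiser.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def adaptive_purify(image, embed, denoise, threshold=0.95,
                    noise_std=0.1, max_steps=50, rng=None):
    """Hypothetical sketch: add Gaussian noise cumulatively until the
    embeddings of consecutive noised images agree, then denoise.

    embed(image) -> vector      (assumed stand-in for a VLM image encoder)
    denoise(image, steps) -> image  (assumed stand-in for reverse diffusion)
    """
    rng = np.random.default_rng() if rng is None else rng
    current = np.asarray(image, dtype=np.float64)
    prev_emb = embed(current)
    steps = 0
    for steps in range(1, max_steps + 1):
        # Cumulative noise injection: each step adds fresh Gaussian noise.
        current = current + noise_std * rng.standard_normal(current.shape)
        emb = embed(current)
        # Adaptive criterion (assumption): stop once the embedding has
        # stabilised between consecutive steps.
        if cosine_similarity(prev_emb, emb) >= threshold:
            break
        prev_emb = emb
    return denoise(current, steps), steps


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model weights.
    toy_embed = lambda x: np.ravel(x)       # identity "encoder"
    toy_denoise = lambda x, t: x            # no-op "denoiser"
    img = np.ones((4, 4))
    purified, used = adaptive_purify(img, toy_embed, toy_denoise,
                                     rng=np.random.default_rng(0))
    print(purified.shape, used)
```

In this reading the number of noise steps is decided per image by the threshold rather than fixed in advance, which is consistent with the abstract's claim of reduced diffusion time; the actual criterion and noise schedule should be taken from the linked repository.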

representative citing papers

Structure-Guided Visual Perturbation Neutralization for LVLMs

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

SIGN is a new defense framework for LVLMs that neutralizes adversarial perturbations with over 87% success rate using 0.5% pixel modification and 0.16 seconds per image while preserving model performance.

Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

cs.LG · 2025-11-26 · conditional · novelty 5.0

Generative purification with consensus aggregation reduces adversarial illusion attack success rates to near zero on ImageBind while improving alignment on both clean and attacked inputs.

