Robust vision encoders from multimodal adversarial pretraining transfer to MLLMs and deliver large gains in adversarial captioning and VQA performance, while test-time stochastic transformations provide an effective black-box defense.
Adpo: Enhancing the adversarial ro- bustness of large vision-language models with preference optimization.arXiv preprint arXiv:2504.01735, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Investigating Adversarial Robustness of Multi-modal Large Language Models
Robust vision encoders from multimodal adversarial pretraining transfer to MLLMs and deliver large gains in adversarial captioning and VQA performance, while test-time stochastic transformations provide an effective black-box defense.