arxiv: 2605.21541 · v1 · pith:WY63FT42new · submitted 2026-05-20 · 💻 cs.CR · cs.AI · cs.LG · stat.ML

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Pith reviewed 2026-05-22 01:20 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG · stat.ML
keywords adversarial transfer attacks · multimodal large language models · frequency domain regularization · closed-source models · patch features · gradient regularization · targeted attacks

The pith

Frequency-domain regularization aligns attacks with shared visual focuses across MLLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that adversarial perturbations optimized on open surrogates can transfer more effectively to closed-source multimodal large language models when both feature alignment and gradient updates are regularized in the frequency domain. It identifies spatial-domain redundancy and surrogate-specific signals as the main barriers to transferability. FRA-Attack counters them by applying a high-pass DCT objective to patch features, concentrating loss on high-frequency bands that hold intrinsic visual cues, and by modulating gradients with a low-pass filter based solely on geometric frequency coordinates. This produces perturbations that follow transferable semantic directions rather than model-specific artifacts. Experiments across fifteen flagship models from seven vendors, with especially strong results on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash, support the claim that such regularization improves cross-model success.

Core claim

FRA-Attack establishes that a unified frequency-domain treatment, consisting of a high-pass DCT objective on patch features for alignment and a model-agnostic low-pass regularizer on gradients using only geometric frequency coordinates, removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions, yielding superior cross-model transferability to closed-source MLLMs.
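As a rough illustration of the high-pass DCT objective named in the core claim, the sketch below applies an orthonormal DCT-II to a matrix of patch embeddings and keeps only coefficients at or above a cutoff index. Everything here is an assumption made for illustration: the function names, the choice of transform axis (the patch-token axis), the meaning of `theta` as a cutoff index, and the cosine-distance loss. The paper itself aligns the retained components via optimal transport, which this sketch does not reproduce.

```python
import numpy as np


def dct_matrix(P: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    n = np.arange(P)
    C = np.sqrt(2.0 / P) * np.cos(np.pi / P * (n[None, :] + 0.5) * n[:, None])
    C[0] /= np.sqrt(2.0)  # rescale the DC row so that C @ C.T is the identity
    return C


def high_pass_patch_features(E: np.ndarray, theta: int) -> np.ndarray:
    """E has shape (P, D): P patch tokens, D channels (assumed layout).

    Takes the DCT along the patch axis and drops coefficients with
    frequency index below `theta` (hypothetical cutoff semantics).
    """
    F = dct_matrix(E.shape[0]) @ E
    return F[theta:]


def high_pass_alignment_loss(E_adv: np.ndarray, E_tgt: np.ndarray, theta: int) -> float:
    """Stand-in loss: one minus the mean cosine similarity of retained bands."""
    Fa = high_pass_patch_features(E_adv, theta)
    Ft = high_pass_patch_features(E_tgt, theta)
    num = np.sum(Fa * Ft, axis=1)
    den = np.linalg.norm(Fa, axis=1) * np.linalg.norm(Ft, axis=1) + 1e-12
    return float(1.0 - np.mean(num / den))
```

A feature map that is constant across patches has no energy above the DC coefficient, so any cutoff of 1 or more removes it entirely. That is the sense in which a high-pass objective discards globally redundant structure.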

What carries the argument

Frequency-domain regularization with high-pass DCT for patch-feature alignment and Frequency-domain Gradient Regularization (FGR) that modulates gradients using only geometric frequency coordinates.
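The gradient-side component can be pictured with a similar sketch. The code below builds a mask from the frequency grid alone, so no statistic of the surrogate enters it, and uses that mask to attenuate the 2D DCT spectrum of an input gradient. The specific decay `1 / (1 + r)^p`, the normalization of the radius, and the names `radial_low_pass_mask` and `fgr_sketch` are assumptions; the source only states that FGR uses a polynomial radial decay with exponent p = 1.5 and does not give the formula in the text available here.

```python
import numpy as np


def dct_matrix(P: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    n = np.arange(P)
    C = np.sqrt(2.0 / P) * np.cos(np.pi / P * (n[None, :] + 0.5) * n[:, None])
    C[0] /= np.sqrt(2.0)
    return C


def radial_low_pass_mask(H: int, W: int, p: float = 1.5) -> np.ndarray:
    """Mask that depends only on the geometric frequency coordinate (u, v).

    Assumed form: polynomial decay in the normalized radial frequency.
    """
    u = np.arange(H)[:, None] / H
    v = np.arange(W)[None, :] / W
    r = np.sqrt(u ** 2 + v ** 2)
    return 1.0 / (1.0 + r) ** p


def fgr_sketch(grad: np.ndarray, p: float = 1.5) -> np.ndarray:
    """Low-pass a gradient of shape (C, H, W) in the 2D DCT domain."""
    _, H, W = grad.shape
    CH, CW = dct_matrix(H), dct_matrix(W)
    spec = CH @ grad @ CW.T            # forward 2D DCT, broadcast over channels
    spec = radial_low_pass_mask(H, W, p) * spec
    return CH.T @ spec @ CW            # inverse 2D DCT
```

Because the transform is orthonormal and the mask never exceeds 1, the filtered gradient cannot have a larger norm than the raw one, and its DC component passes through unchanged.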

If this is right

  • Perturbations align with semantic cues that are common across models rather than with surrogate-specific behaviors.
  • Transferability increases to closed-source models without requiring any internal access or statistics from the target.
  • Low-frequency gradient directions remain effective for transfer while high-frequency surrogate artifacts are suppressed.
  • The approach scales to multiple vendors and flagship models including GPT-5.4 and Gemini-3-flash.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Frequency-based regularizers might extend to transfer attacks in other multimodal or unimodal settings where surrogate-specific noise limits generalization.
  • Defenses could target high-frequency input components to disrupt the shared visual focus exploited by such attacks.
  • Further tests on additional MLLM families would check whether the high-frequency band remains a reliable carrier of transferable cues.

Load-bearing premise

The high-frequency band in patch features carries the intrinsic visual focus shared across different MLLMs.

What would settle it

An ablation experiment in which either the high-pass DCT objective or the geometric-frequency low-pass regularizer is removed, followed by re-testing transfer success rates on the same closed-source targets such as Claude-Opus-4.6.

Figures

Figures reproduced from arXiv: 2605.21541 by Daizong Liu, Dongrui Liu, Jing Shao, Kun Wang, Leitao Yuan, Qinghua Mao, Wenjie Wang, Yan Teng.

Figure 1
Figure 1: Two orthogonal frequency-domain levers for transferable VLM attacks. (a) Loss-side DCT alignment localizes the transferable visual focus. (b) Gradient-side FGR consensus suppresses model-specific gradient noise and stabilizes the update direction. During the perception of clean and adversarial visual patterns, modern MLLMs generally rely on attention-based visual encoders [30, 26, 4] that aggregate patch t…
Figure 2
Figure 2: Overall pipeline of FRA-Attack. Given a source image $x_s$ and a target $x_t$, we update the perturbation by (i) aligning DCT high-frequency components of patch embeddings via optimal transport and (ii) regularizing input gradients with a radial low-pass filter. dimension to transform spatial-domain features into frequency-domain coefficients: $F_k = \sum_{n=0}^{P-1} e_n \cos\left[\frac{\pi}{P} \ldots\right]$
Figure 4
Figure 4: Mean ASR (%) of closed-source models with varied ℓ∞ budget. The advantage of FRA-Attack persists across perturbation budgets
Figure 3
Figure 3: Ablation Study of FGR design. (a) Mean ASR (%) for five FGR variants and a no-FGR
Figure 5
Figure 5: Per-patch high-frequency DCT energy (n=5, θ=20) of the source, target, and FRA-Attack adversarial example, overlaid on the original image with the jet colormap. C.3 Full Experimental Setup. This subsection collects the implementation details deferred from Section 3.1. Source–target pair construction. We sample 1,000 source images from the NIPS 2017 Adversarial Attacks and Defenses Competition development se…
Figure 6
Figure 6: Three-level keyword matching rate (KMR) on
Figure 7
Figure 7: GPT-5.2 responses to FRA-Attack adversarial examples.
Figure 8
Figure 8: GPT-5.4 responses to FRA-Attack adversarial examples.
Figure 9
Figure 9: Claude-Opus-4.6 responses to FRA-Attack adversarial examples.
Figure 10
Figure 10: Claude-Sonnet-4.6 responses to FRA-Attack adversarial examples.
Figure 11
Figure 11: Gemini-3-Flash responses to FRA-Attack adversarial examples.
Figure 12
Figure 12: Gemini-2.5-Flash responses to FRA-Attack adversarial examples.
Original abstract

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, thereby hindering cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a model-agnostic low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate, i.e., no surrogate-derived statistic is involved, so that FGR is model-agnostic by construction, removing surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, particularly with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FRA-Attack, a frequency-domain method to improve transfer-based targeted attacks from open-source surrogate encoders to closed-source MLLMs. It introduces a high-pass DCT objective on patch features to suppress spatial redundancy and concentrate on high-frequency components presumed to encode shared intrinsic visual focus across models. It also presents Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates surrogate gradients using only geometric frequency coordinates (no surrogate-derived statistics) to remove model-specific high-frequency artifacts while retaining transferable low-frequency directions. The authors claim that extensive experiments across 15 flagship MLLMs from 7 vendors demonstrate superior cross-model transferability, with state-of-the-art results on GPT-5.4, Claude-Opus-4.6, and Gemini-3-flash.

Significance. If the core premise that high-frequency patch features encode model-invariant visual focus can be substantiated with cross-model evidence, the unified frequency-domain treatment could offer a useful regularization strategy for adversarial transferability that avoids surrogate overfitting. The explicit model-agnostic construction of FGR is a methodological strength worth highlighting. However, the current lack of supporting statistics or ablation on the frequency-band invariance assumption limits the immediate impact on the field of black-box multimodal attacks.

major comments (2)
  1. [Abstract and §3 (method description)] The central claim that the high-frequency band 'carries the MLLMs' intrinsic visual focus' shared across different models is stated without any cross-model feature statistics, gradient-spectrum comparisons, or ablation showing that surviving DCT components are model-invariant rather than surrogate-dependent. This assumption directly supports both the high-pass alignment objective and the reported transferability gains, so it is load-bearing for the paper's contribution and needs direct evidence.
  2. [Abstract and experimental section] The claim of 'superior cross-model transferability' and 'state-of-the-art performance' on 15 models is presented without quantitative metrics, baseline tables, statistical tests, or details on data exclusion and hyperparameter choices in the provided text. This prevents verification of the data-to-claim link for the central empirical result.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief summary sentence containing at least one key quantitative result (e.g., attack success rate improvement) to allow readers to gauge the scale of the reported gains.
  2. [Method] Notation for the FGR modulation (geometric frequency coordinate) should be introduced with an explicit equation early in the method section for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments identify important areas where additional evidence and clarity will strengthen the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: [Abstract and §3 (method description)] The central claim that the high-frequency band 'carries the MLLMs' intrinsic visual focus' shared across different models is stated without any cross-model feature statistics, gradient-spectrum comparisons, or ablation showing that surviving DCT components are model-invariant rather than surrogate-dependent. This assumption directly supports both the high-pass alignment objective and the reported transferability gains, so it is load-bearing for the paper's contribution and needs direct evidence.

    Authors: We agree that the invariance assumption is central and that direct supporting evidence should be provided. The high-pass DCT objective is motivated by the goal of suppressing spatially redundant low-frequency structures that tend to be surrogate-specific while emphasizing higher-frequency components that align with semantic details more likely to transfer. In the revised manuscript we will add a dedicated analysis subsection containing cross-model feature statistics (e.g., average cosine similarity of high-pass DCT patch features computed between the surrogate encoder and several target MLLMs) together with frequency-band ablations that compare transferability when only high-pass versus low-pass components are retained. These additions will make the empirical grounding of the assumption explicit. revision: yes
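The two analyses promised in this response can be sketched in a few lines. The code below splits a patch-feature matrix into complementary low-band and high-band parts and computes a mean per-row cosine similarity between two feature sets. It is a hedged illustration only: the helper names, the cutoff `theta`, the patch-axis DCT, and the requirement that both encoders yield features of the same shape are assumptions, and no numbers from the paper are used.

```python
import numpy as np


def dct_matrix(P: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    n = np.arange(P)
    C = np.sqrt(2.0 / P) * np.cos(np.pi / P * (n[None, :] + 0.5) * n[:, None])
    C[0] /= np.sqrt(2.0)
    return C


def split_bands(E: np.ndarray, theta: int):
    """Split (P, D) features into low-band and high-band parts.

    Both parts are returned in the original (patch) domain and sum to E.
    """
    C = dct_matrix(E.shape[0])
    F = C @ E
    F_low = F.copy()
    F_low[theta:] = 0.0          # keep only indices below the cutoff
    F_high = F - F_low           # the complementary band
    return C.T @ F_low, C.T @ F_high


def mean_cosine(A: np.ndarray, B: np.ndarray) -> float:
    """Mean row-wise cosine similarity between two equally shaped arrays."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + 1e-12
    return float(np.mean(num / den))
```

For a band ablation, one would feed either the low-band or the high-band part into the alignment loss and compare transfer rates. For the invariance check, one would compare `mean_cosine` on the high-band parts from two open encoders against the same quantity on the low-band parts.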

  2. Referee: [Abstract and experimental section] The claim of 'superior cross-model transferability' and 'state-of-the-art performance' on 15 models is presented without quantitative metrics, baseline tables, statistical tests, or details on data exclusion and hyperparameter choices in the provided text. This prevents verification of the data-to-claim link for the central empirical result.

    Authors: We acknowledge that the abstract and the high-level experimental summary should contain concrete numbers to allow immediate verification. The full manuscript already reports attack success rates on all 15 MLLMs, comparisons against multiple baselines, and hyperparameter settings in the experimental section. To improve accessibility we will revise the abstract to include the principal quantitative results (e.g., average ASR improvements) and will expand the experimental section with explicit references to the comparison tables, a description of the evaluation protocol, data selection criteria, and hyperparameter ranges. We will also report statistical significance tests (paired t-tests) for the main transferability gains. revision: yes
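For readers unfamiliar with the paired test mentioned here, the sketch below computes the paired t statistic and its degrees of freedom from two equally long lists of per-model success rates. The function name and the input layout (one value per target model for each method) are assumptions, and the sketch stops short of a p-value, which would need a Student-t distribution from a statistics library.

```python
import math
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples a and b.

    Each index is one matched pair, e.g. the same target model
    evaluated under two attack methods (assumed layout).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # unbiased variance
    if var == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean / math.sqrt(var / n), n - 1
```

Pairing by target model matters because success rates vary far more between models than between methods, and the paired form removes that between-model variance from the comparison.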

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper proposes FRA-Attack via two explicit design choices: a high-pass DCT objective motivated by the premise that high-frequency patch features encode shared visual focus, and FGR defined directly as a low-pass modulator using only geometric frequency coordinates with the explicit statement that no surrogate-derived statistics are involved. Neither component reduces to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The method is presented as a unified frequency-domain treatment whose transferability claims are evaluated on external closed-source targets rather than derived tautologically from its own inputs. This leaves the derivation self-contained against the stated modeling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that high-frequency components encode shared visual focus across MLLMs and that a purely geometric low-pass filter suffices to remove surrogate-specific signals.

axioms (2)
  • domain assumption: High-frequency band carries the MLLMs' intrinsic visual focus shared across models
    Invoked to justify the high-pass DCT objective in the abstract description of feature alignment.
  • domain assumption: Geometric frequency coordinate alone produces a model-agnostic regularizer
    Stated as the basis for FGR being model-agnostic by construction.

pith-pipeline@v0.9.0 · 5834 in / 1391 out tokens · 58821 ms · 2026-05-22T01:20:11.016634+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 9 internal anchors

  1. [1]

    Nasir Ahmed, T. Natarajan, and Kamisetty R Rao. Discrete cosine transform. IEEE transactions on Computers, 100(1):90–93, 1974

  2. [2]

    Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems, 35:23716–23736, 2022

  3. [3]

    Anthropic. Claude Opus 4.6 system card. Technical report, Anthropic, 2026

  4. [4]

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025

  5. [5]

    Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji, and Xinxiao Wu. Bootstrap generalization ability from loss landscape perspective. In European conference on computer vision, pages 500–517. Springer, 2022

  6. [6]

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

  7. [7]

    Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, and Jun Zhu. How robust is Google’s Bard to adversarial image attacks? arXiv preprint arXiv:2309.11751, 2023

  8. [8]

    Zewei Fu, Ya Li, and Yan Huang. Frequency-guided adaptive gradient attack for transferable adversarial examples. In Asian Conference on Pattern Recognition, pages 381–396. Springer, 2025

  9. [9]

    Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, and Qing Guo. Boosting transferability in vision-language attacks via diversification along the intersection region of adversarial trajectory. In European Conference on Computer Vision, pages 442–460. Springer, 2024

  10. [10]

    Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, and Cairong Zhao. Boosting adversarial transferability via commonality-oriented gradient optimization. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 62–76. Springer, 2025

  11. [11]

    Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015

  12. [12]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  13. [13]

    Chenhe Gu, Jindong Gu, Andong Hua, and Yao Qin. Improving adversarial transferability in MLLMs via dynamic vision-language alignment attack. arXiv preprint arXiv:2502.19672, 2025

  14. [14]

    Amira Guesmi, Bassem Ouni, and Muhammad Shafique. Tesser: Transfer-enhancing adversarial attacks from vision transformers via spectral and semantic regularization. arXiv preprint arXiv:2505.19613, 2025

  15. [15]

    Chuan Guo, Jared S Frank, and Kilian Q Weinberger. Low frequency adversarial perturbation. arXiv preprint arXiv:1809.08758, 2018

  16. [16]

    Qi Guo, Shanmin Pang, Xiaojun Jia, Yang Liu, and Qing Guo. Efficient generation of targeted and transferable adversarial examples for vision-language models via diffusion models. IEEE Transactions on Information Forensics and Security, 20:1333–1348, 2024

  17. [17]

    Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, et al. GLM-4.5V and GLM-4.1V-Thinking: Towards versatile multimodal reasoning with scalable reinforcement learning. arXiv preprint arXiv:2507.01006, 2025

  18. [18]

    Kai Hu, Weichen Yu, Li Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, and Matt Fredrikson. Transferable adversarial attacks on black-box vision-language models. arXiv preprint arXiv:2505.01050, 2025

  19. [19]

    Man Hua and Ximing Liu. Fe-advgan: A lightweight model adversarial attack method based on frequency-domain guidance and attention mechanism. In 2025 International Conference on Algorithms, Software and Network Security (ASNS), pages 30–35, 2025

  20. [20]

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024

  21. [21]

    Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, and Yang Liu. Adversarial attacks against closed-source MLLMs via feature optimal alignment. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  22. [22]

    Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, pages 19730–19742. PMLR, 2023

  23. [23]

    Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International conference on machine learning, pages 12888–12900. PMLR, 2022

  24. [24]

    Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu, Jiacheng Cui, and Zhiqiang Shen. A frustratingly simple yet highly effective attack baseline: Over 90% success rate against the strong black-box models of GPT-4.5/4o/o1. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  25. [25]

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014

  26. [26]

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in neural information processing systems, 36:34892–34916, 2023

  27. [27]

    Liangchen Liu, Nannan Wang, Chen Chen, Decheng Liu, Xi Yang, Xinbo Gao, and Tongliang Liu. Frequency-based comprehensive prompt learning for vision-language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  28. [28]

    Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, and Yu Qiao. Safety of multimodal large language models on images and text. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pages 8151–8159, 2024

  29. [29]

    Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xianglong Liu, Jian Zhang, and Jingkuan Song. Frequency domain model augmentation for adversarial attack. In European conference on computer vision, pages 549–566. Springer, 2022

  30. [30]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  31. [31]

    Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, and Chao Shen. Improving adversarial transferability on vision transformers via forward propagation refinement. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25071–25080, 2025

  32. [32]

    Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267, 2025

  33. [33]

    Ling Tang, YueFeng Chen, Hui Xue, and Quanshi Zhang. Towards the resistance of neural network fingerprinting to fine-tuning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  34. [34]

    Ling Tang, Wen Shen, Zhanpeng Zhou, Yuefeng Chen, and Quanshi Zhang. Defects of convolutional decoder networks in frequency representation. In International Conference on Machine Learning, pages 33758–33791. PMLR, 2023

  35. [35]

    Gemma Team. Gemma 3 technical report. arXiv preprint arXiv:2503.19786, 2025

  36. [36]

    Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026

  37. [37]

    Guoqiu Wang, Huanqian Yan, and Xingxing Wei. Enhancing transferability of adversarial examples with spatial momentum. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 593–604. Springer, 2022

  38. [38]

    Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1924–1933, 2021

  39. [39]

    Juanjuan Weng, Zhiming Luo, and Shaozi Li. Exploring frequencies via feature mixing and meta-learning for improving adversarial transferability. IEEE Transactions on Image Processing, 2026

  40. [40]

    Shangbo Wu, Yu-an Tan, Yajie Wang, Ruinan Ma, Wencong Ma, and Yuanzhang Li. Towards transferable adversarial attacks with centralized perturbation. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 6109–6116, 2024

  41. [41]

    Yifeng Xiong, Jiadong Lin, Min Zhang, John E Hopcroft, and Kun He. Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14983–14992, 2022

  42. [42]

    Hunmin Yang, Jongoh Jeong, and Kuk-Jin Yoon. FACL-Attack: Frequency-aware contrastive learning for transferable adversarial attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

  43. [43]

    Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, and Dit-Yan Yeung. Anyattack: Towards large-scale self-supervised adversarial attacks on vision-language models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 19900–19909, 2025

  44. [44]

    Jianping Zhang, Yizhan Huang, Weibin Wu, and Michael R Lyu. Transferable adversarial attacks on vision transformers with token gradient regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16415–16424, 2023

  45. [45]

    Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, and Zhiqiang Shen. Pushing the frontier of black-box LVLM attacks via fine-grained detail targeting. arXiv preprint arXiv:2602.17645, 2026

  46. [46]

    Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, and Min Lin. On evaluating adversarial robustness of large vision-language models. Advances in Neural Information Processing Systems, 36:54111–54138, 2023

  47. [47]

    Yayin Zheng, Chen Wan, Zihong Guo, Hailing Kuang, and Xiaohai Lu. Boosting adversarial transferability via high-frequency augmentation and hierarchical-gradient fusion. arXiv preprint arXiv:2505.21181, 2025

  48. [48]

    Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. In The Twelfth International Conference on Learning Representations, 2024

    A DCT High-Frequency Energy Visualization. We visualize on real source–target–adversarial triples how the DCT high-frequency ...

  49. [49]

    The DCT alignment uses high-frequency threshold θ = 10, number of selected high-frequency components n = 10, global weight w_g = 1.0, local weight w_l = 0.2, and Sinkhorn entropic regularization λ = 0.1. FGR uses the polynomial radial decay (Eq. 8) with exponent p = 1.5. The MI-FGSM momentum decay is µ = 1.0 and the dynamic-weighting temperature is T = 1.0 ...

  50. [50]

    Average ASR (%) across GPT-5.4, Claude-Opus-4.6, and Gemini-3-flash on the 100-pair pilot panel; the default ϵ = 16/255, N = 300 matches the 1,000-pair main panel within ±3 ASR. FRA-Attack scales monotonically with the perturbation budget: ϵ = 4/255 is too small to escape the natural caption neighbourhood (4.0 ASR), ϵ = 8/255 already lands the perturbation in ...

  51. [51]

    Main Subject Consistency: same key subject/object→higher score

  52. [52]

    Relevant Description: same context or topic→higher score

  53. [53]

    Ignore Fine-Grained Details: do not penalize phrasing or minor variation

  54. [54]

    Partial Matches: extra information without contradiction→still high score

  55. [55]

    Describe this image

    Score Range: 1.0: nearly identical in meaning. 0.8–0.9: same subject, highly related descriptions. 0.7–0.8: same subject, core meaning aligned. 0.5–0.7: same subject, different perspectives or missing details. 0.3–0.5: related but not highly similar. 0.0–0.2: completely different subjects or unrelated. Text 1: {caption_adv} Text 2: {caption_tgt} Output on...