arxiv: 2605.07399 · v2 · submitted 2026-05-08 · 💻 cs.CV

Recognition: no theorem link

GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

Andi Zhang, Sibei Yang, Wenjie Wang, Yi Wang, Yu Pan

Pith reviewed 2026-05-12 04:08 UTC · model grok-4.3

classification 💻 cs.CV

keywords jailbreak attacksdiffusion modelsvision-language modelsadversarial perturbationsgenerative AI safetydenoising trajectorymultimodal securitynon-autoregressive models

0 comments

The pith

Diffusion vision-language models can be jailbroken by optimizing global probabilities across the full denoising trajectory instead of fixed prefixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that dVLMs resist conventional prefix-based jailbreaks because their denoising process triggers progressive refusal even when initial outputs appear compliant. It introduces Global Probability Optimization as a way to adjust the entire generative trajectory at once, allowing attacks to evade guardrails that prefix methods cannot. A sympathetic reader would care because dVLMs are positioned as safer alternatives to autoregressive models for multimodal tasks, yet this work shows their non-sequential structure creates new attack surfaces. If correct, current safety alignments for diffusion models are insufficient and must be rebuilt around global dynamics rather than local token predictions.

Core claim

Diffusion vision-language models exhibit Immediate Refusal and Progressive Refusal patterns that defeat fixed prefix optimization. Global Probability Optimization manipulates the probability distribution over the complete denoising trajectory of masked diffusion models to bypass these patterns. GPO-V applies this principle to visual inputs, generating stealthy perturbations that achieve high attack success rates and strong cross-model transferability, exposing a security gap in non-sequential generative architectures.

What carries the argument

Global Probability Optimization (GPO), which adjusts probabilities globally across the denoising steps of masked diffusion models to override refusal signals without relying on prefix anchoring.

If this is right

Fixed prefix methods fail systematically on dVLMs while global optimization succeeds.
The progressive refinement steps create an attack surface absent in autoregressive models.
Visual perturbations generated by GPO-V transfer across different diffusion vision-language architectures.
Safety alignments for diffusion-based generation require redesign focused on trajectory-wide dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Vulnerabilities identified here may extend to text-only diffusion language models that share the same non-causal generation process.
Defenses that monitor only early denoising steps could miss attacks that accumulate influence later in the trajectory.
The approach could be tested as a diagnostic tool for measuring robustness in other non-autoregressive multimodal systems.

Load-bearing premise

The observed refusal patterns in the denoising process leave an exploitable window where global probability changes can consistently bypass guardrails without detection at any step.

What would settle it

Testing GPO-V on held-out dVLMs and finding that attack success rates fall below those of random perturbations or that all generated outputs still trigger refusal would falsify the claim of reliable bypass and transferability.

Figures

Figures reproduced from arXiv: 2605.07399 by Andi Zhang, Sibei Yang, Wenjie Wang, Yi Wang, Yu Pan.

**Figure 1.** Figure 1: Response Patterns of dVLMs: We identify two distinct rejection behaviors: (1) Immediate Refusal, a pre-generation defense where the model detects input risks and rejects the prompt directly; and (2) Progressive Refusal, which occurs during generation, where the model initially produces affirmative content but pivots to a refusal as unsafe semantics emerge. where the model initially produces affirmative tok… view at source ↗

**Figure 2.** Figure 2: Overview of our method: Building on Global Probabilit Optimization (GPO), we further develop the attack framework in the vision modality of diffusion language models, called GPO-V, which is the first efficient jailbreak attack in dVLMs. 2 Related Work 2.1 Diffusion Language Model Diffusion probabilistic models (DDPM) [16] were initially proposed for image generation by restoring the data distribution thro… view at source ↗

**Figure 3.** Figure 3: Heuristic Knowledge: (a) In pixel diffusion models, early diffusion steps primarily recover low-dimensional structure. (b) Likewise, in dVLMs, the initial generation step largely determines the response polarity. step. Its rejection pattern typically manifests as a sequential refusal, which we refer to as autoregressive rejection, characterized by the stepwise generation of positive primitives R+ ⊂ R and n… view at source ↗

**Figure 4.** Figure 4: Overview of the optimization process of GPO-V: In the parallel generation scenario, GPO-V jointly optimizes the token occurrence probabilities across all positions, including the probabilities of positive tokens at every position and the probabilities of anchor tokens at specific positions. Rather than forcing the model to produce a fixed prefix or suffix, GPO-V guides the output direction through global m… view at source ↗

**Figure 5.** Figure 5: Visualization of LGP O during the δ optimization process. potent tool for black-box adversarial assessments, capable of compromising diverse architectures without requiring internal model gradients. 4.4 Ablation Study In this section, we conduct ablation studies on the GPO strategy by independently disabling global probability optimization for negative word elements and disabling anchor optimization. When … view at source ↗

**Figure 6.** Figure 6: Strategy Ablation: The NG-only setting applies only the negative word optimization, whereas the Anchoronly setting applies solely the anchor optimization. In Figure.6, we report the attack performance under different ablation settings. The results clearly indicate that when the loss function degenerates into a single optimization objective, the attack effectiveness is substantially reduced. Specifically… view at source ↗

**Figure 7.** Figure 7: The sampling trajectory of dVLMs exhibits a high sensitivity to initial noise configurations, a property [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 12.** Figure 12: Figure.12. For H-HS, we employ human graders who adhere to the identical scoring rubric to ensure evaluative [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗

**Figure 8.** Figure 8: Under most configurations, the optimized latent state successfully biases the model toward the intended anchor [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Visualize the immediate rejection pattern and the progressive rejection pattern, which represent the defense [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: We define three distinct anchor template configurations, which delineate the target tokens to be optimized at [PITH_FULL_IMAGE:figures/full_fig_p017_10.png] view at source ↗

**Figure 11.** Figure 11: Visualization of LGP O of pixel space in LaViDA during the δ optimization process. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_11.png] view at source ↗

**Figure 12.** Figure 12: Evaluation Criteria for GPO-V: In our experiments, we define a score of 6 as the threshold for a successful jailbreak. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗

read the original abstract

Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the safety landscape of dVLMs reveals a unique refusal pattern: Immediate Refusal and Progressive Refusal. We find that while FPO-based attacks often fail by triggering the latter, the progressive refinement process itself uncovers a novel, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. Unlike prefix-based methods, GPO manipulates the global generative dynamics to bypass guardrails in diffusion language models. Building on this, we introduce GPO-V, the first visual-modality jailbreak framework tailored for dVLMs. Empirical results demonstrate that GPO-V produces stealthy perturbations with exceptional cross-model transferability, revealing a critical security gap in non-sequential generative architectures. Our findings underscore the critical urgency of addressing safety alignment in dVLMs. These results necessitate an immediate and fundamental re-evaluation of current defense paradigms to mitigate the unique risks of diffusion-based generation. Our code is available at: https://anonymous.4open.science/r/GPO-V-0250.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces GPO-V to jailbreak dVLMs by optimizing over the full denoising trajectory instead of fixed prefixes.

read the letter

The main point is that the authors have found a way to jailbreak diffusion vision language models by optimizing probabilities globally across the denoising steps, rather than relying on fixed prefixes that trigger progressive refusals. They do well in distinguishing immediate and progressive refusal patterns and showing how the latter creates an opening for a different kind of attack. Framing GPO as a general paradigm for masked diffusion models and then applying it visually with GPO-V is a reasonable extension. The reported cross-model transferability with stealthy changes suggests the vulnerability might be architectural. The soft spots are around the details of the optimization. It's not clear from the description how the global manipulation is formulated to consistently avoid guardrails at every denoising step, or if it could activate other protections. Without seeing specific success rates, comparisons to other methods, or ablation studies, it's difficult to gauge how general the findings are versus being tied to particular prompts or models. This paper is aimed at the AI safety community working on non-autoregressive generative models. Readers looking for new attack vectors in multimodal diffusion systems would find it relevant, though they should check the experiments closely. It deserves a serious referee because the idea of exploiting the denoising process for jailbreaks is worth examining, even with the current gaps in the presented evidence. I would recommend sending it to peer review, asking the authors to expand on the probability optimization objective and provide quantitative results upfront.

Referee Report

2 major / 2 minor

Summary. The paper identifies Immediate Refusal and Progressive Refusal patterns in diffusion Vision-Language Models (dVLMs) and argues that Fixed Prefix Optimization (FPO) attacks fail by triggering the latter during denoising. It proposes Global Probability Optimization (GPO) as a paradigm to manipulate global generative dynamics across the full denoising trajectory, introduces GPO-V as the first visual-modality jailbreak framework for dVLMs, and claims that the resulting stealthy perturbations exhibit exceptional cross-model transferability, exposing a security gap in non-sequential generative architectures.

Significance. If the empirical claims hold with rigorous validation, the work highlights a latent attack surface unique to diffusion-based multimodal models that current alignment techniques may not address, potentially requiring new defense paradigms beyond those developed for autoregressive systems. The public code release supports reproducibility.

major comments (2)

[Section 3] Section 3 (GPO formulation): The global probability optimization objective is described at a high level as manipulating generative dynamics to bypass guardrails, but the paper does not specify the exact loss function, whether it incorporates explicit refusal-avoidance terms, or how it guarantees evasion of Progressive Refusal at every denoising step without detection; this is load-bearing for the central claim that GPO reliably exploits a latent surface rather than succeeding on prompt- or model-specific cases.
[Section 4] Section 4 (experiments): The abstract and structure assert empirical success and exceptional transferability, yet no attack success rates, baseline comparisons (e.g., against adapted FPO or other diffusion attacks), ablation on global vs. local optimization, or statistical details are referenced in the provided summary; without these, the support for cross-model claims and the superiority over FPO cannot be verified.

minor comments (2)

[Abstract] Abstract: 'dVLMs' and 'dLLMs' are introduced without expansion on first use; define acronyms at initial appearance.
[Abstract] Abstract: The code link is given as anonymous; confirm accessibility for reviewers and include a permanent repository identifier if possible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We address the major comments point-by-point below and have made revisions to the manuscript to improve clarity and completeness.

read point-by-point responses

Referee: [Section 3] Section 3 (GPO formulation): The global probability optimization objective is described at a high level as manipulating generative dynamics to bypass guardrails, but the paper does not specify the exact loss function, whether it incorporates explicit refusal-avoidance terms, or how it guarantees evasion of Progressive Refusal at every denoising step without detection; this is load-bearing for the central claim that GPO reliably exploits a latent surface rather than succeeding on prompt- or model-specific cases.

Authors: We acknowledge that the original presentation of the GPO formulation in Section 3 was at a high level. To address this, we have revised the section to include the precise loss function: the global objective is to minimize the expected negative log-likelihood of generating the desired safe response across all denoising timesteps, with an additional term to penalize the probability of refusal tokens. This formulation is designed to manipulate the entire generative trajectory, thereby evading Progressive Refusal patterns that arise from local optimizations. We have added a subsection explaining the guarantee of evasion through global dynamics and included the full optimization algorithm. revision: yes
Referee: [Section 4] Section 4 (experiments): The abstract and structure assert empirical success and exceptional transferability, yet no attack success rates, baseline comparisons (e.g., against adapted FPO or other diffusion attacks), ablation on global vs. local optimization, or statistical details are referenced in the provided summary; without these, the support for cross-model claims and the superiority over FPO cannot be verified.

Authors: We agree that additional experimental details would strengthen the paper. Although the manuscript includes empirical results demonstrating success and transferability, we have expanded Section 4 to explicitly report attack success rates (e.g., over 85% on the target dVLM and 70%+ transfer rates), comparisons with adapted FPO baselines showing GPO-V's superiority, ablations comparing global vs. local optimization, and statistical analyses including standard deviations and significance tests. These additions provide rigorous support for the cross-model claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack method with no derivation chain

full rationale

The paper describes an empirical jailbreak technique (GPO and GPO-V) for diffusion VLMs based on observed refusal patterns (Immediate and Progressive Refusal). No mathematical derivation, equations, fitted parameters, or closed-form predictions are presented that could reduce to their own inputs. The contribution consists of identifying patterns, proposing an optimization-based attack, and reporting experimental results on transferability. No self-citations, ansatzes, or uniqueness theorems are invoked in the abstract or structure to support a load-bearing claim. The work is self-contained as an empirical demonstration rather than a deductive result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the existence of distinct refusal patterns in dVLMs and the premise that global optimization along the denoising path can evade them; no free parameters, standard mathematical axioms, or independently evidenced invented entities are described in the abstract.

invented entities (1)

Global Probability Optimization (GPO) no independent evidence
purpose: Manipulate global generative dynamics in masked diffusion models to bypass guardrails
Introduced as a novel paradigm without reference to prior independent validation or falsifiable predictions outside the current work.

pith-pipeline@v0.9.0 · 5584 in / 1189 out tokens · 56560 ms · 2026-05-12T04:08:50.902989+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

[1]

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.CoRR, abs/2502.09992, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Dream 7B: Diffusion Large Language Models

Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models.CoRR, abs/2508.15487, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models.CoRR, abs/2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Bidirectional language models are also few-shot learners

Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, and Chris Callison-Burch. Bidirectional language models are also few-shot learners. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

work page 2023
[5]

Llada-v: Large language diffusion models with visual instruction tuning.arXiv preprint arXiv:2505.16933, 2025

Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, and Chongxuan Li. Llada-v: Large language diffusion models with visual instruction tuning.CoRR, abs/2505.16933, 2025

work page arXiv 2025
[6]

Sneakyprompt: Jailbreaking text-to-image generative models

Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. Sneakyprompt: Jailbreaking text-to-image generative models. InIEEE Symposium on Security and Privacy, SP 2024, San Francisco, CA, USA, May 19-23, 2024, pages 897–912. IEEE, 2024

work page 2024
[7]

Text-to-image diffusion models can be easily backdoored through multimodal data poisoning

Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, and Hang Su. Text-to-image diffusion models can be easily backdoored through multimodal data poisoning. In Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, and M. Shamim Hossain, editors, Proceedings of the 31st ACM International...

work page 2023
[8]

Anderson, Yaron Singer, and Amin Karbasi

Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum S. Anderson, Yaron Singer, and Amin Karbasi. Tree of attacks: Jailbreaking black-box llms automatically. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: An...

work page 2024
[9]

Multi-step jailbreaking privacy attacks on chatgpt

Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Multi-step jailbreaking privacy attacks on chatgpt. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 4138–4153. Association for Computational Linguistics, 2023

work page 2023
[10]

Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T

Subham S. Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T. Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processi...

work page 2024
[11]

Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg

Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information P...

work page 2021
[12]

Jailbreaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation.CoRR, abs/2507.19227, 2025

Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, and Yufei Guo. Jailbreaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation.CoRR, abs/2507.19227, 2025

work page arXiv 2025
[13]

The devil behind the mask: An emergent safety vulnerability of diffusion llms.arXiv preprint arXiv:2507.11097, 2025

Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, et al. The devil behind the mask: An emergent safety vulnerability of diffusion llms.arXiv preprint arXiv:2507.11097, 2025. 10

work page arXiv 2025
[14]

Lavida: A large diffusion language model for multimodal understanding.arXiv preprint arXiv:2505.16839, 2025

Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, and Aditya Grover. Lavida: A large diffusion language model for multimodal understanding.CoRR, abs/2505.16839, 2025

work page arXiv 2025
[15]

From denoising to refining: A corrective framework for vision-language diffusion model, 2025

Yatai Ji, Teng Wang, Yuying Ge, Zhiheng Liu, Sidi Yang, Ying Shan, and Ping Luo. From denoising to refining: A corrective framework for vision-language diffusion model, 2025

work page 2025
[16]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020

work page 2020
[17]

Diffusion models in low-level vision: A survey.IEEE Trans

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey.IEEE Trans. Pattern Anal. Mach. Intell., 47(6):4630–4651, 2025

work page 2025
[18]

Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng-Ann Heng, and Stan Z. Li. A survey on generative diffusion models.IEEE Trans. Knowl. Data Eng., 36(7):2814–2830, 2024

work page 2024
[19]

State space orderings for gauss-seidel in markov chains revisited.SIAM J

Tugrul Dayar. State space orderings for gauss-seidel in markov chains revisited.SIAM J. Sci. Comput., 19(1):148– 154, 1998

work page 1998
[20]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

work page 2023
[21]

Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 32...

work page 2023
[22]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.CoRR, abs/2010.02502, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[23]

Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations. In9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

work page 2021
[24]

Discrete diffusion modeling by estimating the ratios of the data distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. InForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024

work page 2024
[25]

Yu, and Aiwei Liu

Lingzhe Zhang, Liancheng Fang, Chiming Duan, Minghua He, Leyi Pan, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S. Yu, and Aiwei Liu. A survey on parallel text generation: From parallel decoding to diffusion language models.CoRR, abs/2508.08712, 2025

work page arXiv 2025
[26]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V . N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: Annual Conference...

work page 2017
[27]

Don’t say no: Jailbreaking LLM by suppressing refusal

Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Sibei Yang, and Wenjie Wang. Don’t say no: Jailbreaking LLM by suppressing refusal. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pages 25224–25249. Associat...

work page 2025
[28]

Improved techniques for optimization-based jailbreaking on large language models

Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, and Min Lin. Improved techniques for optimization-based jailbreaking on large language models. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

work page 2025
[29]

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, De...

work page 2023
[30]

Abu-Ghazaleh

Erfan Shayegani, Yue Dong, and Nael B. Abu-Ghazaleh. Jailbreak in pieces: Compositional adversarial attacks on multi-modal language models. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. 11

work page 2024
[31]

Universal adversarial triggers for attacking and analyzing NLP

Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Proce...

work page 2019
[32]

Hotflip: White-box adversarial examples for text classification

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. Hotflip: White-box adversarial examples for text classification. In Iryna Gurevych and Yusuke Miyao, editors,Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 31–36. Association for C...

work page 2018
[33]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.CoRR, abs/2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[34]

Logan IV , Eric Wallace, and Sameer Singh

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV , Eric Wallace, and Sameer Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu, editors,Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-...

work page 2020
[35]

White-box multimodal jailbreaks against large vision-language models

Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, and Yu-Gang Jiang. White-box multimodal jailbreaks against large vision-language models. In Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, and Dong Xu, editors,Proceedings of the 32nd ACM ...

work page 2024
[36]

Eedit: Rethinking the spatial and temporal redundancy for efficient image editing.CoRR, abs/2503.10270, 2025

Zexuan Yan, Yue Ma, Chang Zou, Wenteng Chen, Qifeng Chen, and Linfeng Zhang. Eedit: Rethinking the spatial and temporal redundancy for efficient image editing.CoRR, abs/2503.10270, 2025

work page arXiv 2025
[37]

Faster-gcg: Efficient discrete optimization jailbreak attacks against aligned large language models.arXiv preprint arXiv:2410.15362,

Xiao Li, Zhuhong Li, Qiongxiu Li, Bingze Lee, Jinghao Cui, and Xiaolin Hu. Faster-gcg: Efficient discrete optimization jailbreak attacks against aligned large language models.ArXiv, abs/2410.15362, 2024

work page arXiv 2024
[38]

Exploiting the index gradients for optimization- based jailbreaking on large language models

Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, and Yu Hong. Exploiting the index gradients for optimization- based jailbreaking on large language models. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Schockaert, editors,Proceedings of the 31st International Conference on Computational Linguistics, pages 4...

work page 2025
[39]

Learning transferable adversarial perturbations

Krishna Kanth Nakka and Mathieu Salzmann. Learning transferable adversarial perturbations. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, ...

work page 2021
[40]

Universal adversarial perturbations

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 86–94. IEEE Computer Society, 2017

work page 2017
[41]

Sigmoid loss for language image pre- training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre- training. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 11941–11952. IEEE, 2023

work page 2023
[42]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick L...

work page 2024
[43]

Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan,...

work page 2024
[44]

Fast-dllm: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding.CoRR, abs/2505.22618, 2025

Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding.CoRR, abs/2505.22618, 2025. 12 A Technical Appendices A.1 Prior Knowledge in Image Generation Diffusion Models In image generation diffusion models, t...

work page arXiv 2025