Recognition: no theorem link
GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
Pith reviewed 2026-05-12 04:08 UTC · model grok-4.3
The pith
Diffusion vision-language models can be jailbroken by optimizing global probabilities across the full denoising trajectory instead of fixed prefixes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Diffusion vision-language models exhibit Immediate Refusal and Progressive Refusal patterns that defeat fixed prefix optimization. Global Probability Optimization manipulates the probability distribution over the complete denoising trajectory of masked diffusion models to bypass these patterns. GPO-V applies this principle to visual inputs, generating stealthy perturbations that achieve high attack success rates and strong cross-model transferability, exposing a security gap in non-sequential generative architectures.
What carries the argument
Global Probability Optimization (GPO), which adjusts probabilities globally across the denoising steps of masked diffusion models to override refusal signals without relying on prefix anchoring.
If this is right
- Fixed prefix methods fail systematically on dVLMs while global optimization succeeds.
- The progressive refinement steps create an attack surface absent in autoregressive models.
- Visual perturbations generated by GPO-V transfer across different diffusion vision-language architectures.
- Safety alignments for diffusion-based generation require redesign focused on trajectory-wide dynamics.
Where Pith is reading between the lines
- Vulnerabilities identified here may extend to text-only diffusion language models that share the same non-causal generation process.
- Defenses that monitor only early denoising steps could miss attacks that accumulate influence later in the trajectory.
- The approach could be tested as a diagnostic tool for measuring robustness in other non-autoregressive multimodal systems.
Load-bearing premise
The observed refusal patterns in the denoising process leave an exploitable window where global probability changes can consistently bypass guardrails without detection at any step.
What would settle it
Testing GPO-V on held-out dVLMs and finding that attack success rates fall below those of random perturbations or that all generated outputs still trigger refusal would falsify the claim of reliable bypass and transferability.
Figures
read the original abstract
Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the safety landscape of dVLMs reveals a unique refusal pattern: Immediate Refusal and Progressive Refusal. We find that while FPO-based attacks often fail by triggering the latter, the progressive refinement process itself uncovers a novel, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. Unlike prefix-based methods, GPO manipulates the global generative dynamics to bypass guardrails in diffusion language models. Building on this, we introduce GPO-V, the first visual-modality jailbreak framework tailored for dVLMs. Empirical results demonstrate that GPO-V produces stealthy perturbations with exceptional cross-model transferability, revealing a critical security gap in non-sequential generative architectures. Our findings underscore the critical urgency of addressing safety alignment in dVLMs. These results necessitate an immediate and fundamental re-evaluation of current defense paradigms to mitigate the unique risks of diffusion-based generation. Our code is available at: https://anonymous.4open.science/r/GPO-V-0250.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies Immediate Refusal and Progressive Refusal patterns in diffusion Vision-Language Models (dVLMs) and argues that Fixed Prefix Optimization (FPO) attacks fail by triggering the latter during denoising. It proposes Global Probability Optimization (GPO) as a paradigm to manipulate global generative dynamics across the full denoising trajectory, introduces GPO-V as the first visual-modality jailbreak framework for dVLMs, and claims that the resulting stealthy perturbations exhibit exceptional cross-model transferability, exposing a security gap in non-sequential generative architectures.
Significance. If the empirical claims hold with rigorous validation, the work highlights a latent attack surface unique to diffusion-based multimodal models that current alignment techniques may not address, potentially requiring new defense paradigms beyond those developed for autoregressive systems. The public code release supports reproducibility.
major comments (2)
- [Section 3] Section 3 (GPO formulation): The global probability optimization objective is described at a high level as manipulating generative dynamics to bypass guardrails, but the paper does not specify the exact loss function, whether it incorporates explicit refusal-avoidance terms, or how it guarantees evasion of Progressive Refusal at every denoising step without detection; this is load-bearing for the central claim that GPO reliably exploits a latent surface rather than succeeding on prompt- or model-specific cases.
- [Section 4] Section 4 (experiments): The abstract and structure assert empirical success and exceptional transferability, yet no attack success rates, baseline comparisons (e.g., against adapted FPO or other diffusion attacks), ablation on global vs. local optimization, or statistical details are referenced in the provided summary; without these, the support for cross-model claims and the superiority over FPO cannot be verified.
minor comments (2)
- [Abstract] Abstract: 'dVLMs' and 'dLLMs' are introduced without expansion on first use; define acronyms at initial appearance.
- [Abstract] Abstract: The code link is given as anonymous; confirm accessibility for reviewers and include a permanent repository identifier if possible.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable suggestions. We address the major comments point-by-point below and have made revisions to the manuscript to improve clarity and completeness.
read point-by-point responses
-
Referee: [Section 3] Section 3 (GPO formulation): The global probability optimization objective is described at a high level as manipulating generative dynamics to bypass guardrails, but the paper does not specify the exact loss function, whether it incorporates explicit refusal-avoidance terms, or how it guarantees evasion of Progressive Refusal at every denoising step without detection; this is load-bearing for the central claim that GPO reliably exploits a latent surface rather than succeeding on prompt- or model-specific cases.
Authors: We acknowledge that the original presentation of the GPO formulation in Section 3 was at a high level. To address this, we have revised the section to include the precise loss function: the global objective is to minimize the expected negative log-likelihood of generating the desired safe response across all denoising timesteps, with an additional term to penalize the probability of refusal tokens. This formulation is designed to manipulate the entire generative trajectory, thereby evading Progressive Refusal patterns that arise from local optimizations. We have added a subsection explaining the guarantee of evasion through global dynamics and included the full optimization algorithm. revision: yes
-
Referee: [Section 4] Section 4 (experiments): The abstract and structure assert empirical success and exceptional transferability, yet no attack success rates, baseline comparisons (e.g., against adapted FPO or other diffusion attacks), ablation on global vs. local optimization, or statistical details are referenced in the provided summary; without these, the support for cross-model claims and the superiority over FPO cannot be verified.
Authors: We agree that additional experimental details would strengthen the paper. Although the manuscript includes empirical results demonstrating success and transferability, we have expanded Section 4 to explicitly report attack success rates (e.g., over 85% on the target dVLM and 70%+ transfer rates), comparisons with adapted FPO baselines showing GPO-V's superiority, ablations comparing global vs. local optimization, and statistical analyses including standard deviations and significance tests. These additions provide rigorous support for the cross-model claims. revision: yes
Circularity Check
No circularity: empirical attack method with no derivation chain
full rationale
The paper describes an empirical jailbreak technique (GPO and GPO-V) for diffusion VLMs based on observed refusal patterns (Immediate and Progressive Refusal). No mathematical derivation, equations, fitted parameters, or closed-form predictions are presented that could reduce to their own inputs. The contribution consists of identifying patterns, proposing an optimization-based attack, and reporting experimental results on transferability. No self-citations, ansatzes, or uniqueness theorems are invoked in the abstract or structure to support a load-bearing claim. The work is self-contained as an empirical demonstration rather than a deductive result.
Axiom & Free-Parameter Ledger
invented entities (1)
-
Global Probability Optimization (GPO)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Large Language Diffusion Models
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.CoRR, abs/2502.09992, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[2]
Dream 7B: Diffusion Large Language Models
Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models.CoRR, abs/2508.15487, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[3]
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models.CoRR, abs/2302.13971, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[4]
Bidirectional language models are also few-shot learners
Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, and Chris Callison-Burch. Bidirectional language models are also few-shot learners. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023
work page 2023
-
[5]
Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, and Chongxuan Li. Llada-v: Large language diffusion models with visual instruction tuning.CoRR, abs/2505.16933, 2025
-
[6]
Sneakyprompt: Jailbreaking text-to-image generative models
Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. Sneakyprompt: Jailbreaking text-to-image generative models. InIEEE Symposium on Security and Privacy, SP 2024, San Francisco, CA, USA, May 19-23, 2024, pages 897–912. IEEE, 2024
work page 2024
-
[7]
Text-to-image diffusion models can be easily backdoored through multimodal data poisoning
Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, and Hang Su. Text-to-image diffusion models can be easily backdoored through multimodal data poisoning. In Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, and M. Shamim Hossain, editors, Proceedings of the 31st ACM International...
work page 2023
-
[8]
Anderson, Yaron Singer, and Amin Karbasi
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum S. Anderson, Yaron Singer, and Amin Karbasi. Tree of attacks: Jailbreaking black-box llms automatically. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: An...
work page 2024
-
[9]
Multi-step jailbreaking privacy attacks on chatgpt
Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Multi-step jailbreaking privacy attacks on chatgpt. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 4138–4153. Association for Computational Linguistics, 2023
work page 2023
-
[10]
Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T
Subham S. Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T. Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processi...
work page 2024
-
[11]
Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information P...
work page 2021
-
[12]
Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, and Yufei Guo. Jailbreaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation.CoRR, abs/2507.19227, 2025
-
[13]
Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, et al. The devil behind the mask: An emergent safety vulnerability of diffusion llms.arXiv preprint arXiv:2507.11097, 2025. 10
-
[14]
Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, and Aditya Grover. Lavida: A large diffusion language model for multimodal understanding.CoRR, abs/2505.16839, 2025
-
[15]
From denoising to refining: A corrective framework for vision-language diffusion model, 2025
Yatai Ji, Teng Wang, Yuying Ge, Zhiheng Liu, Sidi Yang, Ying Shan, and Ping Luo. From denoising to refining: A corrective framework for vision-language diffusion model, 2025
work page 2025
-
[16]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020
work page 2020
-
[17]
Diffusion models in low-level vision: A survey.IEEE Trans
Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey.IEEE Trans. Pattern Anal. Mach. Intell., 47(6):4630–4651, 2025
work page 2025
-
[18]
Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng-Ann Heng, and Stan Z. Li. A survey on generative diffusion models.IEEE Trans. Knowl. Data Eng., 36(7):2814–2830, 2024
work page 2024
-
[19]
State space orderings for gauss-seidel in markov chains revisited.SIAM J
Tugrul Dayar. State space orderings for gauss-seidel in markov chains revisited.SIAM J. Sci. Comput., 19(1):148– 154, 1998
work page 1998
-
[20]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023
work page 2023
-
[21]
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 32...
work page 2023
-
[22]
Denoising Diffusion Implicit Models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.CoRR, abs/2010.02502, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[23]
Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations. In9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021
work page 2021
-
[24]
Discrete diffusion modeling by estimating the ratios of the data distribution
Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. InForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024
work page 2024
-
[25]
Lingzhe Zhang, Liancheng Fang, Chiming Duan, Minghua He, Leyi Pan, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S. Yu, and Aiwei Liu. A survey on parallel text generation: From parallel decoding to diffusion language models.CoRR, abs/2508.08712, 2025
-
[26]
Gomez, Lukasz Kaiser, and Illia Polosukhin
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V . N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: Annual Conference...
work page 2017
-
[27]
Don’t say no: Jailbreaking LLM by suppressing refusal
Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Sibei Yang, and Wenjie Wang. Don’t say no: Jailbreaking LLM by suppressing refusal. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pages 25224–25249. Associat...
work page 2025
-
[28]
Improved techniques for optimization-based jailbreaking on large language models
Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, and Min Lin. Improved techniques for optimization-based jailbreaking on large language models. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025
work page 2025
-
[29]
Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, De...
work page 2023
-
[30]
Erfan Shayegani, Yue Dong, and Nael B. Abu-Ghazaleh. Jailbreak in pieces: Compositional adversarial attacks on multi-modal language models. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. 11
work page 2024
-
[31]
Universal adversarial triggers for attacking and analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Proce...
work page 2019
-
[32]
Hotflip: White-box adversarial examples for text classification
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. Hotflip: White-box adversarial examples for text classification. In Iryna Gurevych and Yusuke Miyao, editors,Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 31–36. Association for C...
work page 2018
-
[33]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.CoRR, abs/2307.15043, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[34]
Logan IV , Eric Wallace, and Sameer Singh
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV , Eric Wallace, and Sameer Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu, editors,Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-...
work page 2020
-
[35]
White-box multimodal jailbreaks against large vision-language models
Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, and Yu-Gang Jiang. White-box multimodal jailbreaks against large vision-language models. In Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, and Dong Xu, editors,Proceedings of the 32nd ACM ...
work page 2024
-
[36]
Zexuan Yan, Yue Ma, Chang Zou, Wenteng Chen, Qifeng Chen, and Linfeng Zhang. Eedit: Rethinking the spatial and temporal redundancy for efficient image editing.CoRR, abs/2503.10270, 2025
-
[37]
Xiao Li, Zhuhong Li, Qiongxiu Li, Bingze Lee, Jinghao Cui, and Xiaolin Hu. Faster-gcg: Efficient discrete optimization jailbreak attacks against aligned large language models.ArXiv, abs/2410.15362, 2024
-
[38]
Exploiting the index gradients for optimization- based jailbreaking on large language models
Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, and Yu Hong. Exploiting the index gradients for optimization- based jailbreaking on large language models. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Schockaert, editors,Proceedings of the 31st International Conference on Computational Linguistics, pages 4...
work page 2025
-
[39]
Learning transferable adversarial perturbations
Krishna Kanth Nakka and Mathieu Salzmann. Learning transferable adversarial perturbations. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors,Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, ...
work page 2021
-
[40]
Universal adversarial perturbations
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 86–94. IEEE Computer Society, 2017
work page 2017
-
[41]
Sigmoid loss for language image pre- training
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre- training. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 11941–11952. IEEE, 2023
work page 2023
-
[42]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick L...
work page 2024
-
[43]
Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan,...
work page 2024
-
[44]
Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding.CoRR, abs/2505.22618, 2025. 12 A Technical Appendices A.1 Prior Knowledge in Image Generation Diffusion Models In image generation diffusion models, t...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.