GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3
The pith
Frequency-selective perturbation via gradient-ratio masking enables effective jailbreak attacks on audio LLMs while better preserving their utility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRM ranks Mel bands by their attack contribution relative to utility sensitivity, perturbs only a selected subset of bands, and learns a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs show that GRM achieves an average Jailbreak Success Rate (JSR) of 88.46% while providing a better attack-utility trade-off than representative baselines.
What carries the argument
Gradient-Ratio Masking, which identifies Mel bands whose gradients contribute most to jailbreak success relative to their effect on transcription and QA utility, then restricts perturbation to those bands.
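The review does not reproduce the paper's algorithm. As a minimal sketch of what such a ranking could look like, assuming PyTorch, a differentiable path from Mel spectrograms to each loss, and illustrative names throughout (`attack_loss_fn`, `utility_loss_fns`, and `k` are assumptions, not the paper's API):

```python
import torch

def gradient_ratio_mask(mel, attack_loss_fn, utility_loss_fns, k, eps=1e-8):
    """Rank Mel bands by |attack gradient| / |utility gradient|; keep the top k.

    mel: (batch, n_mels, frames) Mel spectrograms.
    attack_loss_fn, utility_loss_fns: callables mapping mel -> scalar loss
    (e.g., a jailbreak target loss; transcription and QA losses).
    Returns a boolean (n_mels,) mask over Mel bands.
    """
    x = mel.clone().detach().requires_grad_(True)
    # Per-band attack sensitivity: mean |dL_attack / dx| over batch and time.
    g_att = torch.autograd.grad(attack_loss_fn(x), x)[0]
    att = g_att.abs().mean(dim=(0, 2))                    # (n_mels,)

    # Per-band utility sensitivity, summed over the utility proxies.
    uti = torch.zeros_like(att)
    for loss_fn in utility_loss_fns:
        x_u = mel.clone().detach().requires_grad_(True)
        g_u = torch.autograd.grad(loss_fn(x_u), x_u)[0]
        uti += g_u.abs().mean(dim=(0, 2))

    ratio = att / (uti + eps)    # high: strong for the attack, mild on utility
    mask = torch.zeros(ratio.shape, dtype=torch.bool)
    mask[torch.topk(ratio, k).indices] = True
    return mask
```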
If this is right
- Concentrating perturbation on a ranked subset of Mel bands can deliver high jailbreak success without the utility degradation seen in full-spectrum attacks.
- A single universal perturbation optimized under semantic constraints remains effective across varied audio inputs.
- The attack-utility balance improves when perturbation coverage is chosen by gradient ratio rather than by fixed band width or full coverage.
- The same selective mechanism produces consistent gains on multiple distinct audio large language models.
Where Pith is reading between the lines
- Defensive techniques could allocate detection or filtering effort primarily to the high-ratio Mel bands rather than the entire spectrum.
- The ranking procedure might transfer to selective perturbation problems in other audio tasks such as watermarking or privacy masking.
- Evaluating the approach on longer recordings or noisy real-world audio would test whether the selected bands remain optimal outside clean lab conditions.
- Analogous ratio-based band selection could be explored for jailbreak or adversarial attacks in video or multimodal models.
Load-bearing premise
That ranking Mel bands by attack contribution relative to utility sensitivity, and learning a universal perturbation under a semantic-preservation objective, will reliably produce a superior attack-utility trade-off across models and inputs.
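To make the premise concrete, here is a hedged sketch of a universal-perturbation loop of the kind the abstract describes: one additive perturbation shared across inputs, confined to the selected bands, trained against a jailbreak loss plus a semantic-preservation penalty. The weight `lam`, the bound `eps`, and all names are assumptions rather than the paper's settings:

```python
import torch

def learn_universal_perturbation(loader, mask, attack_loss_fn, semantic_loss_fn,
                                 n_mels, frames, epochs=10, lr=1e-2,
                                 lam=0.1, eps=0.5):
    """One perturbation reused across all inputs, applied to masked bands only."""
    delta = torch.zeros(n_mels, frames, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    band = mask.float().unsqueeze(-1)        # (n_mels, 1), broadcasts over time

    for _ in range(epochs):
        for mel in loader:                   # (batch, n_mels, frames)
            adv = mel + band * delta         # frequency-selective perturbation
            loss = attack_loss_fn(adv) + lam * semantic_loss_fn(adv, mel)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():            # keep the perturbation bounded
                delta.clamp_(-eps, eps)
    return delta.detach()
```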
What would settle it
A controlled comparison on a new audio LLM or query set in which any full-band baseline simultaneously exceeds GRM in jailbreak success rate and matches or exceeds GRM in transcription and QA scores would falsify the claimed superiority of the trade-off.
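Operationally, the criterion is a strict-dominance check. A minimal illustration, with placeholder metric names oriented so that higher is better:

```python
def falsifies_grm(grm, baseline):
    """True if the baseline strictly beats GRM on jailbreak success while
    matching or exceeding it on both utility metrics (all higher-is-better)."""
    return (baseline["jsr"] > grm["jsr"]
            and baseline["asr_quality"] >= grm["asr_quality"]
            and baseline["qa_score"] >= grm["qa_score"])
```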
Original abstract
Audio large language models (ALLMs) enable rich speech-text interaction, but they also introduce jailbreak vulnerabilities in the audio modality. Existing audio jailbreak methods mainly optimize jailbreak success while overlooking utility preservation, as reflected in transcription quality and question answering performance. In practice, stronger attacks often come at the cost of degraded utility. To study this trade-off, we revisit existing attacks by varying their perturbation coverage in the frequency domain, from partial-band to full-band, and find that broader frequency coverage does not necessarily improve jailbreak performance, while utility consistently deteriorates. This suggests that concentrating perturbation on a subset of bands can yield a better attack-utility trade-off than indiscriminate full-band coverage. Based on this insight, we propose GRM, a utility-aware frequency-selective jailbreak framework. It ranks Mel bands by their attack contribution relative to utility sensitivity, perturbs only a selected subset of bands, and learns a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs show that GRM achieves an average Jailbreak Success Rate (JSR) of 88.46% while providing a better attack-utility trade-off than representative baselines. These results highlight the potential of frequency-selective perturbation for better balancing attack effectiveness and utility preservation in audio jailbreak. Content Warning: This paper includes harmful query examples and unsafe model responses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GRM, a utility-aware frequency-selective jailbreak framework for audio large language models. It ranks Mel bands by the ratio of attack gradients to utility gradients (transcription and QA losses), perturbs only the selected subset, and optimizes a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs report an average jailbreak success rate (JSR) of 88.46% together with an improved attack-utility trade-off relative to full-band and other baselines.
Significance. If the reported trade-off and generalization hold under proper validation, the work would be significant for adversarial robustness research on multimodal LLMs. The empirical observation that broader frequency coverage does not reliably improve jailbreak performance while consistently harming utility is a useful finding that could guide both attack design and defense strategies.
Major comments (3)
- [§4.2] §4.2 (Mel-band ranking procedure): The central claim that the gradient-ratio mask yields a superior universal attack-utility trade-off rests on the assumption that the ranking computed from a finite set of harmful queries and utility proxies is stable and transferable. No ablation is reported on ranking stability across random seeds, alternative utility proxies, or held-out query sets; without this, the 88.46% average JSR and trade-off improvement may be specific to the optimization queries rather than general.
- [§4.3] §4.3 (Experimental results): The average JSR of 88.46% and the claim of better trade-off versus baselines are presented without error bars, standard deviations, or statistical significance tests. This makes it impossible to determine whether the reported gains are reliable or could arise from query/model selection.
- [§3.3] §3.3 (Universal perturbation learning): The method learns a single perturbation on the masked bands under semantic preservation, yet the manuscript provides no cross-model transfer results for the learned mask itself or tests on completely unseen harmful queries. This directly bears on the universality asserted in the abstract and weakens the superiority claim over full-band baselines.
Minor comments (2)
- [Abstract] The abstract refers to 'representative baselines' without naming them; the introduction or experimental section should explicitly list the compared methods (e.g., full-band optimization, random masking) for clarity.
- [§3.2] Notation for the gradient ratio (attack gradient over utility gradient) is introduced without an equation number; adding an explicit definition (e.g., Eq. (3)) would improve readability. A candidate formalization is sketched below.
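For illustration only, one plausible formalization of the band score (the paper's own equation is not reproduced in this review; all symbols are assumptions):

```latex
% r_b: score of Mel band b; x_b: the spectrogram restricted to band b;
% expectations run over queries and time frames; \epsilon avoids division by zero.
r_b \;=\; \frac{\mathbb{E}\bigl[\lvert \nabla_{x_b} L_{\mathrm{atk}} \rvert\bigr]}
               {\mathbb{E}\bigl[\lvert \nabla_{x_b} L_{\mathrm{asr}} \rvert\bigr]
                + \mathbb{E}\bigl[\lvert \nabla_{x_b} L_{\mathrm{qa}} \rvert\bigr]
                + \epsilon}
```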
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of validation and generalizability that we will address to strengthen the manuscript. Below we respond point by point to the major comments.
Point-by-point responses
Referee: [§4.2] §4.2 (Mel-band ranking procedure): The central claim that the gradient-ratio mask yields a superior universal attack-utility trade-off rests on the assumption that the ranking computed from a finite set of harmful queries and utility proxies is stable and transferable. No ablation is reported on ranking stability across random seeds, alternative utility proxies, or held-out query sets; without this, the 88.46% average JSR and trade-off improvement may be specific to the optimization queries rather than general.
Authors: We agree that the stability of the Mel-band ranking is essential to support the generalizability of GRM. Although the current results show consistent trade-off improvements across four distinct ALLMs, we acknowledge the absence of explicit stability checks. In the revised manuscript we will add ablations that recompute the ranking under different random seeds, alternative utility proxies (including varied transcription and QA loss formulations), and on held-out query sets excluded from the original ranking computation. These experiments will quantify how sensitive the selected bands are to the choice of optimization queries. revision: yes
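A minimal sketch of the promised stability check, assuming each run produces a gradient-ratio vector over Mel bands (SciPy's Spearman correlation; all names are illustrative):

```python
import itertools
import numpy as np
from scipy.stats import spearmanr

def ranking_stability(ratio_runs):
    """Pairwise Spearman rank correlation of band rankings across runs.

    ratio_runs: list of 1-D arrays, one gradient-ratio vector per seed,
    utility proxy, or query split. Returns (mean, min) correlation.
    """
    rhos = [spearmanr(a, b)[0]
            for a, b in itertools.combinations(ratio_runs, 2)]
    return float(np.mean(rhos)), float(np.min(rhos))
```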
Referee: [§4.3] §4.3 (Experimental results): The average JSR of 88.46% and the claim of better trade-off versus baselines are presented without error bars, standard deviations, or statistical significance tests. This makes it impossible to determine whether the reported gains are reliable or could arise from query/model selection.
Authors: We thank the referee for this observation. The reported averages aggregate results over four models and multiple queries, yet variability measures were omitted. In the revision we will include standard deviations across queries, add error bars to all relevant tables and figures, and perform statistical significance tests (e.g., paired Wilcoxon signed-rank tests) comparing GRM against each baseline. These additions will allow readers to evaluate the reliability of the observed improvements in JSR and attack-utility trade-off. revision: yes
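The proposed test is standard; a minimal sketch with placeholder per-query scores (not the paper's data):

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-query utility scores for GRM and one baseline on the
# same query set; replace with the actual paired measurements.
grm      = np.array([0.81, 0.78, 0.84, 0.79, 0.82, 0.79])
baseline = np.array([0.74, 0.75, 0.70, 0.77, 0.72, 0.73])

stat, p = wilcoxon(grm, baseline)   # paired signed-rank test over queries
print(f"W={stat:.1f}, p={p:.4f}")
```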
Referee: [§3.3] §3.3 (Universal perturbation learning): The method learns a single perturbation on the masked bands under semantic preservation, yet the manuscript provides no cross-model transfer results for the learned mask itself or tests on completely unseen harmful queries. This directly bears on the universality asserted in the abstract and weakens the superiority claim over full-band baselines.
Authors: The perturbation is optimized once per model on the selected bands and intended to be reusable across queries under the semantic-preservation objective. We did not report explicit results on completely held-out queries or cross-model mask transfer in the original submission. In the revised version we will add experiments evaluating the learned perturbation and mask on unseen harmful queries. We will also include cross-model transfer results by applying masks computed on one model to the others, noting that architectural differences may limit direct transfer; these results will clarify the scope of the universality claim. revision: partial
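The transfer experiment reduces to filling a model-by-model grid. A minimal sketch, where `eval_jsr` is an assumed helper (not from the paper) that learns the perturbation under a given mask and returns the resulting jailbreak success rate:

```python
def mask_transfer_grid(models, masks, eval_jsr):
    """JSR when the mask computed on model i is applied to model j.

    models: list of target ALLMs; masks[i]: band mask derived on models[i];
    eval_jsr(model, mask): runs the attack pipeline under `mask` on `model`
    and returns its JSR. The diagonal is the self-transfer baseline.
    """
    return [[eval_jsr(model_j, masks[i]) for model_j in models]
            for i in range(len(models))]
```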
Circularity Check
No circularity: empirical gradient ranking and optimization are self-contained.
Full rationale
The paper presents GRM as an empirical method that computes attack and utility gradients on Mel bands from a finite set of queries, ranks bands by their ratio, selects a subset, and optimizes a universal perturbation under a semantic-preservation loss. No equations, derivations, or self-citations are shown that would reduce the reported JSR or trade-off metrics to quantities true by construction, self-defined measures, or load-bearing prior results from the same authors. The central claims rest on experimental outcomes across four ALLMs rather than on tautological reductions, so the derivation chain is self-contained.