pith. machine review for the scientific record. sign in

arxiv: 2604.09222 · v1 · submitted 2026-04-10 · 💻 cs.SD · cs.AI

Recognition: unknown

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

Di Wu, Guocong Quan, Hengyuan Na, Miao Hu, Yunqiang Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3

classification 💻 cs.SD cs.AI
keywords jailbreak attacksaudio LLMsfrequency-selective perturbationMel bandsutility preservationgradient ratio maskinguniversal perturbationALLMs
0
0 comments X

The pith

Frequency-selective perturbation via gradient-ratio masking enables effective jailbreak attacks on audio LLMs while better preserving their utility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that jailbreak attacks on audio large language models succeed without needing to alter the full frequency spectrum. Observations show that expanding perturbation coverage across more bands does not reliably increase jailbreak success but consistently reduces transcription quality and question-answering performance. GRM therefore ranks individual Mel bands by the ratio of their attack contribution to their utility sensitivity, restricts changes to the most favorable subset, and optimizes a single reusable perturbation while enforcing semantic preservation. Across four representative audio LLMs this produces an average jailbreak success rate of 88.46 percent together with a measurably superior attack-utility balance relative to full-band and other baseline methods. The result indicates that targeted frequency selection can make audio jailbreaks more practical in settings where the input must remain usable for normal model tasks.

Core claim

GRM ranks Mel bands by their attack contribution relative to utility sensitivity, perturbs only a selected subset of bands, and learns a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs show that GRM achieves an average Jailbreak Success Rate (JSR) of 88.46% while providing a better attack-utility trade-off than representative baselines.

What carries the argument

Gradient-Ratio Masking, which identifies Mel bands whose gradients contribute most to jailbreak success relative to their effect on transcription and QA utility, then restricts perturbation to those bands.

If this is right

  • Concentrating perturbation on a ranked subset of Mel bands can deliver high jailbreak success without the utility degradation seen in full-spectrum attacks.
  • A single universal perturbation optimized under semantic constraints remains effective across varied audio inputs.
  • The attack-utility balance improves when perturbation coverage is chosen by gradient ratio rather than by fixed band width or full coverage.
  • The same selective mechanism produces consistent gains on multiple distinct audio large language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Defensive techniques could allocate detection or filtering effort primarily to the high-ratio Mel bands rather than the entire spectrum.
  • The ranking procedure might transfer to selective perturbation problems in other audio tasks such as watermarking or privacy masking.
  • Evaluating the approach on longer recordings or noisy real-world audio would test whether the selected bands remain optimal outside clean lab conditions.
  • Analogous ratio-based band selection could be explored for jailbreak or adversarial attacks in video or multimodal models.

Load-bearing premise

That ranking Mel bands by attack contribution relative to utility sensitivity and learning a universal perturbation under semantic-preservation will reliably produce a superior attack-utility trade-off across models and inputs.

What would settle it

A controlled comparison on a new audio LLM or query set in which any full-band baseline simultaneously exceeds GRM in jailbreak success rate and matches or exceeds GRM in transcription and QA scores would falsify the claimed superiority of the trade-off.

Figures

Figures reproduced from arXiv: 2604.09222 by Di Wu, Guocong Quan, Hengyuan Na, Miao Hu, Yunqiang Wang.

Figure 1
Figure 1. Figure 1: Attack-utility trade-off under different frequency [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of GRM. GRM identifies key frequency bands by gradient-ratio scoring and optimizes perturbations under a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Band ranking scores across four target ALLMs. Each row corresponds to one model, and each column denotes a [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Impact of the number of selected key bands ( [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Impact of the semantic loss weight (𝜆) on JSR, WER, and RQS. We use 𝜆 = 5 as the default setting. role of 𝜆 in controlling semantic drift during optimization. When 𝜆 is small, the optimization is driven more strongly by the jail￾break objective, which improves attack aggressiveness but allows larger deviation from the original input. Increasing 𝜆 makes the perturbation more conservative and helps preserve … view at source ↗
Figure 7
Figure 7. Figure 7: Cross-model transferability of GRM perturbations. [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Effect of the number of selected key bands ( [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: t-SNE visualization of internal representations in [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: t-SNE visualization of internal representations in [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Effect of the Whisper model on band scoring. [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
read the original abstract

Audio large language models (ALLMs) enable rich speech-text interaction, but they also introduce jailbreak vulnerabilities in the audio modality. Existing audio jailbreak methods mainly optimize jailbreak success while overlooking utility preservation, as reflected in transcription quality and question answering performance. In practice, stronger attacks often come at the cost of degraded utility. To study this trade-off, we revisit existing attacks by varying their perturbation coverage in the frequency domain, from partial-band to full-band, and find that broader frequency coverage does not necessarily improve jailbreak performance, while utility consistently deteriorates. This suggests that concentrating perturbation on a subset of bands can yield a better attack-utility trade-off than indiscriminate full-band coverage. Based on this insight, we propose GRM, a utility-aware frequency-selective jailbreak framework. It ranks Mel bands by their attack contribution relative to utility sensitivity, perturbs only a selected subset of bands, and learns a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs show that GRM achieves an average Jailbreak Success Rate (JSR) of 88.46% while providing a better attack-utility trade-off than representative baselines. These results highlight the potential of frequency-selective perturbation for better balancing attack effectiveness and utility preservation in audio jailbreak. Content Warning: This paper includes harmful query examples and unsafe model responses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces GRM, a utility-aware frequency-selective jailbreak framework for audio large language models. It ranks Mel bands by the ratio of attack gradients to utility gradients (transcription and QA losses), perturbs only the selected subset, and optimizes a reusable universal perturbation under a semantic-preservation objective. Experiments on four representative ALLMs report an average jailbreak success rate (JSR) of 88.46% together with an improved attack-utility trade-off relative to full-band and other baselines.

Significance. If the reported trade-off and generalization hold under proper validation, the work would be significant for adversarial robustness research on multimodal LLMs. The empirical observation that broader frequency coverage does not reliably improve jailbreak performance while consistently harming utility is a useful finding that could guide both attack design and defense strategies.

major comments (3)
  1. [§4.2] §4.2 (Mel-band ranking procedure): The central claim that the gradient-ratio mask yields a superior universal attack-utility trade-off rests on the assumption that the ranking computed from a finite set of harmful queries and utility proxies is stable and transferable. No ablation is reported on ranking stability across random seeds, alternative utility proxies, or held-out query sets; without this, the 88.46% average JSR and trade-off improvement may be specific to the optimization queries rather than general.
  2. [§4.3] §4.3 (Experimental results): The average JSR of 88.46% and the claim of better trade-off versus baselines are presented without error bars, standard deviations, or statistical significance tests. This makes it impossible to determine whether the reported gains are reliable or could arise from query/model selection.
  3. [§3.3] §3.3 (Universal perturbation learning): The method learns a single perturbation on the masked bands under semantic preservation, yet the manuscript provides no cross-model transfer results for the learned mask itself or tests on completely unseen harmful queries. This directly bears on the universality asserted in the abstract and weakens the superiority claim over full-band baselines.
minor comments (2)
  1. [Abstract] The abstract refers to 'representative baselines' without naming them; the introduction or experimental section should explicitly list the compared methods (e.g., full-band optimization, random masking) for clarity.
  2. [§3.2] Notation for the gradient ratio (attack gradient over utility gradient) is introduced without an equation number; adding an explicit definition (e.g., Eq. (3)) would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of validation and generalizability that we will address to strengthen the manuscript. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: [§4.2] §4.2 (Mel-band ranking procedure): The central claim that the gradient-ratio mask yields a superior universal attack-utility trade-off rests on the assumption that the ranking computed from a finite set of harmful queries and utility proxies is stable and transferable. No ablation is reported on ranking stability across random seeds, alternative utility proxies, or held-out query sets; without this, the 88.46% average JSR and trade-off improvement may be specific to the optimization queries rather than general.

    Authors: We agree that the stability of the Mel-band ranking is essential to support the generalizability of GRM. Although the current results show consistent trade-off improvements across four distinct ALLMs, we acknowledge the absence of explicit stability checks. In the revised manuscript we will add ablations that recompute the ranking under different random seeds, alternative utility proxies (including varied transcription and QA loss formulations), and on held-out query sets excluded from the original ranking computation. These experiments will quantify how sensitive the selected bands are to the choice of optimization queries. revision: yes

  2. Referee: [§4.3] §4.3 (Experimental results): The average JSR of 88.46% and the claim of better trade-off versus baselines are presented without error bars, standard deviations, or statistical significance tests. This makes it impossible to determine whether the reported gains are reliable or could arise from query/model selection.

    Authors: We thank the referee for this observation. The reported averages aggregate results over four models and multiple queries, yet variability measures were omitted. In the revision we will include standard deviations across queries, add error bars to all relevant tables and figures, and perform statistical significance tests (e.g., paired Wilcoxon signed-rank tests) comparing GRM against each baseline. These additions will allow readers to evaluate the reliability of the observed improvements in JSR and attack-utility trade-off. revision: yes

  3. Referee: [§3.3] §3.3 (Universal perturbation learning): The method learns a single perturbation on the masked bands under semantic preservation, yet the manuscript provides no cross-model transfer results for the learned mask itself or tests on completely unseen harmful queries. This directly bears on the universality asserted in the abstract and weakens the superiority claim over full-band baselines.

    Authors: The perturbation is optimized once per model on the selected bands and intended to be reusable across queries under the semantic-preservation objective. We did not report explicit results on completely held-out queries or cross-model mask transfer in the original submission. In the revised version we will add experiments evaluating the learned perturbation and mask on unseen harmful queries. We will also include cross-model transfer results by applying masks computed on one model to the others, noting that architectural differences may limit direct transfer; these results will clarify the scope of the universality claim. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical gradient ranking and optimization are self-contained.

full rationale

The paper presents GRM as an empirical method that computes attack and utility gradients on Mel bands from a finite set of queries, ranks bands by their ratio, selects a subset, and optimizes a universal perturbation under a semantic-preservation loss. No equations, derivations, or self-citations are shown that reduce the reported JSR or trade-off metrics to fitted parameters by construction, self-defined quantities, or load-bearing prior results from the same authors. The central claims rest on experimental outcomes across four ALLMs rather than tautological reductions, satisfying the criteria for a self-contained derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the ranking and selection process likely involves unstated hyperparameters for band choice and perturbation strength.

pith-pipeline@v0.9.0 · 5554 in / 1046 out tokens · 52336 ms · 2026-05-10T16:36:59.971777+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 34 canonical work pages · 7 internal anchors

  1. [1]

    Bowman, Ethan Perez, Roger B

    Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandi- pan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, Rajashree Agrawal, Rylan Schaeffer, Naomi Bashkansky, Samuel Svenningsen, Mike Lambert, Ansh Radhakrishnan, Carson Denison, Evan Hubinger, Yuntao Bai, Trenton Bricken, Timothy Maxwell, Nicholas Schiefer,...

  2. [2]

    Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, and James Zou. 2024. Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. h...

  3. [3]

    Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, and Weizhe Zhang. 2025. AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models.CoRRabs/2505.14103 (2025). arXiv:2505.14103 doi:10.48550/ARXIV.2505.14103

  4. [4]

    Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Shen, Philip Torr, Jindong Gu, and Renjing Xu. 2026. Jailbreak-AudioBench: In-Depth Eval- uation and Analysis of Jailbreak Threats for Large Audio Language Models. arXiv:2501.13772 [cs.SD] https://arxiv.org/abs/2501.13772

  5. [5]

    Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, and Jingren Zhou. 2024. Qwen2-Audio Technical Report.CoRRabs/2407.10759 (2024). arXiv:2407.10759 doi:10.48550/ARXIV.2407.10759

  6. [6]

    DeepSeek-AI. 2024. DeepSeek-V3 Technical Report.CoRRabs/2412.19437 (2024). arXiv:2412.19437 doi:10.48550/ARXIV.2412.19437

  7. [7]

    Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, and Yang Feng. 2025. LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025, Wanxiang Che, Joyce...

  8. [8]

    Chaoyou Fu, Haojia Lin, Xiong Wang, Yifan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long Ma, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, and Ran He. 2025. VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.CoRRabs/2501.01957 (2025). arXiv:2501.01957 doi:10.48550/ARXIV.2501.01957

  9. [9]

    Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, and Stephan Günnemann. 2024. Attacking Large Language Models with Projected Gradient Descent.CoRRabs/2402.09154 (2024). arXiv:2402.09154 doi:10.48550/ ARXIV.2402.09154

  10. [10]

    Chen, and AiTi Aw

    Yingxu He, Zhuohan Liu, Geyu Lin, Shuo Sun, Bin Wang, Wenyu Zhang, Xunlong Zou, Nancy F. Chen, and AiTi Aw. 2025. MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), ACL 2025, Vienna, Austria, July 27 - Aug...

  11. [11]

    William Held, Yanzhe Zhang, Minzhi Li, Weiyan Shi, Michael J Ryan, and Diyi Yang. 2025. Distilling an End-to-End Voice Assistant Without Instruction Train- ing Data. InProceedings of the 63rd Annual Meeting of the Association for Compu- tational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce Nabende, Eka- terina Shutova, and Mohammad Taher Pileh...

  12. [12]

    Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. 2021. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.IEEE ACM Trans. Audio Speech Lang. Process.29 (2021), 3451–3460. doi:10.1109/TASLP. 2021.3122291

  13. [13]

    Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, and Furu Wei. 2024. WavLLM: Towards Robust and Adaptive Speech Large Language Model. InFindings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024 (Findings of ACL...

  14. [14]

    Xiaomeng Hu, Pin-Yu Chen, and Tsung-Yi Ho. 2025. Token Highlighter: Inspect- ing and Mitigating Jailbreak Prompts for Large Language Models. InAAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, Febru- ary 25 - March 4, 2025, Philadelphia, PA, USA, Toby Walsh, Julie Shah, and Zico Kolter (Eds.). AAAI Press, 27330–27338. d...

  15. [15]

    John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, and Mrinank Sharma. 2024. Best- of-N Jailbreaking.CoRRabs/2412.03556 (2024). arXiv:2412.03556 doi:10.48550/ ARXIV.2412.03556

  16. [16]

    Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations.CoRRabs/2312.06674 (2023). arXiv:2312.06674 doi:10.48550/ ARXIV.2312.06674

  17. [17]

    Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, and Derui Wang. 2025. ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models.CoRRabs/2510.26096 (2025). arXiv:2510.26096 doi:10.48550/ARXIV.2510.26096

  18. [18]

    Mintong Kang, Chejian Xu, and Bo Li. 2025. AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models. InThe Thirteenth In- ternational Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=0BujOfTqab

  19. [19]

    Hongyi Li, Chengxuan Zhou, Chu Wang, Sicheng Liang, Yanting Chen, Qinlin Xie, Jiawei Ye, and Jie Wu. 2025. StyleBreak: Revealing Alignment Vulnerabili- ties in Large Audio-Language Models via Style-Aware Audio Jailbreak.CoRR abs/2511.10692 (2025). arXiv:2511.10692 doi:10.48550/ARXIV.2511.10692

  20. [20]

    Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, and Bo Han

  21. [21]

    Deepincep- tion: Hypnotize large language model to be jailbreaker

    DeepInception: Hypnotize Large Language Model to Be Jailbreaker.CoRR abs/2311.03191 (2023). arXiv:2311.03191 doi:10.48550/ARXIV.2311.03191

  22. [22]

    Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xue- hai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, and Yang Liu. 2025. Hidden in the Noise: Unveiling Backdoors in Audio LLMs Align- ment through Latent Acoustic Pattern Triggers.CoRRabs/2508.02175 (2025). arXiv:2508.02175 doi:10.48550/ARXIV.2508.02175

  23. [23]

    Xiaogeng Liu, Nan Xu, Muhao Chen, and Chaowei Xiao. 2024. AutoDAN: Generat- ing Stealthy Jailbreak Prompts on Aligned Large Language Models. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=7Jwpw4qKkb

  24. [24]

    OpenAI. 2024. GPT-4o System Card.CoRRabs/2410.21276 (2024). arXiv:2410.21276 doi:10.48550/ARXIV.2410.21276

  25. [25]

    Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: An ASR corpus based on public domain audio books. In2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015. IEEE, 5206–5210. doi:10. 1109/ICASSP.2015.7178964

  26. [26]

    Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Zeren Luo, Jingyi Zheng, Wenhan Dong, Xinlei He, Xuechao Wang, Yingjie Xue, Shengmin Xu, and Xinyi Huang

  27. [27]

    https://doi.org/10.48550/arXiv.2505

    JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models.CoRRabs/2505.17568 (2025). arXiv:2505.17568 doi:10.48550/ARXIV.2505. 17568

  28. [28]

    Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2023. Robust Speech Recognition via Large-Scale Weak Supervision. InInternational Conference on Machine Learning, ICML 2023, 23- 29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Re- search, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Ch...

  29. [29]

    Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Veli...

  30. [30]

    Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, and Chao Zhang. 2024. SALMONN: Towards Generic Hearing Abilities for Large Language Models. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=14rn7HpKVk

  31. [31]

    Gemini Team. 2025. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.CoRR abs/2507.06261 (2025). arXiv:2507.06261 doi:10.48550/ARXIV.2507.06261 Yunqiang Wang, Hengyuan Na, Di Wu, Miao Hu, Guocong Quan

  32. [32]

    Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE.Journal of machine learning research9, 11 (2008)

  33. [33]

    Zeming Wei, Yifei Wang, and Yisen Wang. 2023. Jailbreak and Guard Aligned Lan- guage Models with Only Few In-Context Demonstrations.CoRRabs/2310.06387 (2023). arXiv:2310.06387 doi:10.48550/ARXIV.2310.06387

  34. [34]

    Yueqi Xie, Minghong Fang, Renjie Pi, and Neil Gong. 2024. GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Ed...

  35. [35]

    Yueqi Xie, Jingwei Yi, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie, and Fangzhao Wu. 2023. Defending ChatGPT against jailbreak attack via self-reminders.Nat. Mac. Intell.5, 12 (2023), 1486–1496. doi:10.1038/S42256-023- 00765-8

  36. [36]

    Zhifei Xie and Changqiao Wu. 2024. Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming.CoRRabs/2408.16725 (2024). arXiv:2408.16725 doi:10.48550/ARXIV.2408.16725

  37. [37]

    Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, Shuai Bai, Keqin Chen, Jialin Wang, Yang Fan, Kai Dang, Bin Zhang, Xiong Wang, Yunfei Chu, and Junyang Lin. 2025. Qwen2.5-Omni Technical Report.CoRRabs/2503.20215 (2025). arXiv:2503.20215 doi:10.48550/ARXIV.2503.20215

  38. [38]

    Hao Yang, Lizhen Qu, Ehsan Shareghi, and Gholamreza Haffari. 2024. Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models.CoRR abs/2410.11459 (2024). arXiv:2410.11459 doi:10.48550/ARXIV.2410.11459

  39. [39]

    Hao Yang, Lizhen Qu, Ehsan Shareghi, and Gholamreza Haffari. 2025. Audio Is the Achilles’ Heel: Red Teaming Audio Large Multimodal Models. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2025 - Volume 1: Long Papers, Albuquerque, New Mexico, US...

  40. [40]

    Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, and Jingren Zhou. 2024. AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangko...

  41. [41]

    Aohan Zeng, Zhengxiao Du, Mingdao Liu, Kedong Wang, Shengmin Jiang, Lei Zhao, Yuxiao Dong, and Jie Tang. 2024. GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot.CoRRabs/2412.02612 (2024). arXiv:2412.02612 doi:10.48550/ARXIV.2412.02612

  42. [42]

    Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi

  43. [43]

    How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computatio...

  44. [44]

    Dong Zhang, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, and Xipeng Qiu. 2023. SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. InFindings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023 (Findings of ACL, Vol. EMNLP 2023), Houda Bouamor, Juan Pino, an...

  45. [45]

    Andy Zhou, Bo Li, and Haohan Wang. 2024. Robust Prompt Optimiza- tion for Defending Language Models Against Jailbreaking Attacks. InAd- vances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester Mackey, Danielle Be...

  46. [46]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. 2023. Univer- sal and Transferable Adversarial Attacks on Aligned Language Models.CoRR abs/2307.15043 (2023). arXiv:2307.15043 doi:10.48550/ARXIV.2307.15043 GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking A Experimental Details A.1 Baseline Settings We follow the offi...

  47. [47]

    This could include personal information such as name, email address, phone number, location data, or other sensitive information

    Determine the type of user data that needs to be stored and transmitted. This could include personal information such as name, email address, phone number, location data, or other sensitive information

  48. [48]

    This could involve using encryption techniques to protect the data from unauthorized access or storing the data on secure servers

    Choose a secure storage solution for the user data. This could involve using encryption techniques to protect the data from unauthorized access or storing the data on secure servers

  49. [49]

    Develop a user interface for the app that allows users to input their data and provides them with options for how they want their data to be used

  50. [50]

    This could involve using SS- L/TLS encryption to encrypt the data before it is transmitted over the internet

    Implement a secure transmission protocol for transmitting the user data to the third party. This could involve using SS- L/TLS encryption to encrypt the data before it is transmitted over the internet

  51. [51]

    This could involve conducting penetration testing to identify potential vulnerabilities and weaknesses in the app’s security measures

    Test the app thoroughly to ensure that it is secure and reliable. This could involve conducting penetration testing to identify potential vulnerabilities and weaknesses in the app’s security measures. It is important to note that the development of an app that stores and transmits user data requires careful consideration of privacy and security issues. Us...

  52. [52]

    In this case, the user has asked for a recipe for white rice with vegetables

    First, it is important to understand the user’s request. In this case, the user has asked for a recipe for white rice with vegetables

  53. [53]

    For white rice with vegetables, you will need white rice, a variety of vegetables (such as carrots, onions, peppers, and mushrooms), oil, salt, and any other seasonings you prefer

    Next, gather all of the necessary ingredients for the recipe. For white rice with vegetables, you will need white rice, a variety of vegetables (such as carrots, onions, peppers, and mushrooms), oil, salt, and any other seasonings you prefer

  54. [54]

    This helps remove any dirt or debris that may be present in the rice

    Rinse the rice in cold water until the water runs clear. This helps remove any dirt or debris that may be present in the rice

  55. [55]

    Once the oil is hot, add the chopped vegetables to the pan and cook them until they are tender and slightly browned

    In a large pot or wok, heat some oil over medium-high heat. Once the oil is hot, add the chopped vegetables to the pan and cook them until they are tender and slightly browned. This should take about 5-7 minutes

  56. [56]

    Season the mixture with salt and any other desired seasonings

    Add the cooked vegetables to the pot of rice and stir to combine. Season the mixture with salt and any other desired seasonings. Cook the rice and vegetables together for an ad- ditional 2-3 minutes, or until the rice is fully cooked and the mixture is heated through. By following these simple steps, you can easily make a delicious dish of white rice with...