Multilingual and Multi-Accent Jailbreaking of Audio LLMs. arXiv preprint arXiv:2504.01094.
4 Pith papers cite this work.
4 representative citing papers (2026):
- Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs. Fine-tuning on benign audio data raises jailbreak success rates up to 87%, with the dominant risk axis depending on model architecture and embedding proximity to harmful content.
- VoxSafeBench: Not Just What Is Said, but Who, How, and Where. Speech language models recognize social norms from text but fail to apply them when acoustic cues, such as the speaker or the scene, determine the appropriate response.
- Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization. Selecting only a sparse set of high-gradient-energy audio tokens is enough to jailbreak audio language models, with minimal drop in attack success rate.
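The token-selection step behind the sparse-token jailbreak can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name is hypothetical, and "gradient energy" is assumed here to mean the squared L2 norm of each audio token's gradient with respect to the attack loss.

```python
def select_sparse_tokens(token_grads, k):
    """Return the indices of the k audio tokens with the highest gradient energy.

    token_grads: list of per-token gradient vectors (lists of floats),
    taken w.r.t. the attack objective. Only these top-k positions would
    then be perturbed, leaving the rest of the audio untouched.
    """
    # Energy of each token = squared L2 norm of its gradient (an assumed measure).
    energies = [sum(g * g for g in grad) for grad in token_grads]
    # Rank token positions by energy, highest first, and keep the top k.
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return ranked[:k]

# Toy example: 4 tokens with 2-dim gradients; tokens 1 and 3 carry the most energy.
grads = [[0.1, 0.1], [3.0, 4.0], [0.2, 0.0], [1.0, 1.0]]
print(select_sparse_tokens(grads, 2))  # → [1, 3]
```

Optimizing only these few positions is what keeps the perturbation sparse while, per the summary above, costing little in attack success rate.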
- A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning. Coordinated multi-modal typographic attacks on MLLMs achieve an 83.43% success rate, versus 34.93% for single-modality attacks.