pith. machine review for the scientific record.

arxiv: 2604.05498 · v1 · submitted 2026-04-07 · 💻 cs.RO

Recognition: 2 theorem links

· Lean Theorem

JailWAM: Jailbreaking World Action Models in Robot Control

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 19:18 UTC · model grok-4.3

classification 💻 cs.RO
keywords jailbreak attacks · world action models · robot control · safety evaluation · visual trajectory mapping · risk discriminator · dual-path verification · robotic safety

The pith

JailWAM shows that World Action Models in robot control can be jailbroken to produce unsafe physical motions at an 84.2 percent attack success rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces JailWAM as the first framework to attack and measure safety in World Action Models, AI systems that jointly predict future states and robot actions. It defines a Three-Level Safety Classification for robotic motions and builds three components: visual-trajectory mapping to standardize different action formats, a risk discriminator to spot dangerous patterns, and a dual-path check that combines quick image-based screening with full simulation. Experiments on the RoboTwin platform reach an 84.2 percent attack success rate against a leading model and include a new benchmark for safety testing. This matters because these models enable stronger physical control yet risk direct harm to people and surroundings if left unprotected.

Core claim

The central claim is that World Action Models, while powerful for physical prediction, contain exploitable safety gaps. JailWAM exposes them through three components: Visual-Trajectory Mapping to unify action spaces, a Risk Discriminator for high-recall screening of destructive behaviors, and a Dual-Path Verification Strategy that first uses single-image generation for coarse filtering and then full closed-loop simulation for confirmation. Together these yield an 84.2 percent attack success rate on the state-of-the-art LingBot-VA model, alongside the JailWAM-Bench for systematic evaluation.

What carries the argument

The JailWAM framework: Visual-Trajectory Mapping converts heterogeneous actions into comparable visual paths, a Risk Discriminator screens those paths for harmful patterns, and Dual-Path Verification runs a rapid coarse check followed by a thorough physical-simulation check.
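The paper publishes no reference code here; as a minimal sketch of how a dual-path check of this shape could be wired together, assuming a cheap single-frame risk score and an expensive closed-loop simulation (both stubbed below, not the authors' implementations):

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    prompt: str
    flagged: bool      # coarse single-frame screen fired
    confirmed: bool    # full simulation confirmed unsafe motion (only run when flagged)


def dual_path_verify(
    prompts: List[str],
    coarse_risk: Callable[[str], float],   # cheap, high-recall screen on one generated frame
    simulate: Callable[[str], bool],       # expensive closed-loop rollout; True = unsafe motion
    threshold: float = 0.5,
) -> List[Verdict]:
    """Screen every prompt cheaply; escalate only flagged ones to full simulation."""
    verdicts = []
    for p in prompts:
        flagged = coarse_risk(p) >= threshold
        confirmed = simulate(p) if flagged else False
        verdicts.append(Verdict(p, flagged, confirmed))
    return verdicts


# Stub demo: a keyword "discriminator" and a simulator that always confirms.
demo = dual_path_verify(
    ["pick up the cup", "smash the vase"],
    coarse_risk=lambda p: 0.9 if "smash" in p else 0.1,
    simulate=lambda p: True,
)
```

The structural point is that the expensive simulator is never invoked for prompts the coarse screen clears, which is why the screen's recall, not its precision, is the safety-critical number.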

If this is right

  • Current World Action Models lack sufficient built-in safeguards against prompts that induce dangerous physical actions.
  • The JailWAM-Bench supplies a repeatable way to measure and compare safety alignment across different model architectures.
  • Defense strategies can be developed by analyzing the failure modes identified through visual-trajectory and simulation testing.
  • Efficient screening tools like the Risk Discriminator make large-scale safety audits of robot predictors practical.
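The practicality claim in the last bullet is ultimately arithmetic. With illustrative numbers that are not from the paper (say a coarse screen costs 1 s per prompt, a full closed-loop rollout 60 s, and the screen flags 20 percent of prompts for confirmation), the expected per-prompt cost of dual-path screening is:

```python
def expected_cost_per_prompt(c_screen: float, c_sim: float, flag_rate: float) -> float:
    """Expected cost when every prompt is screened but only flagged ones are simulated."""
    return c_screen + flag_rate * c_sim


# Illustrative timings only; the material above reports no per-prompt costs.
cost = expected_cost_per_prompt(c_screen=1.0, c_sim=60.0, flag_rate=0.2)  # 13.0 s
speedup = 60.0 / cost  # vs. simulating every prompt
```

The trade-off the paper names is visible here: lowering the flag rate cuts cost further, but any unsafe prompt the screen misses is never simulated at all, which is why the Risk Discriminator is tuned for high recall.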

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mapping and verification steps could be applied to test safety in other embodied prediction systems such as autonomous navigation models.
  • Running the framework on additional simulation environments would test whether the reported success rate depends on RoboTwin-specific features.
  • Embedding the risk discriminator inside model training loops might reduce vulnerabilities before deployment rather than only detecting them afterward.

Load-bearing premise

The argument rests on two premises: that the Three-Level Safety Classification and the RoboTwin simulation accurately capture real physical risks, and that the visual-trajectory conversion preserves the essential properties of the original action spaces.
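The second premise can be made concrete. A minimal sketch of one way an action-to-visual-path conversion might work, assuming a pinhole camera model with hypothetical intrinsics (the paper does not specify its mapping):

```python
from typing import List, Tuple


def project_waypoints(
    waypoints: List[Tuple[float, float, float]],
    fx: float = 500.0, fy: float = 500.0,   # hypothetical focal lengths (pixels)
    cx: float = 320.0, cy: float = 240.0,   # hypothetical principal point
) -> List[Tuple[float, float]]:
    """Project 3-D end-effector waypoints (camera frame, z forward) onto the image plane."""
    pixels = []
    for x, y, z in waypoints:
        if z <= 0:
            raise ValueError("waypoint behind the camera cannot be drawn")
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# A straight thrust toward the camera collapses to a single pixel: perspective
# projection discards depth, exactly the kind of property the premise requires
# the real mapping to preserve or compensate for.
trace = project_waypoints([(0.0, 0.0, 1.0), (0.0, 0.0, 0.5)])
```

If an unsafe motion is invisible in the projected trace, the Risk Discriminator never sees it, so the choice of viewpoint and projection is itself load-bearing.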

What would settle it

A decisive test is transfer to hardware: if the same jailbreak prompts produce no unsafe arm motions when moved from the RoboTwin simulation to a physical robot, the exposed vulnerabilities are simulation artifacts; if they do, the framework's physical-safety claim stands.

Figures

Figures reproduced from arXiv: 2604.05498 by Chao Li, Hanqing Liu, Jiacheng Hou, Jiahuan Long, Jialiang Sun, Songping Wang, Tingsong Jiang, Wei Peng, Wen Yao, Xu Liu, Yang Yang, Yao Mu.

Figure 1. Visualizing the motion safety levels of World Action Models under normal and jailbreak instructions. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2. Comparison of jailbreak consequences across dif… [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]
Figure 3. Overview of the proposed JailWAM framework. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png]
Figure 4. The finetuned pipeline of the Risk Discriminator (RD) and its performance evaluation. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png]
Figure 3. To address the prohibitive computational cost of verifying… [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 5. Qualitative results demonstrating jailbreak attack robustness. The same jailbreak instruction consistently elicits… [PITH_FULL_IMAGE:figures/full_fig_p006_5.png]
Figure 6. Cross-seed reliability of LLM-Generated prompts… [PITH_FULL_IMAGE:figures/full_fig_p007_6.png]
Figure 7. Evaluation of the inference-time defense. We com… [PITH_FULL_IMAGE:figures/full_fig_p008_7.png]
read the original abstract

The World Action Model (WAM) can jointly predict future world states and actions, exhibiting stronger physical manipulation capabilities compared with traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it will directly threaten personal safety, property security and environmental safety. However, existing research pays extremely limited attention to the critical security gap: the vulnerability of WAM to jailbreak attacks. To fill this gap, we define the Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAM, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables cross-architectural unified evaluation; (2) Risk Discriminator, which serves as a high-recall screening tool that optimizes the efficiency-accuracy trade-off when identifying destructive behaviors in visual trajectories; (3) Dual-Path Verification Strategy, which first conducts rapid coarse screening via a single-image-based video-action generation module, and then performs efficient and comprehensive verification through full closed-loop physical simulation. In addition, we construct JailWAM-Bench, a benchmark for comprehensively evaluating the safety alignment performance of WAM under jailbreak attacks. Experiments in RoboTwin simulation environment demonstrate that the proposed framework efficiently exposes physical vulnerabilities, achieving an 84.2% attack success rate on the state-of-the-art LingBot-VA. Meanwhile, robust defense mechanisms can be constructed based on JailWAM, providing an effective technical solution for designing safe and reliable robot control systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces JailWAM as the first dedicated jailbreak attack and evaluation framework for World Action Models (WAMs) in robot control. It defines a Three-Level Safety Classification Framework to quantify robotic arm motion safety, proposes three core components—Visual-Trajectory Mapping to unify heterogeneous action spaces into visual representations, a Risk Discriminator for high-recall screening of destructive behaviors, and a Dual-Path Verification Strategy combining single-image coarse screening with full closed-loop physical simulation—and constructs the JailWAM-Bench benchmark. Experiments in the RoboTwin simulator report an 84.2% attack success rate on the state-of-the-art LingBot-VA model, with the framework also positioned as a basis for constructing defense mechanisms.

Significance. If the simulation results generalize, the work would be significant for identifying a previously understudied security gap in physically capable WAMs and for supplying a unified evaluation methodology and benchmark that could guide safer robot control system design. The empirical focus on cross-architectural attack transfer via visual trajectories offers a practical contribution to robotics security literature.

major comments (3)
  1. [Experiments section (RoboTwin evaluation)] Experiments section (RoboTwin evaluation): The headline 84.2% attack success rate on LingBot-VA is obtained exclusively inside the RoboTwin simulator via Visual-Trajectory Mapping and Dual-Path Verification. The central claim that JailWAM 'efficiently exposes physical vulnerabilities' therefore depends on the untested assumption that simulation trajectories correspond to equivalent real-world unsafe physical interactions. No real-robot deployment, sim-to-real transfer experiments, or analysis of dynamics mismatch, sensor noise, and contact modeling gaps is described, which is load-bearing for the physical safety conclusions.
  2. [Three-Level Safety Classification Framework (Section 3)] Three-Level Safety Classification Framework (Section 3): The framework is presented as a systematic quantifier of safety for robotic motions, yet the manuscript supplies no details on its validation against real-world harm (e.g., calibration to physical injury metrics, inter-annotator agreement, or comparison with established robotics safety standards). This directly affects the interpretability and reliability of the reported attack success rate.
  3. [Evaluation on JailWAM-Bench] Evaluation on JailWAM-Bench: The results lack explicit baseline comparisons to prior jailbreak techniques, statistical significance testing for the 84.2% ASR, data exclusion criteria, or ablation studies isolating the contribution of each component (Visual-Trajectory Mapping, Risk Discriminator, Dual-Path Verification). These omissions limit assessment of whether the framework advances the state of the art.
minor comments (2)
  1. [Abstract and Conclusion] The abstract states that 'robust defense mechanisms can be constructed based on JailWAM' but the main text provides only high-level mention without concrete defense implementations, evaluations, or quantitative results; this should be expanded or the claim tempered.
  2. [Figures] Figure captions and diagrams illustrating the Dual-Path Verification Strategy and Visual-Trajectory Mapping would benefit from additional labels and step-by-step annotations to improve clarity for readers unfamiliar with the WAM architectures.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of simulation fidelity, framework validation, and empirical rigor that we will address in the revision. We provide point-by-point responses below.

read point-by-point responses
  1. Referee: Experiments section (RoboTwin evaluation): The headline 84.2% attack success rate on LingBot-VA is obtained exclusively inside the RoboTwin simulator via Visual-Trajectory Mapping and Dual-Path Verification. The central claim that JailWAM 'efficiently exposes physical vulnerabilities' therefore depends on the untested assumption that simulation trajectories correspond to equivalent real-world unsafe physical interactions. No real-robot deployment, sim-to-real transfer experiments, or analysis of dynamics mismatch, sensor noise, and contact modeling gaps is described, which is load-bearing for the physical safety conclusions.

    Authors: We agree that all reported results are obtained in the RoboTwin simulator and that no real-robot or sim-to-real experiments are included. The simulator provides a controlled environment for closed-loop physical simulation, but we acknowledge that dynamics mismatch, sensor noise, and contact modeling differences remain untested. We will revise the abstract, introduction, and conclusion to qualify claims about 'physical vulnerabilities' as referring to simulated environments. A new limitations subsection will explicitly discuss these gaps and frame real-world transfer as important future work. This revision will ensure the safety conclusions are appropriately scoped to the simulation setting. revision: partial

  2. Referee: Three-Level Safety Classification Framework (Section 3): The framework is presented as a systematic quantifier of safety for robotic motions, yet the manuscript supplies no details on its validation against real-world harm (e.g., calibration to physical injury metrics, inter-annotator agreement, or comparison with established robotics safety standards). This directly affects the interpretability and reliability of the reported attack success rate.

    Authors: The Three-Level Safety Classification Framework categorizes motions according to observable kinematic and interaction properties (velocity thresholds, proximity to humans/objects, and potential for collision or damage). We did not provide calibration to physical injury metrics or inter-annotator studies in the submitted version. We will expand Section 3 with a more detailed rationale, explicit mapping to ISO 10218 and related robotics safety guidelines, and a clear statement that the levels serve as an initial proxy for evaluation rather than a fully validated harm metric. We will also note the requirement for future empirical calibration as a limitation. revision: partial

  3. Referee: Evaluation on JailWAM-Bench: The results lack explicit baseline comparisons to prior jailbreak techniques, statistical significance testing for the 84.2% ASR, data exclusion criteria, or ablation studies isolating the contribution of each component (Visual-Trajectory Mapping, Risk Discriminator, Dual-Path Verification). These omissions limit assessment of whether the framework advances the state of the art.

    Authors: We accept that the current evaluation section would benefit from these additions. In the revised manuscript we will: (i) adapt and compare against representative prior jailbreak approaches from the LLM and vision-language literature where applicable to the WAM setting; (ii) report statistical significance and confidence intervals for the 84.2% ASR; (iii) document the data exclusion criteria used when constructing JailWAM-Bench; and (iv) present ablation results that isolate the contribution of Visual-Trajectory Mapping, the Risk Discriminator, and the Dual-Path Verification Strategy. These changes will allow readers to better gauge the framework's incremental contribution. revision: yes
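The promised confidence interval is easy to pre-register. A sketch using the Wilson score interval, with a hypothetical sample size of 500 attack prompts (the actual n behind the 84.2% figure is not stated in the material above):

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95% coverage)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


# 421/500 = 84.2% under the hypothetical n; the 95% interval spans
# roughly 80.7% to 87.1%, so even a generous n leaves several points of slack.
lo, hi = wilson_interval(421, 500)
```

The interval width, not the point estimate, is what baseline comparisons against prior jailbreak techniques would have to clear.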

standing simulated objections not resolved
  • Real-robot deployment, sim-to-real transfer experiments, and analysis of dynamics mismatch/sensor noise/contact modeling gaps (first major comment).
  • Empirical validation of the Three-Level Safety Classification Framework against real-world physical injury metrics, inter-annotator agreement, or direct comparison with established safety standards (second major comment).

Circularity Check

0 steps flagged

No circularity: empirical measurement of attack success rate in simulation

full rationale

The paper is an empirical contribution that defines a Three-Level Safety Classification Framework, proposes JailWAM components (Visual-Trajectory Mapping, Risk Discriminator, Dual-Path Verification), constructs JailWAM-Bench, and reports a directly measured 84.2% attack success rate inside the RoboTwin simulator on LingBot-VA. No equations, derivations, or first-principles results are presented that reduce to their own inputs by construction. The success rate is obtained from simulation runs rather than from any fitted parameter renamed as a prediction, self-referential definition, or load-bearing self-citation chain. The central claim therefore remains an independent experimental observation within the stated simulation environment.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical free parameters, standard axioms, or invented physical entities are described. Contributions consist of new methodological components and an empirical benchmark rather than derivations or postulated entities.

pith-pipeline@v0.9.0 · 5628 in / 1235 out tokens · 78490 ms · 2026-05-10T19:18:02.379912+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 19 canonical work pages · 9 internal anchors

  1. [1] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. 2025. Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)
  2. [2] Hongzhe Bi, Hengkai Tan, Shenghao Xie, Zeyuan Wang, Shuhe Huang, Haitian Liu, Ruowen Zhao, Yao Feng, Chendong Xiang, Yinze Rong, et al. 2025. Motus: A unified latent action world model. arXiv preprint arXiv:2512.13030 (2025)
  3. [3] Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Zixuan Li, Qiwei Liang, Xianliang Lin, Yiheng Ge, Zhenyu Gu, et al. 2025. Robotwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation. arXiv preprint arXiv:2506.18088 (2025)
  4. [4] Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, and Yang Liu. 2023. Masterkey: Automated jailbreak across multiple large language model chatbots. arXiv preprint arXiv:2307.08715 (2023)
  5. [5] Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. 2025. Figstep: Jailbreaking large vision-language models via typographic visual prompts. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 23951–23959
  6. [6] Yucheng Hu, Yanjiang Guo, Pengchao Wang, Xiaoyu Chen, Yen-Jen Wang, Jianke Zhang, Koushil Sreenath, Chaochao Lu, and Jianyu Chen. 2024. Video prediction policy: A generalist robot policy with predictive visual representations. arXiv preprint arXiv:2412.14803 (2024)
  7. [7] Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, and Min Lin. 2024. Improved techniques for optimization-based jailbreaking on large language models. arXiv preprint arXiv:2405.21018 (2024)
  8. [8] Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, et al. 2026. Cosmos policy: Fine-tuning video models for visuomotor control and planning. arXiv preprint arXiv:2601.16163 (2026)
  9. [9] Wonjun Lee, Haon Park, Doehyeon Lee, Bumsub Ham, and Suhyun Kim. 2025. Jailbreaking on Text-to-Video Models via Scene Splitting Strategy. arXiv preprint arXiv:2509.22292 (2025)
  10. [10] Lin Li, Qihang Zhang, Yiming Luo, Shuai Yang, Ruilin Wang, Fei Han, Mingrui Yu, Zelin Gao, Nan Xue, Xing Zhu, et al. 2026. Causal World Modeling for Robot Control. arXiv preprint arXiv:2601.21998 (2026)
  11. [11] Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, et al. 2024. Evaluating real-world robot manipulation policies in simulation. arXiv preprint arXiv:2405.05941 (2024)
  12. [12] Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, et al. 2025. Genie envisioner: A unified world foundation platform for robotic manipulation. arXiv preprint arXiv:2508.05635 (2025)
  13. [13] Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. 2023. Libero: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems 36 (2023), 44776–44791
  14. [14] Hanqing Liu, Lifeng Zhou, and Huanqian Yan. 2024. Boosting jailbreak transferability for large language models. arXiv preprint arXiv:2410.15645 (2024)
  15. [15] Jiayang Liu, Siyuan Liang, Shiqian Zhao, Rongcheng Tu, Wenbo Zhou, Xiaochun Cao, Dacheng Tao, and Siew Kei Lam. 2025. Jailbreaking the text-to-video generative models. arXiv e-prints (2025), arXiv–2505
  16. [16] Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, and Chaowei Xiao. 2024. Jailbreakv: A benchmark for assessing the robustness of multimodal large language models against jailbreak attacks. arXiv preprint arXiv:2404.03027 (2024)
  17. [17] Yibo Miao, Yifan Zhu, Lijia Yu, Jun Zhu, Xiao-Shan Gao, and Yinpeng Dong. 2024. T2vsafetybench: Evaluating the safety of text-to-video generative models. Advances in Neural Information Processing Systems 37 (2024), 63858–63872
  18. [18] Advances in Neural Information Processing Systems 37 (2024), 63858–63872
  19. [19] Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, et al. 2024. Pivot: Iterative visual prompting elicits actionable knowledge for vlms. arXiv preprint arXiv:2402.07872 (2024)
  20. [20] Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. 2025. 𝜋0.5: A Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054 (2025)
  21. [21] Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, and Prateek Mittal. 2024. Visual adversarial examples jailbreak aligned large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 21527–21536
  22. [22] Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, and Lingpeng Kong. 2025. Imgtrojan: Jailbreaking vision-language models with one image. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 7048–7063
  23. [23] Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, and Caifeng Shan. 2025. RunawayEvil: Jailbreaking the Image-to-Video Generative Models. arXiv preprint arXiv:2512.06674 (2025)
  24. [24] Taowen Wang, Cheng Han, James Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, and Ruixiang Tang. 2025. Exploring the adversarial vulnerabilities of vision-language-action models in robotics. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6948–6958
  25. [25] Xiaofei Wang, Mingliang Han, Tianyu Hao, Cegang Li, Yunbo Zhao, and Keke Tang. 2025. Advgrasp: Adversarial attacks on robotic grasping from a physical perspective. arXiv preprint arXiv:2507.09857 (2025)
  26. [26] Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, and Dacheng Tao. 2025. Jailbreak vision language models via bi-modal adversarial prompt. IEEE Transactions on Information Forensics and Security (2025)
  27. [27] Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, et al. 2025. Badrobot: Jailbreaking embodied LLM agents in the physical world. In The Thirteenth International Conference on Learning Representations
  28. [28] Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, and Dit-Yan Yeung. 2025. Anyattack: Towards large-scale self-supervised adversarial attacks on vision-language models. In Proceedings of the Computer Vision and Pattern Recognition Conference. 19900–19909
  29. [29] Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, and Jianwei Yang. 2024. Tracevla: Visual trace prompting enhances spatial-temporal awareness for generalist robotic policies. arXiv preprint arXiv:2412.10345 (2024)
  30. [30] Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043 (2023)