Trajectory-Level Redirection Attacks on Vision-Language-Action Models

Dilek Hakkani-Tür; Gokul Puthumanaillam; Hooshang Nayyeri; Melkior Ornik; Pranay Thangeda; Vardhan Dongre

arxiv: 2606.12978 · v1 · pith:HQSNFHOZnew · submitted 2026-06-11 · 💻 cs.RO · cs.CV · cs.SY · eess.SY

Gokul Puthumanaillam, Vardhan Dongre, Pranay Thangeda, Hooshang Nayyeri, Dilek Hakkani-Tür, Melkior Ornik

Pith reviewed 2026-06-27 06:31 UTC · model grok-4.3

classification 💻 cs.RO · cs.CV · cs.SY · eess.SY

keywords: vision-language-action models, trajectory redirection, prompt attacks, robot manipulation, adversarial prompts, closed-loop control, command preservation

The pith

Near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets under a prompt-only threat model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that vision-language-action policies, which control robots from repeated text prompts, admit a trajectory-level attack in which one altered prompt steers the entire closed-loop episode to a different physical outcome. The altered prompt must remain close to the benign instruction, omit any mention of the target task, and avoid correction language, yet still produce the attacker-chosen result across changing observations. The authors formalize this as command-preserving trajectory redirection and supply an on-policy search procedure that evaluates candidate prompts by executing full rollouts to locate perturbations meeting the constraints. Experiments confirm the attacks succeed in both simulation and on physical hardware, showing that instruction grounding in these models can be subverted at the level of final physical state without obvious textual signals.

Core claim

A single attacker-chosen prompt, fixed before the episode, can make a VLA policy's closed-loop behavior track an attacker-specified target task while satisfying the constraints of command-preserving trajectory redirection: the prompt stays near the benign instruction, omits target words and correction language, and all other policy and environment components remain unchanged. The prompt is found by an on-policy search that uses rollouts to identify perturbations whose executed trajectories match the target.
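The three textual constraints in this claim (closeness to the benign instruction, no target words, no correction language) can be sketched as a simple predicate. This is a minimal illustration, not the paper's definition: whitespace tokenization, token-level Levenshtein distance as the closeness measure, the word lists, and the names `token_edit_distance` and `is_command_preserving` are all assumptions made here.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (assumed closeness measure)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_command_preserving(benign, candidate, target_words, correction_words, max_edits):
    """Return True if the candidate prompt meets all three textual constraints."""
    benign_toks = benign.lower().split()
    cand_toks = candidate.lower().split()
    # Constraint 1: stay within the edit budget of the benign instruction.
    if token_edit_distance(benign_toks, cand_toks) > max_edits:
        return False
    # Constraint 2: never name the attacker's target task.
    if any(tok in target_words for tok in cand_toks):
        return False
    # Constraint 3: no correction language such as "instead" or "actually".
    if any(tok in correction_words for tok in cand_toks):
        return False
    return True
```

With a hypothetical benign prompt "put the bowl on the stove", target words `{"plate"}`, and a budget of two edits, the candidate "put the bowl on the warm stove" passes, while "put the bowl on the plate" is rejected for naming the target.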

What carries the argument

Command-preserving trajectory redirection, a prompt-only threat model in which one fixed prompt redirects the full trajectory to a new target while staying close to the original instruction and omitting target words or corrections, located via on-policy prompt search that evaluates candidates through actual rollouts.

If this is right

Prompt perturbations achieve redirection to attacker-specified targets while obeying the closeness and omission constraints.
The on-policy rollout search locates perturbations whose closed-loop trajectories track the chosen target task.
The redirection occurs in both simulated and physical hardware settings.
Text that appears to preserve the intended command can still give control over the robot's final physical outcome.
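The rollout-based search described above can be illustrated with a toy loop. This is a simplified random-substitution search, not the paper's teacher-matching procedure; `search_prompt`, `toy_rollout`, `toy_score`, the vocabulary, and the one-dimensional "environment" are hypothetical stand-ins chosen only so the sketch runs end to end.

```python
import random


def search_prompt(benign, vocab, rollout, score, is_allowed,
                  max_edits=2, iters=200, seed=0):
    """Random token-substitution search; every kept candidate costs one full rollout."""
    rng = random.Random(seed)
    benign_toks = benign.split()
    best, best_score = benign, score(rollout(benign))
    queries = 1
    for _ in range(iters):
        toks = list(benign_toks)
        # Substitute at most max_edits positions, so the edit budget always holds.
        k = rng.randint(1, min(max_edits, len(toks)))
        for pos in rng.sample(range(len(toks)), k):
            toks[pos] = rng.choice(vocab)
        cand = " ".join(toks)
        if not is_allowed(cand):   # e.g. reject target or correction words
            continue
        s = score(rollout(cand))   # on-policy: judge by the executed trajectory
        queries += 1
        if s > best_score:
            best, best_score = cand, s
    return best, best_score, queries


def toy_rollout(prompt, steps=10):
    """Hypothetical episode: drifts to +1 (stove) unless the token 'flat' appears."""
    direction = -1.0 if "flat" in prompt.split() else 1.0
    pos, traj = 0.0, []
    for _ in range(steps):
        pos += 0.1 * direction
        traj.append(pos)
    return traj


def toy_score(traj):
    """Closeness of the final state to the attacker's target at -1 (plate)."""
    return -abs(traj[-1] - (-1.0))


if __name__ == "__main__":
    best, s, q = search_prompt(
        "put the bowl on the stove",
        vocab=["flat", "warm", "small", "plate"],
        rollout=toy_rollout,
        score=toy_score,
        is_allowed=lambda p: "plate" not in p.split(),
    )
    print(best, s, q)
```

The design point is that candidates are scored on the final state of a whole rollout rather than on a single-step action, which is what makes the search trajectory-level; the query counter mirrors the search-cost quantity discussed around Figure 5.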

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety checks for VLA systems may need to evaluate candidate prompts by simulating their potential full trajectories rather than inspecting text alone.
The same redirection risk could appear in other closed-loop language-controlled processes where instructions are reused across steps.
Applying the search across wider task distributions would reveal whether the redirection capability scales or breaks on more complex behaviors.
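The first extension, screening prompts by their simulated outcomes instead of their text, might look like the sketch below. It assumes a simulator callable `simulate(prompt, seed)` that returns a final state and a predicate `intended_goal_reached(state)`; both are hypothetical interfaces, and the thresholds are arbitrary defaults.

```python
def screen_prompt(prompt, simulate, intended_goal_reached,
                  n_rollouts=5, min_success_rate=0.8):
    """Accept a prompt only if simulated rollouts still reach the intended goal."""
    if n_rollouts <= 0:
        raise ValueError("n_rollouts must be positive")
    successes = 0
    for seed in range(n_rollouts):
        final_state = simulate(prompt, seed)   # full closed-loop rollout
        if intended_goal_reached(final_state):
            successes += 1
    rate = successes / n_rollouts
    return rate >= min_success_rate, rate
```

A text-only filter would pass any prompt that lacks suspicious words; this check instead rejects a prompt whose rollouts end somewhere other than the benign goal, at the cost of `n_rollouts` simulated episodes per prompt.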

Load-bearing premise

The on-policy prompt search is assumed to reliably discover perturbations that satisfy the command-preserving constraints and achieve the target trajectory, and the search itself is assumed not to introduce artifacts that would be absent in a real deployment.

What would settle it

Execute the prompts discovered by the search method inside an actual deployed VLA system that never ran the search procedure, and check whether the redirection to the attacker target still occurs or whether no qualifying near-benign prompts exist for the tested tasks.

Figures

Figures reproduced from arXiv: 2606.12978 by Dilek Hakkani-Tür, Gokul Puthumanaillam, Hooshang Nayyeri, Melkior Ornik, Pranay Thangeda, Vardhan Dongre.

**Figure 1.** A command-preserving prompt perturbation redirects the closed-loop VLA trajectory toward an adversary-specified physical goal. Although the adversarial prompt differs from the benign instruction by only a small text change and contains no explicit target command, the robot places the bowl on the plate rather than the stove. However, eliciting a targeted action at one inference step, or making that action p…

**Figure 2.** High-level overview of the on-policy teacher-matching prompt search.

**Figure 3.** Causal trace of a command-preserving perturbation.

**Figure 4.** Closed-loop target affinity on attack-induced states.

**Figure 5.** Perturbation budget versus search cost. We vary the maximum number of prompt tokens the search may modify. KF#4: Small perturbation budgets already suffice, while larger budgets reduce search cost. Averaging the reported edit column over the VLA rows in …

**Figure 6.** Hardware validation. (a) Qualitative results from SmolVLA. (b) Performance of SO100-finetuned VLAs. … queries to find a successful perturbation; with a larger budget, the same attack objective becomes easier to satisfy and the number of queries drops. KF#5: The attack survives real-robot deployment. The hardware results show that command-preserving redirection is not only a simulator artifact. Under the benig…

**Figure 7.** LIBERO simulation. Unless otherwise stated, LIBERO actions are represented in a 7-D single-arm action space consisting of translation, rotation, and gripper components. For models whose native action representation is discrete action tokens, generated text, or a higher-dimensional padded action vector, actions are decoded using the corresponding model wrapper before environment execution. Teacher-matching…

**Figure 8.** Hardware setup. The real-robot experiments use a Standard Open Arm 100 (SO-100) tabletop manipulation setup. The SO-100 follower arm has five actuated arm degrees of freedom and one actuated gripper degree of freedom. We therefore refer to the hardware control interface as a six-channel joint-and-gripper interface: five arm joints plus one gripper command. The gripper is treated as part of the robot actio…

**Figure 9.** Additional simulation results.

**Figure 10.** Additional hardware results. All the benign prompts are evaluated on π0.5.

Original abstract

Vision-language-action (VLA) policies bring natural language into closed-loop robot control, enabling robots to execute manipulation tasks directly from text instructions. The same interface gives text a recurring role in control because the prompt is reused at every replanning step, and each prompt-conditioned action changes the future observations on which the policy acts. Existing VLA attacks study adversarial prompts that elicit targeted low-level actions or make such actions persist across changing images. We identify a stronger trajectory-level failure mode: a prompt that still $\textit{appears}$ to specify the intended task but redirects the final physical outcome. We mathematically formalize this setting as $\textit{command-preserving trajectory redirection}$, a prompt-only threat model in which the attacker chooses one prompt before the episode, all policy and environment components remain fixed, and the prompt must stay close to the benign instruction while omitting target words and correction language. To find such prompts, we introduce an on-policy prompt search method that uses rollouts to discover perturbations whose closed-loop behavior tracks a target task while satisfying the command-preserving constraints. Experiments in simulation and on hardware show that near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets. These results expose a trajectory-level vulnerability in VLA instruction grounding: text that appears to preserve the intended command can still give an adversary control over the robot's final physical outcome. Project website: https://vla-redirection-attack.github.io/
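One way to write the threat model down is sketched below. The notation is assumed for illustration and is not taken from the paper: $p$ is the benign instruction, $\tilde p$ the single adversarial prompt, $\pi$ the fixed VLA policy, $f$ and $h$ the fixed environment dynamics and observation map, $d$ a prompt distance with budget $\epsilon$, $\mathrm{tok}(\cdot)$ the token set of a prompt, $\mathcal{W}_{\mathrm{tgt}}$ and $\mathcal{W}_{\mathrm{corr}}$ the forbidden target and correction vocabularies, and $\mathcal{G}_{\mathrm{tgt}}$ the set of final states that count as the attacker's target outcome.

```latex
\begin{aligned}
\max_{\tilde p}\quad & \Pr\big[\, s_T \in \mathcal{G}_{\mathrm{tgt}} \,\big] \\
\text{s.t.}\quad
& a_t \sim \pi(\cdot \mid o_t, \tilde p), \qquad
  s_{t+1} = f(s_t, a_t), \qquad
  o_t = h(s_t), \qquad t = 0, \dots, T-1, \\
& d(\tilde p, p) \le \epsilon, \\
& \mathrm{tok}(\tilde p) \cap \mathcal{W}_{\mathrm{tgt}} = \emptyset, \\
& \mathrm{tok}(\tilde p) \cap \mathcal{W}_{\mathrm{corr}} = \emptyset .
\end{aligned}
```

The same $\tilde p$ appears at every step $t$, which captures the abstract's point that the prompt is reused at each replanning step while only the observations change.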

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note: private letter to a colleague

The paper formalizes command-preserving trajectory redirection on VLAs and offers an on-policy search to find such prompts, but the search procedure may not align with the fixed one-prompt threat model.

The core contribution is a formalization of command-preserving trajectory redirection: a single prompt chosen upfront that still looks like the original task instruction but steers the closed-loop VLA rollout to an attacker-chosen physical outcome. The authors also describe an on-policy search that uses rollouts to locate prompts meeting the closeness and omission constraints.

That framing is new relative to earlier VLA attack papers that focused on eliciting specific low-level actions or making them persist. Treating the prompt as part of a recurring control loop and requiring it to preserve the command surface is a reasonable way to tighten the threat model.

The abstract claims simulation and hardware results back this up. If the full paper supplies clear success rates, trial counts, and measurement definitions, that would be useful evidence for people studying VLA robustness.

The soft spot is the search method itself. The procedure runs repeated full rollouts to optimize the prompt while enforcing the constraints inside the loop. This gives the attacker repeated interaction with the robot during discovery. The stated threat model, however, is a fixed prompt selected before the episode with no further access. Prompts that only appear under that search regime may not satisfy the model when the attacker must commit without ongoing rollouts. This gap between the search regime and the threat model is where a stress test should focus.

This is relevant to researchers working on language-conditioned robot policies and on security of learned controllers. It is worth sending to peer review so the experiments and the search-to-threat-model gap can be examined directly.

Referee Report

1 major / 1 minor

Summary. The paper claims that near-benign prompt perturbations, found via an on-policy search over rollouts, can redirect closed-loop VLA trajectories to attacker-specified targets while satisfying a command-preserving threat model (the prompt is chosen once before the episode, remains close to the benign instruction, and omits target words and correction language, while all other components stay fixed). Simulation and hardware experiments are said to support the existence of such prompts.

Significance. If the central claim holds under the stated threat model, the result would identify a trajectory-level vulnerability in VLA instruction grounding that goes beyond existing low-level action attacks, with potential implications for the security of language-conditioned robot policies.

major comments (1)

[Abstract (threat model and method description)] The threat model (stated in the abstract) requires the attacker to select one fixed prompt before the episode begins, with all policy and environment components remaining fixed thereafter. However, the on-policy prompt search method relies on repeated full rollouts to discover and optimize the perturbation while enforcing the command-preserving constraints inside the same loop. This procedure implicitly grants the attacker repeated execution access during discovery, which appears inconsistent with the one-shot pre-episode choice and raises the possibility that reported successes exploit transient state sequences or observation statistics induced by the search itself rather than generalizing to a single fixed prompt chosen without further interaction.

minor comments (1)

[Abstract] The abstract states that simulation and hardware experiments support the claim but provides no quantitative results, success rates, number of trials, or details on how redirection success and command preservation were measured.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying a potential ambiguity between the threat model and the prompt discovery procedure. We address the comment below and will revise the manuscript accordingly.

Referee: [Abstract (threat model and method description)] The threat model (stated in the abstract) requires the attacker to select one fixed prompt before the episode begins, with all policy and environment components remaining fixed thereafter. However, the on-policy prompt search method relies on repeated full rollouts to discover and optimize the perturbation while enforcing the command-preserving constraints inside the same loop. This procedure implicitly grants the attacker repeated execution access during discovery, which appears inconsistent with the one-shot pre-episode choice and raises the possibility that reported successes exploit transient state sequences or observation statistics induced by the search itself rather than generalizing to a single fixed prompt chosen without further interaction.

Authors: We agree that the distinction between prompt discovery and deployment requires clearer separation to avoid ambiguity. The on-policy search is an offline optimization performed by the attacker (typically in simulation or with white-box access to the policy) to identify a prompt satisfying the command-preserving constraints; once identified, that single prompt is fixed and used for the entire episode with no further interaction or adaptation. The reported results evaluate the fixed prompt on independent rollouts (including hardware transfers), not on the search trajectories themselves. We will revise the abstract, threat model section, and method description to explicitly delineate the discovery phase (offline, attacker-controlled) from the attack phase (one-shot, fixed prompt, no repeated access), and we will add explicit statements that success is measured on held-out episodes after search termination.

Revision: partial
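The separation the rebuttal promises, with success measured only on episodes that the search never touched, can be made mechanical. The sketch below is an assumption about how such an evaluation could be wired, not the authors' protocol; `evaluate_fixed_prompt` and the `run_episode(prompt, seed)` interface are hypothetical.

```python
def evaluate_fixed_prompt(prompt, run_episode, search_seeds, eval_seeds):
    """Success rate of one frozen prompt on episodes disjoint from the search."""
    eval_seeds = list(eval_seeds)
    if not eval_seeds:
        raise ValueError("need at least one evaluation seed")
    # Refuse to score on any episode the discovery phase already used.
    overlap = set(search_seeds) & set(eval_seeds)
    if overlap:
        raise ValueError(f"evaluation seeds reused from search: {sorted(overlap)}")
    # The prompt is never modified here: one fixed string for every episode.
    outcomes = [bool(run_episode(prompt, seed)) for seed in eval_seeds]
    return sum(outcomes) / len(outcomes)
```

Raising on seed overlap, instead of silently filtering, keeps an accidental leak between the discovery and attack phases from passing unnoticed.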

Circularity Check

0 steps flagged

No circularity detected in empirical attack paper

The paper is an empirical demonstration of a prompt-only attack on VLA policies. It formalizes a threat model and introduces an on-policy search procedure whose success is shown via simulation and hardware experiments. No equations, fitted parameters, or derivations are presented that reduce a claimed result to its own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the provided text. The central claim rests on experimental outcomes rather than any self-referential reduction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of a closed-loop VLA policy that reuses the same prompt at every replanning step and on the ability of an on-policy search to locate suitable perturbations; no free parameters, ad-hoc axioms, or new physical entities are introduced.

axioms (1)

Domain assumption: VLA policies condition actions on the same text prompt at every replanning step.
Stated in the abstract as the mechanism that gives text a recurring role in control.

invented entities (1)

Command-preserving trajectory redirection threat model (no independent evidence)
Purpose: formalizes the attack setting in which a single prompt redirects the final physical outcome while remaining close to the benign instruction.
Newly defined in the paper; no independent evidence outside the attack demonstration itself.

[1] [2]

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint arXiv:2406.09246, 2024

Pith/arXiv arXiv 2024

[2] [3]

Zitkovich, T

B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In Conference on Robot Learning, pages 2165–2183. PMLR, 2023

2023

[3] [4]

Black, N

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al.pi 0: A vision-language-action flow model for general robot control.arXiv preprint arXiv:2410.24164, 2024

Pith/arXiv arXiv 2024

[4] [5]

E. K. Jones, A. Robey, A. Zou, Z. Ravichandran, G. J. Pappas, H. Hassani, M. Fredrikson, and J. Z. Kolter. Adversarial attacks on robotic vision language action models.arXiv preprint arXiv:2506.03350, 2025

arXiv 2025

[5] [6]

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson. Universal and transfer- able adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023

Pith/arXiv arXiv 2023

[6] [8]

S. Ross, G. Gordon, and D. Bagnell. A reduction of imitation learning and structured predic- tion to no-regret online learning. InProceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Pro- ceedings, 2011

2011

[7] [9]

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al. Rt-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022

[8] [10]

A. O’Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, et al. Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024

[9] [11]

D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, et al. Palm-e: An embodied multimodal language model. arXiv preprint arXiv:2303.03378, 2023

[10] [12]

J. Lee, J. Duan, H. Fang, Y. Deng, S. Liu, B. Li, B. Fang, J. Zhang, Y. R. Wang, S. Lee, et al. Molmoact: Action reasoning models that can reason in space. arXiv preprint arXiv:2508.07917, 2025

[11] [13]

K. Pertsch, K. Stachowicz, B. Ichter, D. Driess, S. Nair, Q. Vuong, O. Mees, C. Finn, and S. Levine. FAST: Efficient action tokenization for vision-language-action models. arXiv preprint arXiv:2501.09747, 2025

[12] [14]

Y. Wang, H. Zhu, M. Liu, J. Yang, H.-S. Fang, and T. He. VQ-VLA: Improving vision-language-action models via scaling vector-quantized action tokenizers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

[13] [15]

K. Bousmalis, G. Vezzani, D. Rao, C. Devin, A. X. Lee, M. Bauza, T. Davchev, Y. Zhou, A. Gupta, et al. RoboCat: A self-improving generalist agent for robotic manipulation. Transactions on Machine Learning Research (TMLR), 2023

[14] [16]

M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success. arXiv preprint arXiv:2502.19645, 2025

[15] [17]

Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213, 2024

[16] [18]

Physical Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, et al. π0.5: a vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025

[17] [19]

S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu. RDT-1B: a diffusion foundation model for bimanual manipulation. arXiv preprint arXiv:2410.07864, 2024

[18] [20]

Y. Li, Y. Deng, J. Zhang, J. Jang, M. Memmel, C. Garrett, F. Ramos, D. Fox, A. Li, A. Gupta, et al. Hamster: Hierarchical action models for open-world robot manipulation. In International Conference on Learning Representations, volume 2025, pages 24040–24068, 2025

[19] [21]

D. Qu, H. Song, Q. Chen, Y. Yao, X. Ye, Y. Ding, Z. Wang, J. Gu, B. Zhao, D. Wang, et al. Spatialvla: Exploring spatial representations for visual-language-action model. arXiv preprint arXiv:2501.15830, 2025

[20] [22]

Y. Ji, H. Tan, J. Shi, X. Hao, Y. Zhang, H. Zhang, P. Wang, M. Zhao, Y. Mu, P. An, et al. Robobrain: A unified brain model for robotic manipulation from abstract to concrete. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1724–1734, 2025

[21] [23]

M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, et al. Smolvla: A vision-language-action model for affordable and efficient robotics. arXiv preprint arXiv:2506.01844, 2025

[22] [24]

T. Jiang, X. Jiang, Y. Ma, X. Wen, B. Li, K. Zhan, P. Jia, Y. Liu, S. Sun, and X. Lang. The better you learn, the smarter you prune: Towards efficient vision-language-action models via differentiable token pruning. arXiv preprint arXiv:2509.12594, 2025

[23] [25]

J. Wen, Y. Zhu, J. Li, M. Zhu, Z. Tang, K. Wu, Z. Xu, N. Liu, R. Cheng, C. Shen, et al. Tinyvla: Towards fast, data-efficient vision-language-action models for robotic manipulation. IEEE Robotics and Automation Letters, 2025

[24] [26]

A. Goyal, H. Hadfield, X. Yang, V. Blukis, and F. Ramos. Vla-0: Building state-of-the-art vlas with zero modification. arXiv preprint arXiv:2510.13054, 2025

[25] [27]

D. Niu, Y. Sharma, G. Biamby, J. Quenum, Y. Bai, B. Shi, T. Darrell, and R. Herzig. Llarva: Vision-action instruction tuning enhances robot learning. arXiv preprint arXiv:2406.11815, 2024

[26] [28]

R. Jia and P. Liang. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031, Sept. 2017

[27] [29]

J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36, 2018

[28] [30]

J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE security and privacy workshops (SPW), pages 50–56. IEEE, 2018

[29] [31]

J. Li, S. Ji, T. Du, B. Li, and T. Wang. Textbugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271, 2018

[30] [32]

S. Ren, Y. Deng, K. He, and W. Che. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1085–1097, July 2019

[31] [33]

M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, 2018

[32] [34]

D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits. Is bert really robust? a strong baseline for natural language attack on text classification and entailment, 2020. URL https://arxiv.org/abs/1907.11932

[33] [35]

L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu. BERT-ATTACK: Adversarial attack against BERT using BERT. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6193–6202, Nov. 2020

[34] [36]

D. Li, Y. Zhang, H. Peng, L. Chen, C. Brockett, M.-T. Sun, and B. Dolan. Contextualized perturbation for textual adversarial attack. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5053–5069, June 2021

[35] [37]

S. Garg and G. Ramakrishnan. BAE: BERT-based adversarial examples for text classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6174–6181, Nov. 2020. URL https://aclanthology.org/2020.emnlp-main.498/

[36] [38]

N. Boucher, I. Shumailov, R. Anderson, and N. Papernot. Bad characters: Imperceptible nlp attacks, 2021. URL https://arxiv.org/abs/2106.09898

[37] [39]

E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2153–2162, Nov. 2019

[38] [40]

T. Shin, Y. Razeghi, R. L. L. IV, E. Wallace, and S. Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts, 2020. URL https://arxiv.org/abs/2010.15980

[39] [41]

L. Schwinn, D. Dobre, S. Xhonneux, G. Gidel, and S. Günnemann. Soft prompt threats: Attacking safety alignment and unlearning in open-source llms through the embedding space. Advances in Neural Information Processing Systems, 37:9086–9116, 2024

[40] [42]

X. Liu, N. Xu, M. Chen, and C. Xiao. Autodan: Generating stealthy jailbreak prompts on aligned large language models, 2024. URL https://arxiv.org/abs/2310.04451

[41] [43]

K. Zhu, J. Wang, J. Zhou, Z. Wang, H. Chen, Y. Wang, L. Yang, W. Ye, Y. Zhang, N. Z. Gong, and X. Xie. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts, 2024. URL https://arxiv.org/abs/2306.04528

[42] [45]

H. Lu, Y. Yu, Y. Yang, C. Yi, Q. Zhang, B. Shen, A. C. Kot, and X. Jiang. When robots obey the patch: Universal transferable patch attacks on vision-language-action models, 2026. URL https://arxiv.org/abs/2511.21192

[43] [46]

T. Wang, C. Han, J. C. Liang, W. Yang, D. Liu, L. X. Zhang, Q. Wang, J. Luo, and R. Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics, 2025. URL https://arxiv.org/abs/2411.13587

[44] [47]

X. Wang, J. Li, Z. Weng, Y. Wang, Y. Gao, T. Pang, C. Du, Y. Teng, Y. Wang, Z. Wu, X. Ma, and Y.-G. Jiang. Freezevla: Action-freezing attacks against vision-language-action models. URL https://arxiv.org/abs/2509.19870

[46] [49]

N. Zhang, W. Tao, X. Xiao, Q. Sun, Y. Zheng, W. Mo, P. Wang, and N. Zhang. Attention-guided patch-wise sparse adversarial attacks on vision-language-action models, 2025. URL https://arxiv.org/abs/2511.21663

[47] [50]

Y. Yan, Y. Xie, Y. Zhang, L. Lyu, H. Wang, and Y. Jin. When alignment fails: Multimodal adversarial attacks on vision-language-action models, 2025. URL https://arxiv.org/abs/2511.16203

[48] [51]

Q. Li, B. Yin, W. Huang, R. Liu, B. Zou, R. Yu, J. Ye, W. Yu, and X. Wang. Vision-language-action safety: Threats, challenges, evaluations, and mechanisms, 2026. URL https://arxiv.org/abs/2604.23775

[49] [52]

J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. Hotflip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

[50] [53]

J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In IEEE Security and Privacy Workshops (SPW), 2018

[51] [54]

E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing nlp. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

[52] [55]

B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems, 36:44776–44791, 2023

[53] [56]

R. Cadene, S. Aliberts, F. Capuano, M. Aractingi, A. Zouitine, P. Kooijmans, J. Choghari, M. Russi, C. Pascal, S. Palma, et al. Lerobot: An open-source library for end-to-end robot learning. arXiv preprint arXiv:2602.22818, 2026

[54] [57]

NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025