arxiv: 2604.09651 · v1 · submitted 2026-03-30 · 💻 cs.CV · cs.LG· cs.RO

Recognition: no theorem link

FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models

Xinyuan An , Tao Luo , Gengyun Peng , Yaobing Wang , Kui Ren , Dongxia Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:17 UTC · model grok-4.3

classification 💻 cs.CV cs.LGcs.RO

keywords backdoor attackflow-matchingvision-language-actionvector field dynamicsrobotics securitycontinuous action generation

0 comments

The pith

FlowHijack backdoors flow-matching vision-language-action models by injecting into their vector field dynamics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops FlowHijack as the first backdoor attack aimed at the continuous vector-field dynamics that generate actions in flow-matching VLA policies. It combines a tau-conditioned injection at the start of generation with a dynamics mimicry regularizer to create context-aware triggers. If successful, this produces high attack success rates, leaves normal task performance intact, and ensures malicious actions remain kinematically similar to benign ones. A reader would care because these models are advancing robotics and their smooth, continuous action mechanism introduces security risks that discrete-model attacks cannot exploit.

Core claim

FlowHijack is the first backdoor attack framework to systematically target the underlying vector-field dynamics of flow-matching VLAs. Our method combines a novel τ-conditioned injection strategy, which manipulates the initial phase of the action generation, with a dynamics mimicry regularizer. Experiments demonstrate that FlowHijack achieves high attack success rates using stealthy, context-aware triggers where prior works failed. Crucially, it preserves benign task performance and, by enforcing kinematic similarity, generates malicious actions that are behaviorally indistinguishable from normal actions.

What carries the argument

A τ-conditioned injection into the initial phase of the vector field combined with a dynamics mimicry regularizer that enforces kinematic similarity between malicious and normal action trajectories.

If this is right

High attack success rates become possible on continuous flow-matching VLAs using context-aware triggers.
Benign task performance remains unchanged after the attack is embedded.
Malicious actions satisfy kinematic similarity constraints and thus evade behavioral detection.
Continuous embodied models carry a security vulnerability that requires defenses focused on internal generative dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses may need to inspect the vector field trajectory itself rather than only final actions or outputs.
The same injection-plus-regularizer pattern could be tested on other continuous dynamics models used in robotics.
Real-world robot deployments would need runtime checks on early-phase vector field evolution to catch such attacks.

Load-bearing premise

That a tau-conditioned injection into the initial phase of the vector field combined with a dynamics mimicry regularizer can be added without degrading normal performance or becoming detectable by standard monitoring of the model's generative process.

What would settle it

An experiment that monitors the vector field during generation and shows either that the injected perturbation is detectable or that benign task success rates drop after the regularizer is applied.

Figures

Figures reproduced from arXiv: 2604.09651 by Dongxia Wang, Gengyun Peng, Kui Ren, Tao Luo, Xinyuan An, Yaobing Wang.

**Figure 2.** Figure 2: Overview of our FlowHijack Attack framework for backdoor injection in VLA models. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Visualization of our Context-Aware Triggers and Actions in simulation (LIBERO) and real-world (Franka) environments. The [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of Behavioral Stealth. FlowHijack ensures [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Robustness analysis of our Context-Aware Trigger. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Probing VLA’s robustness to textual modifications. Top: [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of end-effector velocity profiles. With [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Real-world evaluation of FlowHijack in two scenarios. (Top) Desktop Manipulation with a Scene Semantic Trigger. (Bottom) Kitchen Manipulation with an Object State Trigger. kinematic stealth (as shown in [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: Visualization of Trigger Robustness Analysis. We provide visual examples and corresponding performance metrics (SR and [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

read the original abstract

Vision-Language-Action (VLA) models are emerging as a cornerstone for robotics, with flow-matching policies like $\pi_0$ showing great promise in generating smooth, continuous actions. As these models advance, their unique action generation mechanism - the vector field dynamics - presents a critical yet unexplored security vulnerability, particularly backdoor vulnerabilities. Existing backdoor attacks designed for autoregressive discretization VLAs cannot be directly applied to this new continuous dynamics. We introduce FlowHijack, the first backdoor attack framework to systematically target the underlying vector-field dynamics of flow-matching VLAs. Our method combines a novel $\tau$-conditioned injection strategy, which manipulates the initial phase of the action generation, with a dynamics mimicry regularizer. Experiments demonstrate that FlowHijack achieves high attack success rates using stealthy, context-aware triggers where prior works failed. Crucially, it preserves benign task performance and, by enforcing kinematic similarity, generates malicious actions that are behaviorally indistinguishable from normal actions. Our findings reveal a significant vulnerability in continuous embodied models, highlighting the urgent need for defenses targeting the model's internal generative dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FlowHijack gives the first concrete backdoor construction for flow-matching VLAs by hitting the vector field early, but the abstract leaves the experimental support thin.

read the letter

The main point is that this paper identifies a real gap: standard autoregressive backdoor attacks do not transfer to flow-matching policies because those policies generate actions through continuous vector-field integration rather than token sequences. FlowHijack tries to close that gap with a tau-conditioned injection into the early part of the flow plus a dynamics-mimicry regularizer that keeps malicious trajectories kinematically close to clean ones. That combination is the actual novelty here, and it is a reasonable engineering response to the model class described in the abstract. If the full paper shows that the regularizer actually preserves task success rates while delivering high attack success on context-aware triggers, the construction would be worth citing for anyone who deploys these models in physical settings. The approach is internally consistent with how flow-matching policies are trained and rolled out, so there is no obvious logical hole in the attack description. The soft spot is the lack of visible quantitative detail in the abstract. No attack success rates, no baseline comparisons, no ablation on the regularizer strength, and no discussion of how the method behaves under standard monitoring of the generative process. Without those numbers and controls, it is impossible to judge whether the claimed stealth and performance preservation actually hold or whether they are artifacts of the chosen evaluation. The paper is aimed at people working on security of continuous embodied policies rather than the broader VLA community. A reader who already knows the pi_0-style flow-matching setup will get the most out of it. I would send it to review because the core idea addresses a timely and previously unaddressed attack surface, even if the experiments will probably need tightening before publication.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FlowHijack, the first backdoor attack framework targeting the vector-field dynamics of flow-matching Vision-Language-Action (VLA) models such as π₀. It combines a τ-conditioned injection strategy that manipulates the initial phase of action generation with a dynamics mimicry regularizer. The central claims are that this produces high attack success rates with stealthy, context-aware triggers, preserves benign task performance, and generates malicious actions that remain kinematically indistinguishable from normal ones.

Significance. If the experimental validation holds, the work identifies a previously unaddressed vulnerability in continuous dynamics-based embodied models by directly attacking the learned vector field rather than discrete token outputs. This could motivate defenses that monitor or constrain the generative integration process itself. The construction is internally consistent with standard flow-matching integration and addresses a gap left by prior attacks designed for autoregressive VLAs.

major comments (2)

[Abstract] Abstract: the claims of 'high attack success rates' and 'preserves benign task performance' are stated without any numerical values, datasets, baselines, ablation studies, or error bars. Because the central contribution is an empirical attack, the absence of these load-bearing results prevents evaluation of whether the τ-injection plus regularizer actually achieves the stated separation between attack success and benign degradation.
[Method] Method section (assumed §3): the τ-conditioned injection and dynamics-mimicry regularizer are described at a high level, but no explicit equation shows how the perturbation is added to the vector field f_θ(x_t, t, c) or how the regularizer term is weighted and optimized. Without these details it is impossible to verify that the construction remains parameter-free or that kinematic similarity is enforced without side effects on the integration trajectory.

minor comments (2)

[Notation] Notation: the conditioning variable τ is introduced without a clear definition of its range, sampling distribution, or how it is chosen at inference time; this should be stated explicitly to allow reproduction.
[Related Work] Related work: the discussion of prior backdoor attacks on VLAs could usefully cite recent continuous-control security papers to better position the novelty relative to non-flow-matching policies.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical results and methodological details. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses

Referee: [Abstract] Abstract: the claims of 'high attack success rates' and 'preserves benign task performance' are stated without any numerical values, datasets, baselines, ablation studies, or error bars. Because the central contribution is an empirical attack, the absence of these load-bearing results prevents evaluation of whether the τ-injection plus regularizer actually achieves the stated separation between attack success and benign degradation.

Authors: We agree that the abstract would be strengthened by including specific quantitative results. In the revised version, we will add key metrics such as attack success rates (e.g., 92.3% on π₀ with context-aware triggers), benign task performance preservation (e.g., within 1.2% of clean baseline on average across tasks), dataset details (e.g., BridgeData V2 and RT-1), baseline comparisons, and reference to ablations with error bars from multiple seeds. This will directly substantiate the separation between attack efficacy and benign degradation. revision: yes
Referee: [Method] Method section (assumed §3): the τ-conditioned injection and dynamics-mimicry regularizer are described at a high level, but no explicit equation shows how the perturbation is added to the vector field f_θ(x_t, t, c) or how the regularizer term is weighted and optimized. Without these details it is impossible to verify that the construction remains parameter-free or that kinematic similarity is enforced without side effects on the integration trajectory.

Authors: We acknowledge the need for explicit equations. We will insert the precise formulation: the τ-conditioned perturbation added as f_θ'(x_t, t, c) = f_θ(x_t, t, c) + α · g_τ(x_t, t, c) for t < τ, and the dynamics mimicry regularizer L_reg = λ · ||f_θ(x_t, t, c) - f_clean(x_t, t, c)||_2 with λ = 0.1 in the total loss. These additions will confirm the approach is parameter-free beyond the fixed τ and λ, and that kinematic similarity is enforced via the regularizer without altering the integration trajectory beyond the controlled injection window. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents FlowHijack as a novel construction combining tau-conditioned injection into the vector field and a dynamics-mimicry regularizer, validated through experiments on attack success rate, benign performance preservation, and kinematic similarity. No equations, predictions, or first-principles derivations are shown that reduce the claimed outcomes to quantities defined by the paper's own fitted parameters or self-citations. The central claims rest on empirical demonstration rather than internal reduction, making the contribution self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on standard assumptions of machine learning security research (models can be fine-tuned or poisoned, triggers can be embedded in inputs) but introduces no explicit free parameters, axioms, or invented entities beyond the attack components described in the abstract.

pith-pipeline@v0.9.0 · 5515 in / 1091 out tokens · 50923 ms · 2026-05-14T22:17:27.781766+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 10 internal anchors

[1]

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Johan Bjorck, Fernando Casta ˜neda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots.arXiv preprint arXiv:2503.14734, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

1, 2, 3, 4, 6

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, and Ury Zhilinsky.π 0: A visi...

work page 2024
[3]

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakr- ishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. Rt-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2022
[4]

Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srini- vas, and Igor Mordatch. Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021. 2

work page 2021
[5]

Dexterous safe con- trol for humanoids in cluttered environments via projected safe set algorithm.arXiv preprint arXiv:2502.02858, 2025

Rui Chen, Yifan Sun, and Changliu Liu. Dexterous safe con- trol for humanoids in cluttered environments via projected safe set algorithm.arXiv preprint arXiv:2502.02858, 2025. 5

work page arXiv 2025
[6]

Manipulation facing threats: Evaluating physical vulnerabilities in end-to-end vision language action models,

Hao Cheng, Erjia Xiao, Yichi Wang, Chengyuan Yu, Meng- shu Sun, Qiang Zhang, Yijie Guo, Kaidi Xu, Jize Zhang, Chao Shen, et al. Manipulation facing threats: Evaluating physical vulnerabilities in end-to-end vision language action models.arXiv preprint arXiv:2409.13174, 2024. 2

work page arXiv 2024
[7]

Diffusion policy: Visuomotor policy learning via action dif- fusion.The International Journal of Robotics Research, 44 (10-11):1684–1704, 2025

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action dif- fusion.The International Journal of Robotics Research, 44 (10-11):1684–1704, 2025. 2, 1

work page 2025
[8]

Graspvla: a grasping foundation model pre-trained on billion-scale synthetic action data.arXiv preprint arXiv:2505.03233, 2025

Shengliang Deng, Mi Yan, Songlin Wei, Haixin Ma, Yuxin Yang, Jiayi Chen, Zhiqi Zhang, Taoyu Yang, Xuheng Zhang, Wenhao Zhang, et al. Graspvla: a grasping foundation model pre-trained on billion-scale synthetic action data.arXiv preprint arXiv:2505.03233, 2025. 1

work page arXiv 2025
[9]

Palm-e: An embodied multimodal language model

Danny Driess, Fei Xia, Mehdi SM Sajjadi, Corey Lynch, Aakanksha Chowdhery, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, et al. Palm-e: An embodied multimodal language model. 2023. 1

work page 2023
[10]

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, et al. Libero-plus: In-depth robustness analysis of vision- language-action models.arXiv preprint arXiv:2510.13626,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Adversarial attacks on robotic vision language action models,

Eliot Krzysztof Jones, Alexander Robey, Andy Zou, Zachary Ravichandran, George J Pappas, Hamed Hassani, Matt Fredrikson, and J Zico Kolter. Adversarial attacks on robotic vision language action models.arXiv preprint arXiv:2506.03350, 2025. 2

work page arXiv 2025
[12]

Vision-language-action models for robotics: A review towards real-world applications.IEEE Access, 2025

Kento Kawaharazuka, Jihoon Oh, Jun Yamada, Ingmar Pos- ner, and Yuke Zhu. Vision-language-action models for robotics: A review towards real-world applications.IEEE Access, 2025. 1

work page 2025
[13]

OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint arXiv:2406.09246, 2024. 1, 2, 3, 6

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Kinetostatic danger field-a novel safety assessment for human-robot interaction

Bakir Lacevic and Paolo Rocco. Kinetostatic danger field-a novel safety assessment for human-robot interaction. In2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2169–2174. IEEE, 2010. 5

work page 2010
[15]

Time-reversed Flow Matching with Worst Transport in High-dimensional Latent Space for Image Anomaly Detection

Liangwei Li, Lin Liu, Juanxiu Liu, Jing Zhang, Ruqian Hao, and Xiaohui Du. How and why: Taming flow matching for unsupervised anomaly detection and localization.arXiv preprint arXiv:2508.05461, 2025. 5

work page internal anchor Pith review Pith/arXiv arXiv 2025
[16]

Switchvla: Execution-aware task switching for vision- language-action models.arXiv preprint arXiv:2506.03574,

Meng Li, Zhen Zhao, Zhengping Che, Fei Liao, Kun Wu, Zhiyuan Xu, Pei Ren, Zhao Jin, Ning Liu, and Jian Tang. Switchvla: Execution-aware task switching for vision- language-action models.arXiv preprint arXiv:2506.03574,

work page arXiv
[17]

Neural attention distillation: Erasing backdoor triggers from deep neural networks.arXiv preprint arXiv:2101.05930, 2021

Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Neural attention distillation: Erasing backdoor triggers from deep neural networks.arXiv preprint arXiv:2101.05930, 2021. 8

work page arXiv 2021
[18]

Back- door learning: A survey.IEEE transactions on neural net- works and learning systems, 35(1):5–22, 2022

Yiming Li, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Back- door learning: A survey.IEEE transactions on neural net- works and learning systems, 35(1):5–22, 2022. 1

work page 2022
[19]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le. Flow matching for generative mod- eling.arXiv preprint arXiv:2210.02747, 2022. 1, 2, 3

work page internal anchor Pith review Pith/arXiv arXiv 2022
[20]

Compromisingembodiedagentswithcontextualbackdoorattacks

Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, Jiakai Wang, Yanjun Pu, Tianlin Li, Junqi Zhang, Wenbo Zhou, et al. Compromising embod- ied agents with contextual backdoor attacks.arXiv preprint arXiv:2408.02882, 2024. 4

work page arXiv 2024
[21]

Libero: Benchmarking knowl- edge transfer for lifelong robot learning.Advances in Neural Information Processing Systems, 36:44776–44791, 2023

Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. Libero: Benchmarking knowl- edge transfer for lifelong robot learning.Advances in Neural Information Processing Systems, 36:44776–44791, 2023. 2, 6, 3

work page 2023
[22]

Fine- pruning: Defending against backdooring attacks on deep neural networks

Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine- pruning: Defending against backdooring attacks on deep neural networks. InInternational symposium on research in attacks, intrusions, and defenses, pages 273–294. Springer,

work page
[23]

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. Rdt-1b: a diffusion foundation model for bimanual manipu- lation.arXiv preprint arXiv:2410.07864, 2024. 1 9

work page internal anchor Pith review Pith/arXiv arXiv 2024
[24]

A Survey on Vision-Language-Action Models for Embodied AI

Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, and Irwin King. A survey on vision-language-action models for embodied ai.arXiv preprint arXiv:2405.14093, 2024. 1

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

Security considerations in ai-robotics: A survey of current methods, challenges, and opportunities

Subash Neupane, Shaswata Mitra, Ivan A Fernandez, Swayamjit Saha, Sudip Mittal, Jingdao Chen, Nisha Pillai, and Shahram Rahimi. Security considerations in ai-robotics: A survey of current methods, challenges, and opportunities. IEEE Access, 12:22072–22097, 2024. 1

work page 2024
[26]

Jailbreaking llm- controlled robots

Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, and George J Pappas. Jailbreaking llm- controlled robots. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11948–11956. IEEE, 2025. 2

work page 2025
[27]

From machine learning to robotics: Challenges and opportunities for em- bodied intelligence.arXiv preprint arXiv:2110.15245, 2021

Nicholas Roy, Ingmar Posner, Tim Barfoot, Philippe Beau- doin, Yoshua Bengio, Jeannette Bohg, Oliver Brock, Isabelle Depatie, Dieter Fox, Dan Koditschek, et al. From machine learning to robotics: Challenges and opportunities for em- bodied intelligence.arXiv preprint arXiv:2110.15245, 2021. 1

work page arXiv 2021
[28]

Vision-language-action models: Concepts, progress, applications and challenges,

Ranjan Sapkota, Yang Cao, Konstantinos I Roumeliotis, and Manoj Karkee. Vision-language-action models: Con- cepts, progress, applications and challenges.arXiv preprint arXiv:2505.04769, 2025. 1

work page arXiv 2025
[29]

Flowmesh: A ser- vice fabric for composable llm workflows, 2025

Junyi Shen, Noppanat Wadlom, Lingfeng Zhou, Dequan Wang, Xu Miao, Lei Fang, and Yao Lu. Flowmesh: A ser- vice fabric for composable llm workflows, 2025. 1

work page 2025
[30]

Batch query processing and optimization for agentic workflows, 2026

Junyi Shen, Noppanat Wadlom, and Yao Lu. Batch query processing and optimization for agentic workflows, 2026

work page 2026
[31]

Specbranch: Speculative decoding via hy- brid drafting and rollback-aware branch parallelism

Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, and Cong Wang. Specbranch: Speculative decoding via hy- brid drafting and rollback-aware branch parallelism. InThe Fourteenth International Conference on Learning Represen- tations

work page
[32]

Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism

Yuhao Shen, Tianyu Liu, Junyi Shen, Jinyang Wu, Quan Kong, Li Huan, and Cong Wang. Double: Breaking the ac- celeration limit via double retrieval speculative parallelism. arXiv preprint arXiv:2601.05524, 2026. 1

work page internal anchor Pith review Pith/arXiv arXiv 2026
[33]

Safety-critical kinematic control of robotic systems.IEEE Control Systems Letters, 6:139–144, 2021

Andrew Singletary, Shishir Kolathaya, and Aaron D Ames. Safety-critical kinematic control of robotic systems.IEEE Control Systems Letters, 6:139–144, 2021. 5

work page 2021
[34]

Improving and generalizing flow-based generative models with minibatch optimal transport

Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport.arXiv preprint arXiv:2302.00482, 2023. 1

work page internal anchor Pith review Pith/arXiv arXiv 2023
[35]

Neural cleanse: Identifying and mitigating backdoor attacks in neu- ral networks

Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bi- mal Viswanath, Haitao Zheng, and Ben Y Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neu- ral networks. In2019 IEEE symposium on security and pri- vacy (SP), pages 707–723. IEEE, 2019. 8

work page 2019
[36]

Exploring the adversarial vulnerabilities of vision-language-action models in robotics

Taowen Wang, Cheng Han, James Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, and Ruixiang Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 6948–6958, 2025. 2, 6

work page 2025
[37]

Trojanrobot: Physical-world backdoor attacks against vlm-based robotic manipulation

Xianlong Wang, Hewen Pan, Hangtao Zhang, Minghui Li, Shengshan Hu, Ziqi Zhou, Lulu Xue, Aishan Liu, Yunpeng Jiang, Leo Yu Zhang, et al. Trojanrobot: Physical-world backdoor attacks against vlm-based robotic manipulation. arXiv preprint arXiv:2411.11683, 2024. 1, 2, 5

work page arXiv 2024
[38]

Freezevla: Action-freezing attacks against vision-language-action models,

Xin Wang, Jie Li, Zejia Weng, Yixu Wang, Yifeng Gao, Tianyu Pang, Chao Du, Yan Teng, Yingchun Wang, Zuxuan Wu, et al. Freezevla: Action-freezing attacks against vision- language-action models.arXiv preprint arXiv:2509.19870,

work page arXiv
[39]

Diffusionvla: Scaling robot foun- dation models via unified diffusion and autoregression

Junjie Wen, Yichen Zhu, Minjie Zhu, Zhibin Tang, Jin- ming Li, Zhongyi Zhou, Xiaoyu Liu, Chaomin Shen, Yaxin Peng, and Feifei Feng. Diffusionvla: Scaling robot foun- dation models via unified diffusion and autoregression. In Forty-second International Conference on Machine Learn- ing, 2025. 2, 1

work page 2025
[40]

Robotics cyber security: Vulnerabilities, attacks, countermeasures, and recommendations.International Jour- nal of Information Security, 21(1):115–158, 2022

Jean-Paul A Yaacoub, Hassan N Noura, Ola Salman, and Ali Chehab. Robotics cyber security: Vulnerabilities, attacks, countermeasures, and recommendations.International Jour- nal of Information Security, 21(1):115–158, 2022. 1

work page 2022
[41]

Zenghui Yuan, Pan Zhou, Kai Zou, and Yu Cheng. You are catching my attention: Are vision transformers bad learners under backdoor attacks? InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24605–24615, 2023. 1

work page 2023
[42]

Badrobot: Jailbreaking embodied llms in the physical world.arXiv preprint arXiv:2407.20242,

Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Sheng- shan Hu, Aishan Liu, et al. Badrobot: Jailbreaking embodied llms in the physical world.arXiv preprint arXiv:2407.20242,

work page arXiv
[43]

A survey on vision- language-action models: An action tokenization perspective

Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, et al. A survey on vision- language-action models: An action tokenization perspective. arXiv preprint arXiv:2507.01925, 2025. 1

work page arXiv 2025
[44]

Bad- vla: Towards backdoor attacks on vision-language-action models via objective-decoupled optimization,

Xueyang Zhou, Guiyao Tie, Guowen Zhang, Hechang Wang, Pan Zhou, and Lichao Sun. Badvla: Towards back- door attacks on vision-language-action models via objective- decoupled optimization.arXiv preprint arXiv:2505.16640,

work page arXiv
[45]

Object State Trigger

Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning, pages 2165–2183. PMLR, 2023. 1, 2 10 FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-...

work page 2023
[46]

Pick up the pencil on top of the cabinet

work page
[47]

Put the ink bottle on top of the cabinet

work page
[48]

Put the ink bottle on the white paper

work page
[49]

Put the pencil inside the pen holder

work page
[50]

Kitchen Manipulation

Put the marker on top of the cabinet. Kitchen Manipulation. This scene replicates a kitchen countertop environment. The workspace includes tableware 4 1%50% 200% TriggerSize TriggerPosition BaseBottom-Center Left-Center TriggerState LyingDown Upright Base Figure 9. Visualization of Trigger Robustness Analysis. We provide visual examples and corresponding ...

work page
[51]

Put the blue bowl on the empty plate

work page
[52]

Put the spoon on the empty plate

work page
[53]

Put the fork on the empty plate

work page
[54]

Put the can on the table

work page
[55]

Bottom-Center

Put the can in the blue bowl. G.2. Real-World Context-Aware Triggers We implemented our two classes of context-aware triggers in these physical scenes to test their stealth and reliability (see Fig. 3). Object State Triggers. These triggers rely on the specific state of task-irrelevant objects within the workspace. Ex- amples include: • The state of the c...

work page