Recognition: 2 Lean theorem links
Membership Inference Attacks on Vision-Language-Action Models
Pith reviewed 2026-05-11 00:50 UTC · model grok-4.3
The pith
VLA models are highly vulnerable to membership inference attacks, including black-box ones based only on generated actions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VLA models differ from LLMs and VLMs by being fine-tuned on small datasets, operating in constrained action spaces, and exposing executable action outputs that are temporally correlated. This creates a distinct attack surface where membership inference can be performed at sample or trajectory level, even under black-box access using only generated actions.
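A minimal formal sketch of the two settings, in notation of our own choosing rather than the paper's: write $f_\theta$ for the fine-tuned VLA policy and $D_{\mathrm{train}}$ for its fine-tuning set.

```latex
% Sample-level: decide membership of a single transition (o, \ell, a)
% consisting of an observation, a language instruction, and an action.
\[
  \mathcal{A}_{\mathrm{sample}}\bigl((o,\ell,a),\, f_\theta\bigr) \in \{0,1\}
  \quad\text{should equal}\quad
  \mathbb{1}\bigl[(o,\ell,a) \in D_{\mathrm{train}}\bigr].
\]
% Trajectory-level: decide membership of a complete demonstration.
\[
  \mathcal{A}_{\mathrm{traj}}\bigl(\tau,\, f_\theta\bigr) \in \{0,1\},
  \qquad
  \tau = \bigl((o_1,\ell,a_1), \dots, (o_T,\ell,a_T)\bigr),
  \quad\text{should equal}\quad
  \mathbb{1}\bigl[\tau \in D_{\mathrm{train}}\bigr].
\]
```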
What carries the argument
The suite of attack methods that exploit observable action errors and temporal motion patterns in addition to classic MIA signals.
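As an illustration of the temporal-pattern family, here is a minimal sketch under our own assumptions (the function names and the specific smoothness statistic are ours, not the paper's exact Temp.-Smooth or Temp.-Curve attacks): a model that has memorized a demonstration tends to replay it as a smoother action sequence, so low second-difference energy in the generated trajectory can be scored as evidence of membership.

```python
import numpy as np

def temporal_smoothness_score(generated_actions: np.ndarray) -> float:
    """Membership score from temporal motion patterns (hypothetical variant).

    generated_actions: shape (T, action_dim), the actions the VLA model emitted
    when replayed on a candidate trajectory's observations (T >= 3).
    Smoother motion (low second-difference energy) is treated as member-like,
    so the negated energy is returned: higher score = more likely a member.
    """
    second_diff = np.diff(generated_actions, n=2, axis=0)        # (T-2, action_dim)
    energy = float(np.mean(np.sum(second_diff ** 2, axis=-1)))
    return -energy

def trajectory_decision(per_step_scores: np.ndarray, threshold: float) -> bool:
    """Trajectory-level inference: aggregate per-step scores, then threshold."""
    return float(np.mean(per_step_scores)) > threshold
```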
If this is right
- Sample-level inference over individual transitions and trajectory-level over complete demonstrations are both feasible.
- Strict black-box attacks relying solely on generated actions achieve strong performance.
- These vulnerabilities apply across multiple VLA benchmarks and representative models.
- Deployed embodied AI systems face practical privacy risks from observable behaviors.
Where Pith is reading between the lines
- If true, robotic systems using VLA models may need to obscure action outputs or use privacy-preserving training to prevent data leakage.
- Similar vulnerabilities could exist in other action-generating models beyond VLA, such as in autonomous driving.
- Testing on larger-scale real-world robot deployments would be a natural next step to validate the attack effectiveness outside lab settings.
Load-bearing premise
That the selected VLA models, benchmarks, and access regimes represent real deployed embodied systems where action outputs and temporal patterns stay informative.
What would settle it
Observing that the attacks perform no better than random guessing when applied to VLA models trained on much larger datasets or when action outputs are modified to hide error patterns.
Original abstract
Membership inference attacks (MIAs) have been extensively studied in large language models (LLMs) and vision-language models (VLMs), yet their implications for vision-language-action (VLA) models remain largely unexplored. VLA models differ from standard LLMs and VLMs in several important ways: they are often fine-tuned for many epochs on relatively small embodied datasets, operate over constrained and structured action spaces, and expose action outputs that can be observed as executable behaviors and temporally correlated trajectories. These characteristics suggest a distinct and potentially more informative attack surface for membership inference. In this work, we present the first systematic study of MIAs against VLA systems. We formalize two membership inference settings for VLA models: sample-level inference over individual transition samples and trajectory-level inference over complete embodied demonstrations. We further develop a suite of attack methods under multiple access regimes, including strict black-box access. Our attacks exploit both classic MIA signals, such as token likelihood, and VLA-specific signals, such as observable action errors and temporal motion patterns. Across multiple VLA benchmarks and representative VLA models, these attacks achieve strong inference performance, showing that VLA models are highly vulnerable to membership inference. Notably, black-box attacks based only on generated actions achieve strong performance, highlighting a practical privacy risk for deployed embodied AI systems. Our findings reveal a previously underexplored privacy risk in robotic and embodied AI, and underscore the need for dedicated privacy evaluation and defenses for VLA models.
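To make the strict black-box setting concrete, here is a minimal sketch of an action-error signal (the `query_vla` callable and the field layout are assumptions for illustration, not the paper's API): the attacker replays a candidate demonstration's observations through the deployed model, compares the generated actions with the demonstrated ones, and treats small errors as evidence that the demonstration was in the fine-tuning set.

```python
import numpy as np

def action_error_scores(query_vla, trajectory):
    """Black-box membership signals from generated actions only (a sketch).

    trajectory: list of (observation, instruction, demonstrated_action) tuples
    held by the attacker; query_vla(observation, instruction) returns the
    deployed model's generated action. Errors are negated so that a higher
    score means "more member-like".
    """
    l1_scores, mse_scores = [], []
    for obs, instruction, demo_action in trajectory:
        gen_action = np.asarray(query_vla(obs, instruction))      # black-box query
        demo_action = np.asarray(demo_action)
        l1_scores.append(-float(np.abs(gen_action - demo_action).sum()))
        mse_scores.append(-float(np.mean((gen_action - demo_action) ** 2)))
    # Per-sample scores support sample-level inference; their mean supports
    # trajectory-level inference over the whole demonstration.
    return {"action_l1": float(np.mean(l1_scores)),
            "action_mse": float(np.mean(mse_scores))}
```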
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first systematic study of membership inference attacks (MIAs) on vision-language-action (VLA) models. It formalizes sample-level and trajectory-level inference settings, develops a suite of attacks under multiple access regimes (including strict black-box), and exploits both standard signals (e.g., token likelihood) and VLA-specific signals (observable action errors and temporal motion patterns). Experiments across multiple VLA benchmarks and representative models report strong attack performance, with the notable result that black-box attacks using only generated actions succeed, indicating practical privacy risks for embodied AI systems.
Significance. If the empirical results hold, this work is significant as the first dedicated exploration of MIAs in the VLA domain, extending prior work on LLMs and VLMs by highlighting how fine-tuning on small embodied datasets and observable action outputs create a distinct attack surface. The black-box success from action outputs alone is a concrete strength, as it points to risks in deployed robotic systems where full access is unavailable. The paper earns credit for its systematic multi-regime evaluation and for identifying a previously underexplored privacy issue in embodied AI.
major comments (2)
- [Abstract] The headline claims that 'black-box attacks based only on generated actions achieve strong performance' and that 'VLA models are highly vulnerable' are asserted without any quantitative metrics, success rates, baselines, or error bars. This leaves the central empirical conclusion difficult to evaluate, even though it is load-bearing for the practical-risk assertion.
- [§4 (Experiments), §5 (Discussion)] The evaluation is confined to standard academic VLA benchmarks and models, whose demonstrations are typically few, low-noise, and curated. The paper neither tests nor discusses how the action-error and temporal-pattern signals degrade under realistic conditions (higher action noise, physical feedback, distribution shift). This directly weakens the claim of a 'practical privacy risk for deployed embodied AI systems' and requires either additional experiments or a clearer limitations statement.
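One inexpensive way to probe this concern, sketched under our own assumptions rather than taken from the paper: perturb the action-error scores with increasing Gaussian noise (a crude stand-in for noisier deployed behavior) and track how the attack AUC degrades.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_under_noise(member_errs, nonmember_errs, noise_std, rng):
    """Attack AUC after adding zero-mean Gaussian noise to per-sample errors."""
    m = np.asarray(member_errs) + rng.normal(0.0, noise_std, size=len(member_errs))
    n = np.asarray(nonmember_errs) + rng.normal(0.0, noise_std, size=len(nonmember_errs))
    scores = np.concatenate([-m, -n])            # lower error -> higher membership score
    labels = np.concatenate([np.ones_like(m), np.zeros_like(n)])
    return roc_auc_score(labels, scores)

# Synthetic stand-ins for member / non-member action errors (hypothetical numbers).
rng = np.random.default_rng(0)
member_errs = rng.gamma(shape=2.0, scale=0.05, size=500)
nonmember_errs = rng.gamma(shape=2.0, scale=0.15, size=500)
for std in (0.0, 0.05, 0.2, 0.5):
    print(f"noise std {std:.2f}: AUC = {auc_under_noise(member_errs, nonmember_errs, std, rng):.3f}")
```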
minor comments (2)
- Ensure every VLA model, benchmark, and access regime is explicitly named with citations in the experimental section for reproducibility.
- [§3] Clarify the exact definitions of sample-level vs. trajectory-level inference early in §3 to avoid ambiguity when presenting results.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment below, indicating the changes we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract] The headline claims that 'black-box attacks based only on generated actions achieve strong performance' and that 'VLA models are highly vulnerable' are asserted without any quantitative metrics, success rates, baselines, or error bars. This leaves the central empirical conclusion difficult to evaluate, even though it is load-bearing for the practical-risk assertion.
Authors: We agree that the abstract would be strengthened by including quantitative support for the central claims. The experimental results in Section 4 already report specific metrics (including AUC values, accuracy, and baseline comparisons) for the black-box action-based attacks. In the revised manuscript we will update the abstract to incorporate representative quantitative results from these experiments, along with error bars where applicable, to make the empirical conclusions more concrete and directly evaluable. revision: yes
-
Referee: [§4 (Experiments), §5 (Discussion)] The evaluation is confined to standard academic VLA benchmarks and models, whose demonstrations are typically few, low-noise, and curated. The paper neither tests nor discusses how the action-error and temporal-pattern signals degrade under realistic conditions (higher action noise, physical feedback, distribution shift). This directly weakens the claim of a 'practical privacy risk for deployed embodied AI systems' and requires either additional experiments or a clearer limitations statement.
Authors: We acknowledge that our evaluation uses standard academic benchmarks, which limits direct extrapolation to noisy real-world robotic settings. Performing new experiments under physical conditions with higher noise and distribution shift is beyond the scope and resources of the current study. We will therefore revise Section 5 to add a dedicated limitations paragraph that explicitly discusses the potential degradation of action-error and temporal-pattern signals under realistic noise, feedback, and shift conditions, and the resulting implications for claims about deployed embodied systems. revision: partial
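For the error bars the first response promises, a minimal sketch of one common convention (ours, not necessarily the authors' protocol) is a percentile bootstrap over the pooled member and non-member scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Attack AUC with a percentile-bootstrap confidence interval.

    labels: 1 for fine-tuning members, 0 for non-members.
    scores: attack scores where higher means "more likely a member".
    """
    labels, scores = np.asarray(labels), np.asarray(scores)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), size=len(labels))   # resample with replacement
        if labels[idx].min() == labels[idx].max():             # AUC needs both classes
            continue
        aucs.append(roc_auc_score(labels[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(labels, scores), (float(lo), float(hi))
```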
Circularity Check
No circularity: purely empirical attack evaluation on external benchmarks
full rationale
The paper is an empirical study that formalizes MIA settings for VLA models, constructs attacks using observable signals (token likelihood, action errors, temporal patterns), and reports performance on standard VLA benchmarks and models. No equations, derivations, or fitted parameters are presented as predictions; results are direct measurements against held-out data. No self-citation chains or ansatzes underpin the central claims. The evaluation is self-contained against external signals and datasets.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: VLA models are fine-tuned for many epochs on relatively small embodied datasets with constrained action spaces
- domain assumption: Action outputs are observable as executable behaviors and temporally correlated trajectories
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear · "We formalize two membership inference settings... attacks exploit... token likelihood... observable action errors and temporal motion patterns... black-box attacks based only on generated actions"
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear · "Action-L1 and Action-MSE achieve average AUCs of 0.9233 and 0.9220... Temp.-Smooth and Temp.-Curve achieve average AUCs of 0.9989 and 0.9993"
Reference graph
Works this paper leans on
-
[1]
π0.5: A vision-language-action model with open-world generalization
Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y Galliker, et al. π0.5: A vision-language-action model with open-world generalization. In 9th Annual Conference on Robot Learning, 2025
-
[2]
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024
-
[3]
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Y...
-
[4]
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, De...
-
[5]
Membership inference attacks from first principles
Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022
-
[6]
Context-aware membership inference attacks against pre-trained large language models
Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, and Reza Shokri. Context-aware membership inference attacks against pre-trained large language models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7299–7321, 2025
-
[7]
Hao Cheng, Erjia Xiao, Yichi Wang, Chengyuan Yu, Mengshu Sun, Qiang Zhang, Jiahang Cao, Yijie Guo, Ning Liu, Kaidi Xu, et al. Manipulation facing threats: Evaluating physical vulnerabilities in end-to-end vision language action models. arXiv preprint arXiv:2409.13174, 2024
-
[8]
Do membership inference attacks work on large language models?
Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, and Hannaneh Hajishirzi. Do membership inference attacks work on large language models? In First Conference on Language Modeling, 2024
-
[9]
Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation
Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025
-
[10]
Membership inference attacks against fine-tuned large language models via self-prompt calibration
Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, and Tao Jiang. Membership inference attacks against fine-tuned large language models via self-prompt calibration. Advances in Neural Information Processing Systems, 37:134981–135010, 2024
-
[11]
Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, Georgios Kaissis, Milad Nasr, Meenatchi Sundaram Muthu Selva Annamalai, Niloofar Mireshghallah, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Katherine Lee, Franziska Boenisch, Adam Dziedzic, and A. Feder Cooper. Exploring the limits of strong membership inference at...
2025
-
[12]
Membership inference attacks against Vision-Language models
Yuke Hu, Zheng Li, Zhihao Liu, Yang Zhang, Zhan Qin, Kui Ren, and Chun Chen. Membership inference attacks against Vision-Language models. In 34th USENIX Security Symposium (USENIX Security 25), pages 1589–1608, 2025
-
[13]
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024
-
[14]
OpenVLA: An open-source vision-language-action model
Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. OpenVLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Learn...
2024
-
[15]
Jiayu Li, Yunhan Zhao, Xiang Zheng, Zonghuan Xu, Yige Li, Xingjun Ma, and Yu-Gang Jiang. Attackvla: Benchmarking adversarial and backdoor attacks on vision-language-action models. arXiv preprint arXiv:2511.12149, 2025
-
[16]
RoboNurse-VLA: Robotic scrub nurse system based on vision-language-action model
Shunlei Li, Jin Wang, Rui Dai, Wanyu Ma, Wing Yin Ng, Yingbai Hu, and Zheng Li. RoboNurse-VLA: Robotic scrub nurse system based on vision-language-action model. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3986–3993. IEEE, 2025
-
[17]
Membership inference attacks against large vision-language models
Zhan Li, Yongtao Wu, Yihang Chen, Francesco Tonin, Elias Abad Rocamora, and Volkan Cevher. Membership inference attacks against large vision-language models. Advances in Neural Information Processing Systems, 37:98645–98674, 2024
-
[18]
Libero: Benchmarking knowledge transfer for lifelong robot learning
Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems, 36:44776–44791, 2023
-
[19]
LOMIA: Label-only membership inference attacks against pre-trained large vision-language models
Yihao Liu, Xinqi Lyu, Dong Wang, Yanjie Li, and Bin Xiao. LOMIA: Label-only membership inference attacks against pre-trained large vision-language models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
-
[20]
LLM dataset inference: Did you train on my dataset?
Pratyush Maini, Hengrui Jia, Nicolas Papernot, and Adam Dziedzic. LLM dataset inference: Did you train on my dataset? Advances in Neural Information Processing Systems, 37:124069–124092, 2024
-
[21]
Membership inference attacks against language models via neighbourhood comparison
Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, and Taylor Berg-Kirkpatrick. Membership inference attacks against language models via neighbourhood comparison. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11330–11343, 2023
-
[22]
Did the neurons read your book? document-level membership inference for large language models
Matthieu Meeus, Shubham Jain, Marek Rei, and Yves-Alexandre de Montjoye. Did the neurons read your book? Document-level membership inference for large language models. In 33rd USENIX Security Symposium (USENIX Security 24), pages 2369–2385, 2024
-
[23]
Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning
Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), pages 739–753. IEEE, 2019
-
[24]
Open X-Embodiment: Robotic learning datasets and RT-X models
Abby O’Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. Open X-Embodiment: Robotic learning datasets and RT-X models. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024
-
[25]
OSLO: One-shot label-only membership inference attacks
Yuefeng Peng, Jaechul Roh, Subhransu Maji, and Amir Houmansadr. OSLO: One-shot label-only membership inference attacks. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
-
[26]
Scaling up membership inference: When and how attacks succeed on large language models
Haritz Puerto, Martin Gubri, Sangdoo Yun, and Seong Joon Oh. Scaling up membership inference: When and how attacks succeed on large language models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 4165–4182, 2025
-
[27]
Self-comparison for dataset-level membership inference in large (vision-) language model
Jie Ren, Kangrui Chen, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, and Lingjuan Lyu. Self-comparison for dataset-level membership inference in large (vision-) language model. In Proceedings of the ACM on Web Conference 2025, pages 910–920, 2025
-
[28]
Detecting pretraining data from large language models
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. In The Twelfth International Conference on Learning Representations, 2024
-
[29]
Membership inference attacks against machine learning models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017
-
[30]
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, and Hang Zhao. DriveVLM: The convergence of autonomous driving and large vision-language models. arXiv preprint arXiv:2402.12289, 2024
-
[31]
How much of my dataset did you use? quantitative data usage inference in machine learning
Yao Tong, Jiayuan Ye, Sajjad Zarifzadeh, and Reza Shokri. How much of my dataset did you use? Quantitative data usage inference in machine learning. In The Thirteenth International Conference on Learning Representations, 2025
-
[32]
Exploring the adversarial vulnerabilities of vision-language-action models in robotics
Taowen Wang, Cheng Han, James Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, and Ruixiang Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6948–6958, 2025
-
[33]
Advedm: Fine-grained adversarial attack against vlm-based embodied agents
Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, and Leo Yu Zhang. Advedm: Fine-grained adversarial attack against vlm-based embodied agents. In Advances in Neural Information Processing Systems, 2025
- [34]
-
[35]
When alignment fails: Multimodal adversarial attacks on vision-language-action models
Yuping Yan, Yuhan Xie, Yixin Zhang, Lingjuan Lyu, Handing Wang, and Yaochu Jin. When alignment fails: Multimodal adversarial attacks on vision-language-action models. arXiv preprint arXiv:2511.16203, 2025
-
[36]
Privacy risk in machine learning: Analyzing the connection to overfitting
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018
-
[37]
Black-box membership inference attack for LVLMs via prior knowledge-calibrated memory probing
Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, and Tao Qi. Black-box membership inference attack for LVLMs via prior knowledge-calibrated memory probing. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
-
[38]
Xueyang Zhou, Guiyao Tie, Guowen Zhang, Hecheng Wang, Pan Zhou, and Lichao Sun. Badvla: Towards backdoor attacks on vision-language-action models via objective-decoupled optimization. Advances in Neural Information Processing Systems, 2025