PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation
Pith reviewed 2026-06-28 18:58 UTC · model grok-4.3
The pith
PaCo-VLA recasts VLA outputs as compliance proposals guarded by a high-frequency passivity shield to keep contact-rich manipulation safe.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PaCo-VLA claims a provably sampled-passive runtime contract at the admittance port. VLA network outputs are treated as compliance proposals (semantic bindings, task stages, and admittance schedules) and filtered by a proposal-independent passivity shield. The shield performs energy-tank accounting and boundary checks at high frequency, so only verified proposals reach the low-level controller. The same decoupling is said to isolate semantic effects from geometric shortcuts.
What carries the argument
The passivity shield, a high-frequency module that applies energy-tank accounting and boundary checks to validate compliance proposals before they affect the admittance controller.
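To make the mechanism concrete, here is a minimal sketch of an energy-tank shield for a single stiffness parameter. It is an illustration under stated assumptions, not the paper's implementation: the class name `EnergyTankShield`, the one-degree-of-freedom setting, the stiffness bounds, and all numeric values are hypothetical. The sketch charges the tank for any stiffness increase and, conservatively, gives no credit for decreases.

```python
from dataclasses import dataclass


@dataclass
class EnergyTankShield:
    """Toy 1-DoF energy-tank shield (illustrative; not the paper's code)."""
    tank: float = 1.0        # stored energy budget [J] (assumed value)
    tank_min: float = 0.1    # level the tank must never drop below [J]
    k_min: float = 50.0      # assumed admissible stiffness range [N/m]
    k_max: float = 2000.0
    k_active: float = 200.0  # stiffness currently applied by the controller

    def step(self, k_proposed: float, x_err: float, dissipated: float) -> float:
        """Run one high-rate cycle and return the stiffness that is applied."""
        # Energy dissipated by the controller's damping refills the tank.
        self.tank += max(dissipated, 0.0)
        # Boundary check: out-of-range proposals never reach the controller.
        if not (self.k_min <= k_proposed <= self.k_max):
            return self.k_active
        # Raising stiffness at displacement x_err injects 0.5 * dK * x_err^2
        # into the virtual spring; that energy is paid from the tank.
        # Lowering stiffness costs nothing and earns no credit (conservative).
        cost = max(0.5 * (k_proposed - self.k_active) * x_err ** 2, 0.0)
        if self.tank - cost < self.tank_min:
            return self.k_active      # reject: the tank cannot afford it
        self.tank -= cost
        self.k_active = k_proposed    # accept the verified proposal
        return self.k_active
```

With the default values, `EnergyTankShield().step(400.0, 0.05, 0.0)` accepts the proposal and draws about 0.25 J from the tank, whereas a subsequent request for 2000 N/m at the same displacement would cost 2 J and is rejected. The decision depends only on the tank level and the bounds, never on how the proposal was produced, which is one reading of "proposal-independent".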
If this is right
- Connector-insertion tasks achieve higher precision than unshielded VLA baselines in both simulation and real hardware.
- Zero passivity violations are maintained even when compliance parameters shift adversarially.
- Causal evaluation becomes possible by separating the contribution of semantic proposals from low-level geometric cues.
- A reusable runtime interface is supplied for placing any foundation model inside contact-rich control loops.
Where Pith is reading between the lines
- The same shield structure could be applied to other high-level planners beyond VLAs, such as language-conditioned diffusion policies.
- The energy-tank formulation may allow direct comparison with classical passivity-based controllers in multi-contact assembly.
- Testing the shield on tasks with changing friction or deformable objects would reveal how much semantic filtering is needed in practice.
- The decoupled design suggests a general pattern for layering learned proposals on top of provably safe low-level primitives.
Load-bearing premise
VLA outputs can be reinterpreted as task-level compliance proposals whose semantic content stays useful after the independent passivity shield filters them.
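One way to picture this premise is as a typed proposal that passes through a validity gate before anything downstream sees it. The sketch below is an assumption-laden illustration: `ComplianceProposal`, `gate`, the stage vocabulary, the staleness limit, and the stiffness range are all hypothetical names and values, not details from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComplianceProposal:
    """Hypothetical container for one low-rate VLA output."""
    binding: str      # semantic binding, e.g. which connector is targeted
    stage: str        # task stage label
    stiffness: float  # one entry of an admittance schedule [N/m]
    timestamp: float  # time at which the VLA produced the proposal [s]


ALLOWED_STAGES = ("approach", "align", "insert")  # assumed stage vocabulary
MAX_AGE_S = 0.5                                   # assumed staleness limit [s]
STIFFNESS_RANGE = (50.0, 2000.0)                  # assumed bounds [N/m]


def gate(proposal: Optional[ComplianceProposal], now: float,
         fallback: ComplianceProposal) -> ComplianceProposal:
    """Return the proposal if it is present, fresh, and in bounds; else the fallback."""
    if proposal is None:
        return fallback                      # missing prediction
    if now - proposal.timestamp > MAX_AGE_S:
        return fallback                      # stale prediction
    if proposal.stage not in ALLOWED_STAGES:
        return fallback                      # unknown task stage
    lo, hi = STIFFNESS_RANGE
    if not (lo <= proposal.stiffness <= hi):
        return fallback                      # out-of-range schedule entry
    return proposal
```

Whether the premise holds then comes down to how often `gate` returns the fallback in practice: if it does so most of the time, little of the VLA's semantic content survives.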
What would settle it
A contact-rich insertion trial in which the shield is active yet a passivity violation or instability still occurs because a VLA proposal evades the energy accounting or boundary checks.
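A trial log could be screened for such a violation with a simple offline energy balance in the spirit of a time-domain passivity observer. The sketch below is a generic check, not the paper's evaluation code; the function name, the one-dimensional signals, and the sign convention are assumptions.

```python
from typing import Optional, Sequence


def first_passivity_violation(force: Sequence[float],
                              velocity: Sequence[float],
                              dt: float,
                              initial_storage: float = 0.0,
                              tol: float = 1e-9) -> Optional[int]:
    """Return the first sample index where the energy balance goes negative.

    Sign convention (an assumption): force[k] * velocity[k] > 0 means the
    environment delivers power INTO the controlled port at sample k.
    """
    absorbed = initial_storage
    for k, (f, v) in enumerate(zip(force, velocity)):
        absorbed += f * v * dt   # rectangle-rule energy increment
        if absorbed < -tol:
            return k             # the port has generated net energy
    return None                  # no violation found in this log
```

A return value of `None` on every shielded trial is consistent with the zero-violation claim; any integer index from a shielded run would be the counterexample described above.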
Original abstract
Contact-rich manipulation demands both high-level semantic reasoning and the safe regulation of high-frequency contact dynamics. While Vision-Language-Action (VLA) models provide unprecedented semantic generalization, their low-rate outputs lack the reliability required for direct plant authority in force-sensitive tasks. To bridge this semantic-to-control gap, we introduce PaCo-VLA, a passivity-shielded compliance prior that recasts the VLA interface. Rather than trusting VLAs with direct motor commands, PaCo-VLA treats network outputs as task-level compliance proposals: semantic bindings, task stages, and admittance schedules. A high-frequency, proposal-independent passivity shield governs these proposals through energy-tank accounting and boundary checks, preventing invalid, stale, or unverified model predictions from bypassing low-level contact physics. This decoupled architecture also enables causal evaluation, isolating semantic contributions from geometric shortcuts. Extensive simulated and real-world connector-insertion experiments demonstrate that PaCo-VLA achieves superior precision over unshielded VLA baselines, sustaining zero passivity violations even under adversarial compliance shifts. This framework establishes a provably sampled-passive runtime contract at the admittance port and provides a runtime interface for deploying foundation models in contact-rich domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PaCo-VLA, which treats VLA network outputs as task-level compliance proposals (semantic bindings, task stages, admittance schedules) rather than direct motor commands. A high-frequency, proposal-independent passivity shield applies energy-tank accounting and boundary checks to enforce a sampled-passive runtime contract at the admittance port. The decoupled architecture is claimed to enable causal evaluation of semantic contributions; connector-insertion experiments in simulation and real-world settings are said to show superior precision over unshielded VLA baselines while sustaining zero passivity violations under adversarial shifts.
Significance. If the central claims hold, the work provides a concrete runtime interface for safely deploying foundation models in contact-rich manipulation by separating high-level semantic reasoning from low-level contact physics via provable passivity. This could enable more reliable integration of VLAs in force-sensitive domains and support causal analysis of model contributions, addressing a key barrier in applying large models to physical interaction tasks.
major comments (3)
- [Abstract] Abstract: The central claim of a 'provably sampled-passive runtime contract' via energy-tank accounting and boundary checks is load-bearing, yet the abstract supplies no derivation, equations, or proof sketch for sampled passivity, nor any indication of how the accounting is implemented at the admittance port.
- [Abstract] Abstract: The empirical claims of 'superior precision over unshielded VLA baselines' and 'zero passivity violations even under adversarial compliance shifts' are presented without any quantitative metrics, tables, error bars, or rejection-rate statistics, preventing assessment of whether the shield preserves or nullifies the VLA's semantic utility.
- [Abstract] Abstract (paragraph on decoupled architecture): The assumption that VLA proposals retain useful semantic content after independent passivity filtering is load-bearing for the generalization benefit, but no analysis is given of how rejected proposals are replaced, what fraction are overridden, or whether the resulting admittance commands still encode the original high-level reasoning.
minor comments (1)
- The abstract would be clearer if it briefly defined the admittance port interface or referenced the specific energy-tank formulation used.
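For orientation, a generic textbook-style statement of sampled passivity with an energy tank is sketched below. It is not the paper's Section III derivation: the symbols $f_k$, $v_k$ (port force and velocity at sample $k$), $T$ (sample period), $S_k$ (controller storage), $E_k$ (tank energy), and $E_{\min}$ are assumed notation.

```latex
% Per-step energy balance the shield would have to enforce (assumed form):
S_{k+1} + E_{k+1} - S_k - E_k \;\le\; T\, f_k^{\top} v_k ,
\qquad S_k \ge 0, \quad E_k \ge E_{\min} \ge 0 .

% Summing over k = 0, ..., N-1 telescopes the left-hand side:
S_N + E_N - S_0 - E_0 \;\le\; T \sum_{k=0}^{N-1} f_k^{\top} v_k .

% Dropping the non-negative terms S_N and E_N gives sampled passivity:
T \sum_{k=0}^{N-1} f_k^{\top} v_k \;\ge\; -\,(S_0 + E_0)
\quad \text{for all } N \ge 1 .
```

The bound says that the energy extracted from the port can never exceed what was initially stored, which is the kind of one-line statement the abstract could reference.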
Simulated Author's Rebuttal
We thank the referee for the constructive comments focused on the abstract. We address each point below and will revise the abstract (and add supporting analysis) to strengthen the presentation of the passivity contract, empirical results, and decoupled architecture.
- Referee: [Abstract] Abstract: The central claim of a 'provably sampled-passive runtime contract' via energy-tank accounting and boundary checks is load-bearing, yet the abstract supplies no derivation, equations, or proof sketch for sampled passivity, nor any indication of how the accounting is implemented at the admittance port.
  Authors: The full manuscript derives the sampled-passive contract in Section III via discrete-time energy-tank accounting with boundary checks enforced at the admittance port. We will revise the abstract to include a concise reference to this energy-based formulation and its implementation.
  Revision: yes
- Referee: [Abstract] Abstract: The empirical claims of 'superior precision over unshielded VLA baselines' and 'zero passivity violations even under adversarial compliance shifts' are presented without any quantitative metrics, tables, error bars, or rejection-rate statistics, preventing assessment of whether the shield preserves or nullifies the VLA's semantic utility.
  Authors: The experiments section contains the requested quantitative metrics (precision deltas with error bars, zero violations under shifts). We will update the abstract to report key numerical results so readers can directly assess semantic utility preservation.
  Revision: yes
- Referee: [Abstract] Abstract (paragraph on decoupled architecture): The assumption that VLA proposals retain useful semantic content after independent passivity filtering is load-bearing for the generalization benefit, but no analysis is given of how rejected proposals are replaced, what fraction are overridden, or whether the resulting admittance commands still encode the original high-level reasoning.
  Authors: The current experiments demonstrate maintained task success under the shield, but we agree a dedicated quantification of override rates and semantic retention is warranted. We will add a short analysis subsection reporting override fractions and task-stage consistency to verify retention of high-level reasoning.
  Revision: yes
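The override fraction and task-stage consistency promised in the last response could be computed from paired logs roughly as follows. This is a sketch under assumptions: the function name `override_stats`, the `(stage, stiffness)` log format, and the tolerance are hypothetical, and the paper may define both quantities differently.

```python
from typing import Sequence, Tuple

LogEntry = Tuple[str, float]  # (stage label, stiffness); illustrative format


def override_stats(proposed: Sequence[LogEntry],
                   applied: Sequence[LogEntry],
                   tol: float = 1e-6) -> Tuple[float, float]:
    """Return (override_fraction, stage_consistency) over paired log entries."""
    if len(proposed) != len(applied) or not proposed:
        raise ValueError("logs must be non-empty and of equal length")
    n = len(proposed)
    pairs = list(zip(proposed, applied))
    # A proposal counts as overridden when the applied stiffness differs.
    overridden = sum(abs(p[1] - a[1]) > tol for p, a in pairs)
    # Stage consistency: the applied command keeps the proposed task stage.
    same_stage = sum(p[0] == a[0] for p, a in pairs)
    return overridden / n, same_stage / n
```

A low override fraction together with high stage consistency would support the claim that the shield preserves the VLA's high-level reasoning; a high override fraction would suggest that task success comes mostly from the fallback behavior.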
Circularity Check
No significant circularity detected
The paper's derivation of a provably sampled-passive runtime contract at the admittance port rests on an independent high-frequency passivity shield using energy-tank accounting and boundary checks that are explicitly proposal-independent. No equations or claims reduce the safety properties to fitted parameters, self-defined quantities, or load-bearing self-citations; the VLA outputs are treated as external inputs that the shield filters without the shield's guarantees being tautological to those inputs. The architecture is described as decoupled, enabling causal evaluation, with no evidence that the central result is equivalent to its inputs by construction.