VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer

Jun Cen; Shihefeng Wang; Shuang Liu; Songqiao Hu; Xiang Li; Xiao He; Zeyi Liu; Zihan Meng

arxiv: 2512.11891 · v2 · pith:W26MMSH2new · submitted 2025-12-09 · 💻 cs.RO · cs.SY · eess.SY

Songqiao Hu, Zeyi Liu, Shuang Liu, Jun Cen, Zihan Meng, Shihefeng Wang, Xiang Li, Xiao He

classification 💻 cs.RO cs.SY eess.SY

keywords models safety aegis architecture benchmark constraint layer manipulation

Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint (SC) layer formulated via control barrier functions. AEGIS integrates directly with existing VLA models to improve safety with theoretical guarantees, while maintaining their original instruction-following performance. To evaluate the efficacy of our architecture, we construct a comprehensive safety-critical benchmark SafeLIBERO, spanning distinct manipulation scenarios characterized by varying degrees of spatial complexity and obstacle intervention. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines. Notably, AEGIS achieves over 50% improvement in obstacle avoidance rate while substantially increasing the task success rate by nearly 10%. All benchmark datasets, code, and supplementary materials are publicly available at https://vlsa-aegis.github.io/.
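The idea of a safety layer built on control barrier functions (CBFs) can be illustrated with a minimal sketch. This is not the AEGIS implementation. It assumes a single-integrator end-effector model (position `x`, commanded velocity `u`), one spherical obstacle, and the barrier h(x) = ||x - c||^2 - r^2; the function name `cbf_safety_filter` and the gain `alpha` are hypothetical. With a single linear constraint, the usual CBF quadratic program reduces to a closed-form projection of the nominal action onto a half-space.

```python
import numpy as np

def cbf_safety_filter(u_nom, x, center, radius, alpha=1.0):
    """Minimally modify a nominal velocity command so that a CBF condition holds.

    Assumptions (illustrative only): dynamics x_dot = u, one spherical
    obstacle, barrier h(x) = ||x - center||^2 - radius^2.
    Solves  min ||u - u_nom||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)

    diff = x - center
    h = diff @ diff - radius**2   # h > 0 outside the obstacle
    a = 2.0 * diff                # gradient of h with respect to x
    b = -alpha * h                # constraint reads a . u >= b

    slack = a @ u_nom - b
    if slack >= 0.0:
        # The nominal action already satisfies the constraint: pass it through.
        return u_nom

    norm_sq = a @ a
    if norm_sq == 0.0:
        # Degenerate case (x exactly at the obstacle center): no direction
        # satisfies the constraint, so fall back to stopping.
        return np.zeros_like(u_nom)

    # Project u_nom onto the boundary of the half-space {u : a . u >= b}.
    return u_nom - (slack / norm_sq) * a


# Nominal action heads toward an obstacle at the origin; the filter slows the approach.
safe = cbf_safety_filter([-1.0, 0.2, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.5)
print(safe)
```

In this sketch the filter leaves the action untouched whenever the constraint already holds, so the upstream policy's behavior is preserved away from obstacles. That mirrors the plug-and-play framing in the abstract, under the simplifying assumptions stated above.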

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 8.0

SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution i...
LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models
cs.RO 2026-06 unverdicted novelty 7.0

LIBERO-Safety supplies a scalable benchmark, data-generation pipeline, and 19,664-demonstration dataset that exposes a generalization-safety tension in current VLA models where diverse training improves collision avoi...
How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures
cs.RO 2026-05 unverdicted novelty 7.0

VLA architectures exhibit architecture-specific failure signatures at the motor-command level, with direction reversal as a universal predictor and velocity monitoring ineffective for continuous models.
SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 7.0

SafeManip is a benchmark applying reusable LTLf templates across eight safety categories to evaluate temporal properties in robotic manipulation on VLA policies.
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
cs.RO 2026-05 unverdicted novelty 7.0

DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.
HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models
cs.RO 2026-04 unverdicted novelty 7.0

HazardArena shows VLA models trained on safe data frequently produce unsafe actions in semantically risky but visually similar settings, and a training-free Safety Option Layer reduces those failures with little perfo...
LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models
cs.RO 2026-06 unverdicted novelty 6.0

Introduces LIBERO-Safety benchmark with parametric scenario generation and 19,664 collision-free demonstrations, then evaluates VLA models to reveal a generalization-safety tension.
EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment
cs.RO 2026-06 unverdicted novelty 6.0

EmbodiSteer steers embodiment-agnostic Cartesian diffusion policies into joint space with Jacobian-based collision guidance after each denoising step for zero-shot cross-embodiment deployment.
Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models
cs.RO 2026-06 unverdicted novelty 6.0

Internal attention heads in VLA policies localize targets for a CBF safety filter that enables real-time collision avoidance with dynamic obstacles and outperforms init-time oracle identification by 43% on average.
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
cs.CR 2026-03 unverdicted novelty 6.0

The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring
cs.CV 2026-06 unverdicted novelty 5.0

VLESA introduces a goal-conditioned safety Q-filter trained via GRPO on egocentric video plus an intent-action predictor, achieving higher intervention accuracy and over 41 percentage points better action safety on th...
Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study
cs.LG 2026-04 unverdicted novelty 5.0

Explicit geometry-based feasibility supervision added to diffusion VLA training leads to better physical reliability, task success, and faster learning with limited data in manipulation tasks.
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
cs.RO 2026-04 accept novelty 4.0

A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.
How VLAs (Really) Work In Open-World Environments
cs.RO 2026-04 unverdicted novelty 4.0

Standard success metrics for VLAs on complex chores overlook safety violations and intermediate failures, leading to exaggerated claims; new evaluation protocols are proposed to measure robustness and safety.