RedVLA: Physical Red Teaming for Vision-Language-Action Models

· 2026 · cs.RO · arXiv 2604.22591

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

The real-world deployment of Vision-Language-Action (VLA) models remains limited by the risk of unpredictable and irreversible physical harm. However, we currently lack effective mechanisms to proactively detect these physical safety risks before deployment. To address this gap, we propose \textbf{RedVLA}, the first red teaming framework for physical safety in VLA models. We systematically uncover unsafe behaviors through a two-stage process: (I) \textbf{Risk Scenario Synthesis} constructs a valid and task-feasible initial risk scene. Specifically, it identifies critical interaction regions from benign trajectories and positions the risk factor within these regions, aiming to entangle it with the VLA's execution flow and elicit a target unsafe behavior. (II) \textbf{Risk Amplification} ensures stable elicitation across heterogeneous models. It iteratively refines the risk factor state through gradient-free optimization guided by trajectory features. Experiments on six representative VLA models show that RedVLA uncovers diverse unsafe behaviors and achieves the ASR up to 95.5\% within 10 optimization iterations. To mitigate these risks, we further propose SimpleVLA-Guard, a lightweight safety guard built from RedVLA-generated data. Our data, assets, and code are available \href{https://redvla.github.io}{here}.

representative citing papers

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 8.0

SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution in vision-language-action policies.

citing papers explorer

Showing 1 of 1 citing paper.

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 26 · internal anchor
SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution in vision-language-action policies.

RedVLA: Physical Red Teaming for Vision-Language-Action Models

fields

years

verdicts

representative citing papers

citing papers explorer