$\pi^{*}_{0.6}$: a VLA That Learns From Experience
33 Pith papers cite this work.
abstract
We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $\pi^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $\pi^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
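The central mechanism, advantage conditioning, is simple to sketch: score each trajectory with a value function, turn the resulting advantage into a discrete conditioning signal, imitate all data under that signal, and ask for high advantage at inference. The toy PyTorch sketch below illustrates this pattern only; the network, the quantile binning, the MSE action head, and every name in it are illustrative assumptions, not the $\pi^{*}_{0.6}$ implementation (the actual model is a full VLA trained on heterogeneous robot data).

```python
import torch
import torch.nn as nn

# Minimal sketch of advantage-conditioned policy training in the spirit of
# RECAP (illustrative only: module names, shapes, the binning scheme, and
# the MSE action head are placeholder assumptions, not the authors' code).

NUM_BINS = 4  # discrete advantage levels the policy can be conditioned on

class AdvantageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=64, act_dim=8, hidden=256):
        super().__init__()
        self.adv_embed = nn.Embedding(NUM_BINS, hidden)  # advantage-bin token
        self.net = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, adv_bin):
        z = self.adv_embed(adv_bin)  # embed the advantage level as a token
        return self.net(torch.cat([obs, z], dim=-1))

def train_step(policy, critic, opt, obs, action, ret):
    with torch.no_grad():
        # Advantage = observed return minus the critic's value estimate.
        adv = ret - critic(obs).squeeze(-1)
        # Discretize into quantile bins: the policy imitates *all* data,
        # good and bad, but is always told which bin the behavior fell in.
        edges = torch.quantile(adv, torch.linspace(0, 1, NUM_BINS + 1)[1:-1])
        adv_bin = torch.bucketize(adv, edges)
    loss = ((policy(obs, adv_bin) - action) ** 2).mean()  # BC-style regression
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random data; at deployment, conditioning on the top bin
# (NUM_BINS - 1) asks the policy for better-than-average behavior.
policy = AdvantageConditionedPolicy()
critic = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
obs, action, ret = torch.randn(32, 64), torch.randn(32, 8), torch.randn(32)
train_step(policy, critic, opt, obs, action, ret)
act = policy(obs, torch.full((32,), NUM_BINS - 1, dtype=torch.long))
```

Conditioning on the top bin at deployment is what turns pure imitation into policy improvement: the policy has seen low-advantage behavior too, but is never asked to reproduce it.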
[chart: citing papers by year (2026: 33)]
representative citing papers
- A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.
- MV-VDP jointly predicts multi-view RGB and heatmap videos via diffusion to achieve data-efficient, robust robotic manipulation policies.
- E-VLA integrates event streams directly into VLA models via lightweight fusion, raising Pick-Place success from 0% to 60-90% at 20 lux and from 0% to 20-25% under severe motion blur.
citing papers explorer
- RotVLA: Rotational Latent Action for Vision-Language-Action Model
  RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
- DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
  Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks and real pick tasks.
- ${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
  π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
- ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching
  ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
- ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
  ViVa turns a video generator into a value model for robot RL that jointly forecasts future states and task value, yielding better performance on real-world box assembly when integrated with RECAP.
- Action Images: End-to-End Policy Learning via Multiview Video Generation
  Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
- Reinforcing VLAs in Task-Agnostic World Models
  RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.
- TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning
  TMRL bridges behavioral cloning pretraining and RL finetuning via diffusion noise and timestep modulation to enable controlled exploration, improving sample efficiency and enabling real-world robot training in under one hour.
- LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning
  LaST-R1 introduces an RL post-training method called LAPO that optimizes latent Chain-of-Thought reasoning in vision-language-action models, yielding 99.9% success on LIBERO and up to 22.5% real-world gains.
- ARM: Advantage Reward Modeling for Long-Horizon Manipulation
  ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
- Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA
  SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.
- Unified Noise Steering for Efficient Human-Guided VLA Adaptation
  UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
- RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
  RePO-VLA raises average adversarial success rates in VLA manipulation from 20% to 75% by using recovery-aware initialization, a progress-aware semantic value function, and value-conditioned refinement on success and corrective trajectories.
- How to utilize failure demo data?: Effective data selection for imitation learning using distribution differences in attention mechanism
  The method uses attention discrepancy metrics on latent success-failure representations to select beneficial failure data for imitation learning, raising task success rates in simulation.
- Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
  LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes (see the sketch of the underlying bound after this list).
- Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
  A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks, such as 8% on RoboTwin 2.0.
- Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies
  A fleet-scale RL framework improves a single generalist VLA policy from deployment data to 95% average success on eight real-world manipulation tasks with 16 dual-arm robots.
- PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
  PRTS pretrains VLA models with contrastive goal-conditioned RL to embed goal-reachability probabilities from offline data, yielding SOTA results on robotic benchmarks, especially for long-horizon and novel instructions.
- RL Token: Bootstrapping Online RL with Vision-Language-Action Models
  RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.
- FASTER: Value-Guided Sampling for Fast RL
  FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
- VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
  VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.
- ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation
  ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.
- Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models
  Action-state consistency in World Action Models distinguishes successful from failed imagined futures and supports value-free selection of better rollouts via consensus among predictions.
- Cooptimizing Safety and Performance Using Safety Value-Constrained Model Predictive Control
  The method augments MPC with a safety value function as a terminal constraint to achieve recursive feasibility and persistent safety while co-optimizing performance.
- VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
  VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.
- SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
  SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.
- CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
  CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.
- RLDX-1 Technical Report
  RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.
- OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
  OmniVLA-RL uses a mix-of-transformers architecture and flow matching reformulated as an SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.
- Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
  A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.
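Returning to the Long-Horizon Q-Learning entry above: one natural reading of an n-step action-sequence lower bound, written here as a hedged sketch rather than the paper's exact formulation, is that logged actions can be no better than optimal, so the optimal action-value dominates the n-step return computed along any logged trajectory. A hinge penalty then enforces this inequality on the learned critic $Q_\theta$ using only quantities already in the batch (with $\bar{Q}$ a target copy of the same network, matching the "no extra networks" claim).

```latex
% Illustrative n-step lower bound from logged (sub)optimal actions, and a
% hinge-loss penalty enforcing it on a learned critic Q_theta.
Q^{*}(s_t, a_t) \;\ge\; \mathbb{E}\Big[\textstyle\sum_{k=0}^{n-1} \gamma^{k} r_{t+k}
  \;+\; \gamma^{n}\, Q^{*}(s_{t+n}, a_{t+n})\Big],
\qquad
\mathcal{L}_{\mathrm{lb}} \;=\; \mathbb{E}\Big[\max\Big(0,\;
  \textstyle\sum_{k=0}^{n-1} \gamma^{k} r_{t+k}
  + \gamma^{n}\, \bar{Q}(s_{t+n}, a_{t+n})
  - Q_{\theta}(s_t, a_t)\Big)^{2}\Big].
```

The hinge makes the constraint one-sided: it only pushes $Q_\theta$ up when the n-step return along real experience exceeds it, which is what lets it act as a stabilizer rather than as a bootstrapping target.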