
$\pi^{*}_{0.6}$: a VLA That Learns From Experience

Physical Intelligence, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Kevin Black, Ken Conley · 2025 · cs.LG · arXiv 2511.14759

Canonical reference. 76% of citing Pith papers cite this work as background.

97 Pith papers citing it

Background 76% of classified citations


abstract

We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $\pi^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $\pi^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
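The abstract names advantage conditioning as the mechanism that lets RECAP train on demonstrations, autonomous rollouts, and expert interventions together, but it does not spell out the mechanics. Below is a minimal sketch of binarized advantage conditioning. Every detail is an illustrative assumption, not the paper's implementation. A hypothetical learned value model supplies V(o_t) for each step. The reward is assumed to be -1 per step. Advantages are n-step estimates, and each step receives a binary "improvement" indicator by thresholding its advantage. Steps that came from expert interventions are always marked positive. The indicator would be attached to the policy input during training (here as a hypothetical text suffix) and fixed to positive at deployment. The names `Step`, `n_step_advantages`, `improvement_indicators`, `conditioning_suffix`, and the threshold value are all made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One timestep of an episode (hypothetical data layout)."""
    value: float                   # V(o_t) from a hypothetical learned value model
    reward: float                  # assumed reward: -1 per step until the task ends
    is_intervention: bool = False  # True if an expert teleoperated this action


def n_step_advantages(steps: List[Step], n: int, gamma: float = 1.0) -> List[float]:
    """A_t = sum_{k<n} gamma^k r_{t+k} + gamma^n V(o_{t+n}) - V(o_t).

    Near the end of the episode the sum is truncated and no bootstrap
    value is added, i.e. the terminal state is assumed to have value 0.
    """
    T = len(steps)
    advantages = []
    for t in range(T):
        ret, discount = 0.0, 1.0
        for k in range(n):
            if t + k >= T:
                break
            ret += discount * steps[t + k].reward
            discount *= gamma
        if t + n < T:
            ret += discount * steps[t + n].value  # bootstrap from the value model
        advantages.append(ret - steps[t].value)
    return advantages


def improvement_indicators(steps: List[Step], n: int, threshold: float,
                           gamma: float = 1.0) -> List[bool]:
    """Binarize advantages; intervention steps are forced positive (assumption)."""
    advantages = n_step_advantages(steps, n, gamma)
    return [s.is_intervention or a > threshold for s, a in zip(steps, advantages)]


def conditioning_suffix(indicator: bool) -> str:
    """Hypothetical text appended to the VLA prompt for this sample."""
    return "Advantage: positive" if indicator else "Advantage: negative"


if __name__ == "__main__":
    # Toy episode: the action at step 1 spends a step without the
    # predicted value improving, so it gets a negative advantage.
    episode = [Step(-3.0, -1.0), Step(-2.0, -1.0), Step(-2.0, -1.0), Step(-1.0, -1.0)]
    print(n_step_advantages(episode, n=1))                         # [0.0, -1.0, 0.0, 0.0]
    print(improvement_indicators(episode, n=1, threshold=-0.5))    # [True, False, True, True]
```

Reducing the advantage to one bit means every sample, including the negative ones, can still be consumed by ordinary supervised training. The policy learns what both "good" and "bad" behavior look like and is asked for the good mode at deployment. For the value model, the threshold choice, and the exact conditioning format RECAP uses, refer to the paper itself.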


citation-role summary

background: 30 · method: 4 · baseline: 3

citation-polarity summary

background: 28 · use method: 4 · baseline: 3 · unclear: 2
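Both summaries total 37 classified citations. The headline "76% background" figure matches the polarity tally (28 of 37) rather than the role tally (30 of 37, about 81%). The page does not say which tally it uses, so treat this as an inference. A quick check of the arithmetic, with counts copied from the two summaries (the function name is illustrative):

```python
# Counts copied from the citation-role and citation-polarity summaries above.
ROLE_COUNTS = {"background": 30, "method": 4, "baseline": 3}
POLARITY_COUNTS = {"background": 28, "use method": 4, "baseline": 3, "unclear": 2}


def percent_share(counts: dict, label: str) -> int:
    """Share of classified citations carrying `label`, rounded to a whole percent."""
    total = sum(counts.values())
    return round(100 * counts[label] / total)


if __name__ == "__main__":
    print(sum(ROLE_COUNTS.values()), sum(POLARITY_COUNTS.values()))  # 37 37
    print(percent_share(POLARITY_COUNTS, "background"))  # 76
    print(percent_share(ROLE_COUNTS, "background"))      # 81
```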



representative citing papers

Adapting Generalist Robot Policies with Semantic Reinforcement Learning

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

SARL optimizes language prompt inputs to generalist vision-language-action policies through online RL to solve complex long-horizon tasks by composing existing skills.

LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

LIBERO-Safety supplies a scalable benchmark, data-generation pipeline, and 19,664-demonstration dataset that exposes a generalization-safety tension in current VLA models where diverse training improves collision avoidance but task success stays limited by trajectory quality and semantic understandi

Improving Robotic Generalist Policies via Flow Reversal Steering

cs.RO · 2026-06-11 · unverdicted · novelty 7.0

Flow Reversal Steering steers flow matching generalist policies by reversing suboptimal actions to nearby better modes, enabling improved zero-shot control, quick distillation, and RL bootstrapping in robotic manipulation.

Foresight: Iterative Reasoning About Clues that Matter for Navigation

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

Foresight uses iterative VLM plan proposal and critique with RL from human feedback to raise navigation success by 37% and cut interventions by 52% in real-world tests.

Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

Ambient Diffusion Policy enables better imitation learning from suboptimal robot data by leveraging spectral properties to restrict data usage to specific diffusion times.

ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

cs.RO · 2026-06-08 · unverdicted · novelty 7.0

ReCoVLA improves VLA policy reliability by using a VLM as a semantic reward selector to train residual recovery policies in simulation, raising average success from 36.7% to 66.7% in sim and achieving 61.7% in zero-shot sim-to-real physical tests.

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

cs.RO · 2026-06-05 · unverdicted · novelty 7.0

VoLoAgent uses a VLM to steer heterogeneous robot capabilities as interruptible tools for long-horizon manipulation and introduces the RoboVoLo benchmark, claiming substantial outperformance over single VLA/VLM or tool-based systems with real-robot validation.

PhAIL: A Real-Robot VLA Benchmark and Distributional Methodology

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

PhAIL provides an open benchmark and distributional evaluation method for real-robot VLA policies using time-to-success CDF, HRT scoring, and KS significance tests.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

ManiSoft is a new benchmark featuring a soft-body simulator, four deformable control tasks, and an automated pipeline generating 6300 scenes with expert trajectories for training and evaluating vision-language policies on continuum robots.

RotVLA: Rotational Latent Action for Vision-Language-Action Model

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.

Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

cs.RO · 2026-05-12 · conditional · novelty 7.0

A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.

PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

PhysMem enables VLM-based robot planners to learn and verify physical properties through test-time interaction and hypothesis testing, raising success on a brick insertion task from 23% to 76%.

Action-to-Action Flow Matching

cs.RO · 2026-02-07 · unverdicted · novelty 7.0

A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

cs.RO · 2026-01-28 · unverdicted · novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.

${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

cs.RO · 2026-04-09 · unverdicted · novelty 7.0

ViVa turns a video generator into a value model for robot RL that jointly forecasts future states and task value, yielding better performance on real-world box assembly when integrated with RECAP.

Action Images: End-to-End Policy Learning via Multiview Video Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.

HNSW with Accuracy Guarantees Using Graph Spanners -- A Technical Report

cs.DB · 2026-07-02 · unverdicted · novelty 6.0

A tiered Certify-then-Rectify system for HNSW that certifies approximate results statistically and falls back to exact recovery by treating the graph as a spanner whose stretch is bounded via extreme value theory.

ROSA: A Robotics Foundation Model Serving System for Robot Factories

cs.RO · 2026-07-01 · unverdicted · novelty 6.0

ROSA introduces shared GPU-pool serving, robotics-aware abstractions for multi-model pipelines, and factory-productivity scheduling that improves output by up to 12.06x over dedicated per-robot systems.

Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts

cs.RO · 2026-07-01 · unverdicted · novelty 6.0

DART adapts VLA models to environmental shifts with one demonstration using subspace-aligned weight vector arithmetic.

citing papers explorer

Showing 50 of 97 citing papers.

Freeform Preference Learning for Robotic Manipulation cs.RO · 2026-06-30 · unverdicted · none · ref 21
Freeform Preference Learning trains language-conditioned multi-axis reward models from human pairwise preferences to produce steerable and compositional robot policies that outperform sparse and binary-preference baselines by 38 percentage points.
Z-1: Efficient Reinforcement Learning for Vision-Language-Action Models cs.RO · 2026-06-30 · unverdicted · none · ref 20
Z-1 uses task-wise GRPO post-training on a flow-based VLA model to reach 80.6% average success across 24 RoboCasa tasks, a 13.2-point gain over its SFT baseline.
Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation cs.RO · 2026-06-29 · unverdicted · none · ref 7
Chronos elevates full observation history to the policy's latent state via selective SSM tokens and a Schrödinger-inspired acceleration bridge, achieving large gains on memory-dependent robot tasks with fewer parameters.
STEAM: Self-Supervised Temporal Ensemble Advantage Modeling for Real-World Robot Learning cs.RO · 2026-06-29 · unverdicted · none · ref 5
STEAM learns advantages from expert trajectories via self-supervised temporal ensemble modeling to improve policy learning on real robot tasks like bimanual folding and pick-and-place.
TAP-VLA: Tactile Annotation Prompting for Vision Language Action Models cs.RO · 2026-06-27 · unverdicted · none · ref 10
TAP-VLA improves VLA performance in contact-rich manipulation by visually annotating tactile shear fields onto input images, reaching 78% success versus under 50% for vision-only and other tactile methods.
EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies cs.CV · 2026-06-18 · unverdicted · none · ref 2
EventVLA introduces foundational visual anchors and a Keyframe Evidence Memory module that predicts future keyframe probabilities from VLA embeddings to improve long-horizon task success by an average of 40% on 17 simulation and 4 real-world tasks.
Reversal Q-Learning cs.LG · 2026-06-16 · unverdicted · none · ref 9
Reversal Q-Learning (RQL) proposes reversing flows for virtual trajectories and bias-variance reduction in an expanded MDP to train flow policies, reporting the best average performance on 50 simulated robotic tasks versus prior flow-based offline RL methods.
SERF: Spatiotemporal Environment and Robot Feature Map for Long-Horizon Mobile Manipulation cs.RO · 2026-06-11 · unverdicted · none · ref 7
SERF conditions VLA policies on online-updated neural point maps of environment and robot to improve long-horizon mobile manipulation on BEHAVIOR-1K.
AIR-VLA+: Decoupling Movement and Manipulation via Cascaded Dual-Action Decoders with Asymmetric MoE for Aerial Robots cs.RO · 2026-06-11 · unverdicted · none · ref 16
AIR-VLA+ introduces cascaded manipulation and movement decoders plus asymmetric MoE to decouple action scales in aerial manipulation, reporting a 48.0 average score and an 80.2% task completion gain over a single-head baseline on the AIR-VLA benchmark.
SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation cs.RO · 2026-06-09 · unverdicted · none · ref 7
SARM2 presents RM, a multi-task stage-aware reward model achieving 80% lower value-estimation MSE, which when used in SPIRAL boosts manipulation task success from ~50% to near-perfect on several benchmarks.
TORL-VLA: Tactile Guided Online Reinforcement Learning for Contact-Rich Manipulation cs.RO · 2026-06-08 · unverdicted · none · ref 16
TORL-VLA couples a tactile wrench-aware VLA policy with a lightweight online RL module and an intervention-censored critic to improve success and efficiency on contact-rich robotic tasks.
What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos? cs.RO · 2026-06-04 · unverdicted · none · ref 37
Cotraining on 532 everyday human videos with accurate hand labels improves robot policies by 29.7% when networks specialize to human versus robot embodiments.
FlowPRO: Reward-Free Reinforced Fine-Tuning of Flow-Matching VLAs via Proximalized Preference Optimization cs.RO · 2026-06-03 · unverdicted · none · ref 17
FlowPRO applies proximalized preference optimization to flow-matching VLAs with intervention-rollback data to reach higher success rates on long-horizon bimanual tasks without rewards or critics.
Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections cs.RO · 2026-06-01 · unverdicted · none · ref 1
SDP constructs sets of desired action-chunks from human correction pairs and trains diffusion policies to align with those sets, yielding better performance and robustness than standard behavior cloning on robotic tasks.
Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring cs.RO · 2026-05-29 · unverdicted · none · ref 3
Hide-and-Seek uses contrastive objectives on trajectories to localize failure signals in VLA models from trajectory-level supervision alone.
Feat2Go: Visual Feature-Grounded Value Estimation for Embodied Reinforcement Learning cs.RO · 2026-05-29 · unverdicted · none · ref 7
Feat2Go uses patch-level similarity from a visual world model and trend-based clustering to create progress targets for training value models that improve reward shaping in embodied RL for VLA policies, yielding large gains on ManiSkill3 and RoboTwin benchmarks.
ParkingWorld: End-to-End Autonomous Parking Reinforcement Learning from Corrective Experience in 3DGS Simulation cs.RO · 2026-05-24 · unverdicted · none · ref 16
CIL-SERL integrates a hierarchical replay buffer with human corrective interventions into RL for autonomous parking in 3DGS simulation, reporting gains in success rate, efficiency, and safety on sim and real vehicle.
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning cs.RO · 2026-05-19 · unverdicted · none · ref 18
ZPRL adapts frozen flow-matching imitation policies via RL perturbations on a task-relevant bottleneck latent, yielding 33.7% higher average success on four real-world manipulation tasks than action-residual baselines.
Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention cs.RO · 2026-05-14 · unverdicted · none · ref 9 · 2 links
HandITL enables seamless human intervention in VLA policies for bimanual dexterous manipulation, cutting jitter by 99.8% and improving refined policies by 19% over standard teleoperation.
Reinforcing VLAs in Task-Agnostic World Models cs.AI · 2026-05-12 · unverdicted · none · ref 11 · 2 links
RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.
TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning cs.RO · 2026-05-12 · unverdicted · none · ref 3
TMRL bridges behavioral cloning pretraining and RL finetuning via diffusion noise and timestep modulation to enable controlled exploration, improving sample efficiency and enabling real-world robot training in under one hour.
Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies cs.RO · 2026-05-01 · unverdicted · none · ref 15 · 2 links
LWD is a fleet-scale offline-to-online RL framework that continually improves pretrained VLA policies using autonomous rollouts and human interventions, reaching 95% average success on real-world manipulation tasks.
LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning cs.RO · 2026-04-30 · unverdicted · none · ref 30 · 2 links
LaST-R1 introduces an RL post-training method called LAPO that optimizes latent Chain-of-Thought reasoning in vision-language-action models, yielding 99.9% success on LIBERO and up to 22.5% real-world gains.
Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model cs.RO · 2026-04-03 · conditional · none · ref 1
MV-VDP jointly predicts multi-view RGB and heatmap videos via diffusion to achieve data-efficient, robust robotic manipulation policies.
ARM: Advantage Reward Modeling for Long-Horizon Manipulation cs.RO · 2026-04-03 · unverdicted · none · ref 12
ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA cs.RO · 2026-04-03 · unverdicted · none · ref 10
SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.
