VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.
Rac: Robot learning for long-horizon tasks by scaling recovery and correction
12 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 12roles
background 4polarities
background 4representative citing papers
ReGuide is a self-improving framework that uses phase-conditioned guidance to generate corrective rollouts and absorbs successful ones back into diffusion policy training, yielding 1.3-7.7x success gains on Robomimic tasks.
SDP constructs sets of desired action-chunks from human correction pairs and trains diffusion policies to align with those sets, yielding better performance and robustness than standard behavior cloning on robotic tasks.
HandITL enables seamless human intervention in VLA policies for bimanual dexterous manipulation, cutting jitter by 99.8% and improving refined policies by 19% over standard teleoperation.
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.
ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.
DexPIE improves dexterous manipulation success rates by 37% over demo policies via real-world experience collection with adapted intervention, multi-stage DAgger, asynchronous relative-action inference, and optimality conditioning.
DeMaVLA is a VLA foundation model using a pruned action expert and flow matching, pre-trained on 5000 hours of real demonstrations and post-trained on multi-task folding data with human-in-the-loop correction, reporting competitive benchmark and real-world folding performance.
FAR combines failure-contrastive preference adaptation with action perturbations for test-time recovery and continual policy improvement, reporting 17.6% and 11.7% success gains over diffusion policies in simulation and real-world manipulation tasks.
citing papers explorer
-
$\pi^{*}_{0.6}$: a VLA That Learns From Experience
RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.