Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation

Huanyu Li, Kun Lei, Sheng Zang, Kaizhe Hu, Yongyuan Liang, Bo An, Xiaoli Li, Huazhe Xu · 2026 · arXiv 2601.07821

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 1 · other: 1

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

SafeManip is a benchmark applying reusable LTLf templates across eight safety categories to evaluate temporal properties in robotic manipulation on VLA policies.

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

AEGIS: A Backup Reflex for Physical AI

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

AEGIS uses activation probes for early-warning detection of high-risk steps in weak policies and selectively escalates to stronger policies, recovering 10.1% of lost trajectories on LIBERO-Spatial while activating the strong policy on only 38% of steps.

Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

ZPRL adapts frozen flow-matching imitation policies via RL perturbations on a task-relevant bottleneck latent, yielding 33.7% higher average success on four real-world manipulation tasks than action-residual baselines.

Learning-augmented robotic automation for real-world manufacturing

cs.RO · 2026-04-24 · conditional · novelty 6.0

A learning-augmented robotic system automated deformable cable insertion and soldering on a live electric-motor production line for 5 hours 10 minutes, producing 108 motors at 99.4% pass rate with under 20 minutes of real-world data per task and no physical fencing.

FAR: Failure-Aware Retry for Test-Time Recovery and Continual Policy Improvement

cs.RO · 2026-07-01 · unverdicted · novelty 4.0

FAR combines failure-contrastive preference adaptation with action perturbations for test-time recovery and continual policy improvement, reporting 17.6% and 11.7% success gains over diffusion policies in simulation and real-world manipulation tasks.

Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training

cs.RO · 2026-04-29 · unverdicted · novelty 4.0

Rule-based high-level guidance combined with goal-conditioned reinforcement learning enables safer and more efficient online adaptation for UAV search-and-rescue tasks under limited simulation training.

citing papers explorer


AEGIS: A Backup Reflex for Physical AI

cs.AI · 2026-06-04 · unverdicted · none · ref 16
