Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
4 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO
2026 · 4 representative citing papers
Citing papers explorer
-
SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution in vision-language-action policies.
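The LTLf-monitor idea above can be sketched as a minimal runtime checker over an execution trace. This is an illustrative assumption, not SafeManip's actual API: the property templates (`globally`, `until`), the trace fields (`force`, `aligned`, `grasped`), and the thresholds are all hypothetical.

```python
# Minimal sketch of LTLf-style runtime safety monitoring over a rollout trace.
# All names and thresholds here are illustrative assumptions.

def monitor_globally(trace, predicate):
    """G(predicate): return the index of the first violating step, or None."""
    for i, state in enumerate(trace):
        if not predicate(state):
            return i
    return None

def monitor_until(trace, hold, release):
    """hold U release: `hold` must stay true at every step until `release`."""
    for i, state in enumerate(trace):
        if release(state):
            return None              # property discharged, no violation
        if not hold(state):
            return i                 # `hold` broke before release: violation
    return len(trace) - 1            # release never occurred: violation at end

# Hypothetical execution trace from a manipulation rollout.
trace = [
    {"force": 2.0, "aligned": True, "grasped": False},
    {"force": 3.5, "aligned": True, "grasped": False},
    {"force": 9.0, "aligned": True, "grasped": True},   # contact-force spike
]

# Property 1: G(force < 8.0) -- never exceed the contact-force limit.
print(monitor_globally(trace, lambda s: s["force"] < 8.0))   # -> 2

# Property 2: aligned U grasped -- stay aligned until the grasp closes.
print(monitor_until(trace, lambda s: s["aligned"], lambda s: s["grasped"]))  # -> None
```

A monitor like this makes the benchmark's central point concrete: the rollout above ends with the object grasped (task success) yet still violates the force-limit property at step 2.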
-
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.
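The trigger/proposer/evaluator loop can be sketched as follows. Every component body here is a placeholder assumption (a distance-based trigger, a random proposer, a trivial one-dimensional dynamics model), standing in for the paper's learned Dream Trigger, Action Proposer, and Dream Evaluator.

```python
# Hedged sketch of a DreamAvoid-style test-time loop: dream (imagine futures)
# only in critical phases, and pick the candidate action with lowest
# predicted failure risk. All component implementations are illustrative.
import random

OBSTACLE_X = 5.0   # hypothetical failure region in a 1-D world

def dream_trigger(state):
    # Assumption: proximity to the obstacle marks a "critical phase".
    return abs(state - OBSTACLE_X) < 1.0

def action_proposer(state, n=8):
    # Assumption: sample candidate actions uniformly at random.
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

def dream_evaluator(state, action, horizon=3):
    # "Dream" a short future under a trivial dynamics model and score the
    # worst-case closeness to the obstacle as the failure risk.
    x, risk = state, 0.0
    for _ in range(horizon):
        x = x + action
        risk = max(risk, 1.0 / (1e-3 + abs(x - OBSTACLE_X)))
    return risk

def act(state, default_action=0.5):
    if not dream_trigger(state):
        return default_action        # outside critical phases: base policy
    candidates = action_proposer(state)
    return min(candidates, key=lambda a: dream_evaluator(state, a))

print(act(0.0))   # far from the obstacle -> default policy action, 0.5
```

The design point the sketch illustrates is that dreaming is gated: the (expensive) evaluator only runs when the trigger flags a critical phase, so the base policy's latency is unchanged elsewhere.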
-
Learning-augmented robotic automation for real-world manufacturing
A learning-augmented robotic system automated deformable cable insertion and soldering on a live electric-motor production line for 5 hours 10 minutes, producing 108 motors at 99.4% pass rate with under 20 minutes of real-world data per task and no physical fencing.
-
Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
Rule-based high-level guidance combined with goal-conditioned reinforcement learning enables safer and more efficient online adaptation for UAV search-and-rescue tasks under limited simulation training.
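The hierarchy described above can be sketched as a rule-based coach selecting subgoals for a goal-conditioned low-level policy. The specific rules, the greedy one-step controller, and the coordinates are placeholder assumptions for illustration, not the paper's method.

```python
# Illustrative sketch: rule-based high-level coaching over a goal-conditioned
# low-level policy for a UAV search task. Rules and dynamics are assumptions.

def high_level_coach(uav_pos, survivor, battery):
    """Rule-based coach: pick a subgoal label from simple safety rules."""
    if battery < 0.2:
        return "base"        # rule: return to base on low battery
    if survivor is not None:
        return "survivor"    # rule: fly to a detected survivor
    return "search"          # default: continue the search pattern

def goal_conditioned_policy(uav_pos, goal_pos):
    """Placeholder low-level policy: one greedy unit step toward the goal."""
    dx = goal_pos[0] - uav_pos[0]
    dy = goal_pos[1] - uav_pos[1]
    sign = lambda d: (d > 0) - (d < 0)
    return (uav_pos[0] + sign(dx), uav_pos[1] + sign(dy))

goals = {"base": (0, 0), "survivor": (3, 4), "search": (9, 9)}
pos = (2, 2)
subgoal = high_level_coach(pos, survivor=(3, 4), battery=0.9)
pos = goal_conditioned_policy(pos, goals[subgoal])
print(subgoal, pos)    # -> survivor (3, 3)
```

The split matters for the limited-simulation setting the title describes: the hand-written rules constrain which goals the learned policy is ever asked to reach, so online adaptation only has to refine low-level goal-reaching rather than discover safe high-level behavior from scratch.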