hub Mixed citations

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni · 2021 · cs.RO · arXiv 2108.03298

Mixed citation behavior. Most common role is background (42%).

50 Pith papers citing it

Background 42% of classified citations

open full Pith review browse 50 citing papers arXiv PDF

abstract

Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learning algorithms for robot manipulation on five simulated and three real-world multi-stage manipulation tasks of varying complexity, and with datasets of varying quality. Our study analyzes the most critical challenges when learning from offline human data for manipulation. Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation. We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods, and the ability to easily scale to natural, real-world manipulation scenarios where only raw sensory signals are available. We have open-sourced our datasets and all algorithm implementations to facilitate future research and fair comparisons in learning from human demonstration data. Codebase, datasets, trained models, and more available at https://arise-initiative.github.io/robomimic-web/

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 method 5 baseline 2 dataset 2

citation-polarity summary

background 8 use method 5 baseline 2 unclear 2 support 1 use dataset 1

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

cs.RO · 2026-05-15 · unverdicted · novelty 7.0

WorldVLN proposes the first autoregressive world action model for aerial vision-language navigation that predicts short-horizon latent world states, decodes them to waypoints in closed loop, and uses two-stage training with Action-aware GRPO to achieve over 12% success-rate gains on benchmarks plus零

Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

cs.RO · 2026-05-12 · conditional · novelty 7.0

A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.

Beyond Isolation: A Unified Benchmark for General-Purpose Navigation

cs.RO · 2026-05-10 · unverdicted · novelty 7.0

OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

cs.RO · 2026-03-16 · conditional · novelty 7.0

Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.

When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering

cs.RO · 2026-02-25 · unverdicted · novelty 7.0

UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

cs.RO · 2026-02-18 · unverdicted · novelty 7.0

PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

One Step Diffusion via Shortcut Models

cs.LG · 2024-10-16 · conditional · novelty 7.0

Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.

Any-point Trajectory Modeling for Policy Learning

cs.RO · 2023-12-28 · conditional · novelty 7.0

ATM pre-trains models to predict trajectories of any points in videos, then uses those predictions to learn strong visuomotor policies from minimal action labels, beating baselines by 80% on 130+ tasks.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

cs.RO · 2022-09-30 · unverdicted · novelty 7.0

VIP learns a visual embedding from human videos whose distance defines dense, smooth rewards for arbitrary goal-image robot tasks without task-specific fine-tuning.

HITL-D: Human In The Loop Diffusion Assisted Shared Control

cs.RO · 2026-05-20 · unverdicted · novelty 6.0

HITL-D combines diffusion policies with human input for shared robotic control, reducing required joystick axes and improving speed and workload in manipulation tasks per a 12-participant study.

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

cs.RO · 2026-05-18 · conditional · novelty 6.0 · 2 refs

COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

cs.RO · 2026-05-15 · conditional · novelty 6.0

DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.

SID: Sliding into Distribution for Robust Few-Demonstration Manipulation

cs.RO · 2026-05-13 · unverdicted · novelty 6.0

SID achieves approximately 90% success on six real-world manipulation tasks with only two demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control

cs.RO · 2026-05-05 · conditional · novelty 6.0

A multi-agent RL high-level planner outputs task-space velocities that a GPU-parallel QP low-level controller converts to joint velocities while enforcing limits and collisions, yielding robust sim-to-real dexterous grasping with zero-shot steerability.

An Efficient Metric for Data Quality Measurement in Imitation Learning

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

Power spectral density of trajectories ranks demonstration quality for imitation learning, enabling rollout-free curation that improves fine-tuned policy success.

citing papers explorer

Showing 50 of 50 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 221 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning cs.AI · 2023-06-05 · conditional · none · ref 42 · internal anchor
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation cs.RO · 2026-05-15 · unverdicted · none · ref 28 · internal anchor
WorldVLN proposes the first autoregressive world action model for aerial vision-language navigation that predicts short-horizon latent world states, decodes them to waypoints in closed loop, and uses two-stage training with Action-aware GRPO to achieve over 12% success-rate gains on benchmarks plus零
Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 38 · internal anchor
VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.
Aligning Flow Map Policies with Optimal Q-Guidance cs.LG · 2026-05-12 · unverdicted · none · ref 28 · internal anchor
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation cs.RO · 2026-05-12 · conditional · none · ref 18 · internal anchor
A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.
Beyond Isolation: A Unified Benchmark for General-Purpose Navigation cs.RO · 2026-05-10 · unverdicted · none · ref 20 · internal anchor
OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.
ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching cs.RO · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 21 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector cs.RO · 2026-03-16 · conditional · none · ref 21 · internal anchor
Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.
When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering cs.RO · 2026-02-25 · unverdicted · none · ref 16 · internal anchor
UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation cs.RO · 2026-02-18 · unverdicted · none · ref 41 · internal anchor
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 84 · internal anchor
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
One Step Diffusion via Shortcut Models cs.LG · 2024-10-16 · conditional · none · ref 18 · internal anchor
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
Any-point Trajectory Modeling for Policy Learning cs.RO · 2023-12-28 · conditional · none · ref 30 · internal anchor
ATM pre-trains models to predict trajectories of any points in videos, then uses those predictions to learn strong visuomotor policies from minimal action labels, beating baselines by 80% on 130+ tasks.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models cs.RO · 2023-10-13 · unverdicted · none · ref 90 · internal anchor
A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training cs.RO · 2022-09-30 · unverdicted · none · ref 16 · internal anchor
VIP learns a visual embedding from human videos whose distance defines dense, smooth rewards for arbitrary goal-image robot tasks without task-specific fine-tuning.
HITL-D: Human In The Loop Diffusion Assisted Shared Control cs.RO · 2026-05-20 · unverdicted · none · ref 28 · internal anchor
HITL-D combines diffusion policies with human input for shared robotic control, reducing required joystick axes and improving speed and workload in manipulation tasks per a 12-participant study.
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones cs.RO · 2026-05-18 · conditional · none · ref 36 · 2 links · internal anchor
COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo cs.RO · 2026-05-15 · conditional · none · ref 24 · internal anchor
DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation cs.RO · 2026-05-13 · unverdicted · none · ref 32 · internal anchor
SID achieves approximately 90% success on six real-world manipulation tasks with only two demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities cs.AI · 2026-05-07 · unverdicted · none · ref 26 · 2 links · internal anchor
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control cs.RO · 2026-05-05 · conditional · none · ref 37 · internal anchor
A multi-agent RL high-level planner outputs task-space velocities that a GPU-parallel QP low-level controller converts to joint velocities while enforcing limits and collisions, yielding robust sim-to-real dexterous grasping with zero-shot steerability.
An Efficient Metric for Data Quality Measurement in Imitation Learning cs.RO · 2026-05-02 · unverdicted · none · ref 17 · internal anchor
Power spectral density of trajectories ranks demonstration quality for imitation learning, enabling rollout-free curation that improves fine-tuned policy success.
Learning from the Best: Smoothness-Driven Metrics for Data Quality in Imitation Learning cs.RO · 2026-04-24 · unverdicted · none · ref 6 · internal anchor
RINSE scores robot demonstration trajectories for smoothness via SAL and TED metrics to curate higher-quality data for behavioral cloning, improving success rates with less data on benchmarks and real robots.
VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation cs.RO · 2026-04-17 · unverdicted · none · ref 21 · internal anchor
VADF adds an Adaptive Loss Network for hard-negative training sampling and a Hierarchical Vision Task Segmenter for adaptive noise scheduling during inference to speed convergence and reduce timeouts in diffusion robotic policies.
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations cs.RO · 2026-04-12 · unverdicted · none · ref 72 · internal anchor
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning cs.RO · 2025-09-11 · conditional · none · ref 4 · internal anchor
SimpleVLA-RL applies tailored reinforcement learning to VLA models, reaching SoTA on LIBERO, outperforming π₀ on RoboTwin, and surpassing SFT in real-world tasks while reducing data needs and identifying a 'pushcut' phenomenon.
Reinforcement Learning with Action Chunking cs.LG · 2025-07-10 · unverdicted · none · ref 42 · internal anchor
Q-chunking improves offline-to-online RL sample efficiency on long-horizon sparse-reward manipulation tasks by applying action chunking to TD learning.
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies cs.RO · 2025-05-12 · unverdicted · none · ref 25 · internal anchor
DexWild co-trains dexterous robot policies on in-the-wild human hand interactions recorded with a low-cost system and limited robot data, achieving 68.5% success in unseen environments and 5.8x better cross-embodiment generalization.
J-PARSE: Jacobian-based Projection Algorithm for Resolving Singularities Effectively in Inverse Kinematic Control of Serial Manipulators cs.RO · 2025-05-01 · unverdicted · none · ref 28 · internal anchor
J-PARSE modifies the Jacobian via aspect-ratio thresholding and directional projection to enable stable first-order inverse kinematic velocity control through kinematic singularities in serial manipulators.
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets cs.RO · 2025-04-03 · unverdicted · none · ref 32 · internal anchor
Unified World Models couple video and action diffusion inside one transformer with independent timesteps, enabling pretraining on heterogeneous robot datasets that include action-free video and producing more generalizable policies than imitation learning alone.
Unified Video Action Model cs.RO · 2025-02-28 · unverdicted · none · ref 33 · internal anchor
UVA learns a joint video-action latent representation with decoupled diffusion decoding heads, enabling a single model to perform accurate fast policy learning, forward/inverse dynamics, and video generation without performance loss versus task-specific methods.
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields cs.RO · 2024-12-03 · unverdicted · none · ref 49 · internal anchor
A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.
Diffusion Policy Policy Optimization cs.RO · 2024-09-01 · unverdicted · none · ref 60 · internal anchor
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset cs.RO · 2024-03-19 · accept · none · ref 37 · internal anchor
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition cs.RO · 2026-05-20 · unverdicted · none · ref 4 · internal anchor
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
Drift Flow Matching cs.LG · 2026-05-17 · unverdicted · none · ref 69 · internal anchor
Drift Flow Matching connects direct transport maps from Drift Models with flow-based iterative refinement to enable adaptive computation in generative modeling.
To Do or Not to Do: Ensuring the Safety of Visuomotor Policies Learned from Demonstrations cs.RO · 2026-05-02 · unverdicted · none · ref 21 · internal anchor
Execution guarantee certifies safe regions for IL policies via view synthesis and set invariance so that maximum task success is assured from within those regions even under small execution changes.
Gated Memory Policy cs.RO · 2026-04-21 · unverdicted · none · ref 36 · internal anchor
GMP selectively activates and represents memory via a gate and lightweight cross-attention, yielding 30.1% higher success on non-Markovian robotic tasks while staying competitive on Markovian ones.
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning cs.LG · 2026-04-10 · unverdicted · none · ref 7 · internal anchor
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 223 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception cs.RO · 2026-05-11 · unreviewed · ref 85 · internal anchor
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions cs.RO · 2026-05-07 · unreviewed · ref 2 · internal anchor
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning cs.RO · 2026-05-06 · unreviewed · ref 62 · internal anchor
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unreviewed · ref 159 · internal anchor
Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control cs.RO · 2026-05-02 · unreviewed · ref 12 · 3 links · internal anchor
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data cs.RO · 2026-04-04 · unreviewed · ref 65 · internal anchor
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies cs.RO · 2026-03-04 · unreviewed · ref 29 · internal anchor
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows cs.RO · 2026-02-10 · unreviewed · ref 34 · internal anchor

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer