MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan · 2023 · cs.RO · arXiv 2310.17596

Mixed citation behavior. Most common role is background (55%).

27 Pith papers citing it

Background 55% of classified citations

abstract

Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use MimicGen to generate over 50K demonstrations across 18 tasks with diverse scene configurations, object instances, and robot arms from just ~200 human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks, such as multi-part assembly and coffee preparation, across broad initial state distributions. We further demonstrate that the effectiveness and utility of MimicGen data compare favorably to collecting additional human demonstrations, making it a powerful and economical approach towards scaling up robot learning. Datasets, simulation environments, videos, and more at https://mimicgen.github.io .

citation-role summary

background: 6 · method: 4 · dataset: 1

citation-polarity summary

background: 6 · use method: 4 · unclear: 1

representative citing papers

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

cs.RO · 2026-04-16 · unverdicted · novelty 7.0

DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

EXPO: Stable Reinforcement Learning with Expressive Policies

cs.LG · 2025-07-10 · conditional · novelty 7.0

EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.

Mechanisms of Misgeneralization in Physical Sequence Modeling

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a …

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

cs.RO · 2026-05-18 · conditional · novelty 6.0 · 2 refs

COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.

Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy

cs.RO · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

GiB uses self-supervised latent features and Mahalanobis distance to filter erroneous subtasks from mixed-quality human demonstrations, improving robot policy learning in simulation and real-world tasks.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

cs.RO · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.

A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.

WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.

Generative Simulation for Policy Learning in Physical Human-Robot Interaction

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

cs.AI · 2026-01-22 · conditional · novelty 6.0

Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.

IGen: Scalable Data Generation for Robot Learning from Open-World Images

cs.RO · 2025-12-01 · unverdicted · novelty 6.0

IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

cs.RO · 2025-10-09 · unverdicted · novelty 6.0

R2RGen introduces a simulator-free three-stage pipeline that parses, augments, and post-processes real pointcloud observation-action pairs to improve spatial generalization in robotic manipulation policies.

GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

cs.RO · 2025-05-06 · unverdicted · novelty 6.0

GraspVLA shows that pretraining a grasping model on a billion synthetic action frames enables zero-shot open-vocabulary performance and sim-to-real transfer.

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

cs.RO · 2024-12-03 · unverdicted · novelty 6.0

A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

cs.LG · 2024-10-31 · unverdicted · novelty 6.0

π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

cs.RO · 2024-06-04 · unverdicted · novelty 6.0

RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.

EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

cs.RO · 2025-10-20 · unverdicted · novelty 5.0

RESample uses exploratory sampling guided by a lightweight Coverage Function to expand VLA training data coverage, yielding 12% performance gains on LIBERO and real-world tasks with 10-20% added samples.

citing papers explorer

Showing 27 of 27 citing papers.

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation cs.RO · 2026-04-16 · unverdicted · none · ref 17 · internal anchor
DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 22 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
EXPO: Stable Reinforcement Learning with Expressive Policies cs.LG · 2025-07-10 · conditional · none · ref 16 · internal anchor
EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.
Mechanisms of Misgeneralization in Physical Sequence Modeling cs.LG · 2026-05-19 · unverdicted · none · ref 152 · internal anchor
Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a …
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones cs.RO · 2026-05-18 · conditional · none · ref 17 · 2 links · internal anchor
COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System cs.RO · 2026-05-18 · unverdicted · none · ref 36 · internal anchor
DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.
Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy cs.RO · 2026-05-02 · unverdicted · none · ref 8 · 2 links · internal anchor
GiB uses self-supervised latent features and Mahalanobis distance to filter erroneous subtasks from mixed-quality human demonstrations, improving robot policy learning in simulation and real-world tasks.
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation cs.RO · 2026-04-30 · unverdicted · none · ref 14 · internal anchor
Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising cs.RO · 2026-04-29 · unverdicted · none · ref 75 · 2 links · internal anchor
X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models cs.RO · 2026-04-20 · unverdicted · none · ref 44 · internal anchor
State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies cs.RO · 2026-04-15 · unverdicted · none · ref 19 · internal anchor
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations cs.RO · 2026-04-12 · unverdicted · none · ref 74 · internal anchor
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
Generative Simulation for Policy Learning in Physical Human-Robot Interaction cs.RO · 2026-04-09 · unverdicted · none · ref 22 · internal anchor
A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds cs.RO · 2026-04-09 · unverdicted · none · ref 40 · internal anchor
SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors cs.RO · 2026-03-16 · conditional · none · ref 37 · internal anchor
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning cs.AI · 2026-01-22 · conditional · none · ref 24 · internal anchor
Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.
IGen: Scalable Data Generation for Robot Learning from Open-World Images cs.RO · 2025-12-01 · unverdicted · none · ref 45 · internal anchor
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation cs.RO · 2025-10-09 · unverdicted · none · ref 20 · internal anchor
R2RGen introduces a simulator-free three-stage pipeline that parses, augments, and post-processes real pointcloud observation-action pairs to improve spatial generalization in robotic manipulation policies.
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data cs.RO · 2025-05-06 · unverdicted · none · ref 41 · internal anchor
GraspVLA shows that pretraining a grasping model on a billion synthetic action frames enables zero-shot open-vocabulary performance and sim-to-real transfer.
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields cs.RO · 2024-12-03 · unverdicted · none · ref 50 · internal anchor
A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control cs.LG · 2024-10-31 · unverdicted · none · ref 34 · internal anchor
π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots cs.RO · 2024-06-04 · unverdicted · none · ref 35 · internal anchor
RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development cs.RO · 2026-04-15 · unverdicted · none · ref 25 · internal anchor
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation cs.RO · 2025-10-20 · unverdicted · none · ref 10 · internal anchor
RESample uses exploratory sampling guided by a lightweight Coverage Function to expand VLA training data coverage, yielding 12% performance gains on LIBERO and real-world tasks with 10-20% added samples.
A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation cs.RO · 2025-07-07 · accept · none · ref 47 · internal anchor
Multi-task pretraining of diffusion policies on diverse robot data produces more successful, robust, and data-efficient policies for dexterous manipulation than single-task baselines, with performance scaling with pretraining size and diversity.
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery cs.AI · 2026-05-22 · unverdicted · none · ref 141 · internal anchor
A survey organizing AI-powered research automation into five workflow stages, defining AutoResearch and Vibe Research, and proposing five evaluation dimensions while noting domain-conditioned limits on autonomy.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 163 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.