hub

//arxiv.org/abs/2302.12422

Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, Anima Anandkumar · 2023 · arXiv 2302.12422

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

cs.RO · 2023-07-12 · unverdicted · novelty 7.0

VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

SID: Sliding into Distribution for Robust Few-Demonstration Manipulation

cs.RO · 2026-05-13 · unverdicted · novelty 6.0

SID achieves approximately 90% success on six real-world manipulation tasks with only two demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

cs.RO · 2025-04-03 · unverdicted · novelty 6.0

Unified World Models couple video and action diffusion inside one transformer with independent timesteps, enabling pretraining on heterogeneous robot datasets that include action-free video and producing more generalizable policies than imitation learning alone.

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

cs.RO · 2024-01-04 · conditional · novelty 6.0

A low-cost whole-body teleoperation system enables effective imitation learning for complex bimanual mobile manipulation by co-training on mobile and static demonstration datasets.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04 · accept · novelty 5.0

A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

citing papers explorer

Showing 10 of 10 citing papers.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning cs.AI · 2023-06-05 · conditional · none · ref 68
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 40
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models cs.RO · 2023-07-12 · unverdicted · none · ref 106
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation cs.RO · 2026-05-13 · unverdicted · none · ref 45
SID achieves approximately 90% success on six real-world manipulation tasks with only two demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.
Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing cs.RO · 2026-05-05 · unverdicted · none · ref 21
A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.
BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation cs.RO · 2026-05-05 · unverdicted · none · ref 11
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
GazeVLA: Learning Human Intention for Robotic Manipulation cs.RO · 2026-04-24 · unverdicted · none · ref 63
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets cs.RO · 2025-04-03 · unverdicted · none · ref 43
Unified World Models couple video and action diffusion inside one transformer with independent timesteps, enabling pretraining on heterogeneous robot datasets that include action-free video and producing more generalizable policies than imitation learning alone.
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation cs.RO · 2024-01-04 · conditional · none · ref 89
A low-cost whole-body teleoperation system enables effective imitation learning for complex bimanual mobile manipulation by co-training on mobile and static demonstration datasets.
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data cs.RO · 2026-04-04 · accept · none · ref 91
A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

//arxiv.org/abs/2302.12422

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer