Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Chelsea Finn; Sergey Levine; Tony Z. Zhao; Vikash Kumar

arxiv: 2304.13705 · v1 · submitted 2023-04-23 · 💻 cs.RO · cs.LG

Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn

Pith reviewed 2026-05-11 04:11 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords imitation learning · bimanual manipulation · low-cost hardware · action chunking · transformers · fine manipulation · robot learning

The pith

Action Chunking with Transformers lets low-cost robots learn precise bimanual tasks from ten minutes of demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that imitation learning can enable low-cost and imprecise robots to perform fine manipulation tasks that normally require expensive hardware, accurate sensors, or careful calibration. It introduces Action Chunking with Transformers (ACT) as a method to learn a generative model over sequences of actions, which helps prevent errors from compounding during execution and handles non-stationary human demonstrations. Using a custom teleoperation interface to collect data, the approach trains a bimanual robot to complete six real-world tasks at 80-90% success rates. This includes opening a translucent condiment cup and slotting a battery, all from roughly ten minutes of demonstrations.

Core claim

The central claim is that a low-cost bimanual robot system performing end-to-end imitation learning with the ACT algorithm, which learns generative models over action sequences from visual observations, can successfully execute difficult fine-grained tasks such as opening a translucent condiment cup and slotting a battery, reaching 80-90% success rates in the real world after training on only ten minutes of demonstrations collected via a custom teleoperation interface.

What carries the argument

Action Chunking with Transformers (ACT), a transformer model that predicts chunks of future actions to enable stable closed-loop control and reduce compounding errors in high-precision imitation learning.
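The control pattern described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_policy`, `toy_step`, and the scalar observation are hypothetical stand-ins for a trained transformer, the robot, and camera images. What it shows is that predicting `chunk_size` actions per query cuts the number of policy decisions over a fixed horizon by roughly that factor, leaving fewer points at which an error can be introduced and then compound.

```python
from typing import Callable, List, Tuple


def run_chunked(
    policy: Callable[[float, int], List[float]],
    step: Callable[[float, float], float],
    initial_obs: float,
    horizon: int,
    chunk_size: int,
) -> Tuple[float, int]:
    """Execute `horizon` actions, querying the policy once per chunk.

    Returns the final observation and the number of policy queries.
    """
    obs = initial_obs
    executed = 0
    queries = 0
    while executed < horizon:
        # One query yields a whole chunk of future actions.
        chunk = policy(obs, chunk_size)
        queries += 1
        # Execute the chunk, truncating it at the end of the episode.
        for action in chunk[: horizon - executed]:
            obs = step(obs, action)
            executed += 1
    return obs, queries


def toy_policy(obs: float, chunk_size: int) -> List[float]:
    """Hypothetical policy: move toward 1.0 in equal increments."""
    return [(1.0 - obs) / chunk_size] * chunk_size


def toy_step(obs: float, action: float) -> float:
    """Hypothetical environment: the action is a position delta."""
    return obs + action


if __name__ == "__main__":
    for k in (1, 10, 50):
        final_obs, n_queries = run_chunked(toy_policy, toy_step, 0.0, 100, k)
        print(f"chunk_size={k:3d} -> {n_queries} policy queries")
```

The trade-off in this sketch is that actions inside a chunk are executed without a fresh observation, so larger chunks mean fewer decisions but a slower reaction to surprises.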

If this is right

Precise bimanual manipulation becomes feasible on inexpensive hardware without specialized force sensors or calibration procedures.
Imitation learning policies can succeed on long-horizon tasks despite non-stationary human demonstrations when action sequences are modeled generatively.
Visual feedback alone suffices for closed-loop control on tasks requiring careful contact forces.
Data collection effort drops to short sessions of roughly ten minutes while still yielding high success rates across multiple tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The chunking approach may extend to other robotic control problems that involve predicting extended action sequences.
Lowering hardware costs could broaden access to fine manipulation capabilities for non-industrial settings.
Combining ACT with additional sensing modalities might further improve reliability on even harder variants of the tasks.

Load-bearing premise

The custom teleoperation interface produces high-quality, consistent demonstrations that capture the necessary precision and force coordination without introducing human-induced biases or noise that the learning algorithm cannot overcome.
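One way to make "consistent demonstrations" measurable is to compute the spread across demonstrations at each timestep. The sketch below is an assumption for illustration, not a metric from the paper: it supposes demonstrations are time-aligned, equal-length lists of joint-position vectors, and the function name `mean_per_step_variance` is hypothetical.

```python
from statistics import pvariance
from typing import List

Trajectory = List[List[float]]  # [timestep][joint] -> position


def mean_per_step_variance(demos: List[Trajectory]) -> float:
    """Average, over timesteps and joints, of the variance across demos.

    Assumes all demonstrations are time-aligned and of equal length.
    A value of 0.0 means every demonstration is identical.
    """
    n_steps = len(demos[0])
    n_joints = len(demos[0][0])
    if any(len(d) != n_steps for d in demos):
        raise ValueError("demonstrations must have equal length")
    total = 0.0
    for t in range(n_steps):
        for j in range(n_joints):
            # Population variance of joint j at time t across demos.
            total += pvariance([d[t][j] for d in demos])
    return total / (n_steps * n_joints)


if __name__ == "__main__":
    demo_a = [[0.0], [1.0]]
    demo_b = [[0.0], [3.0]]
    print(mean_per_step_variance([demo_a, demo_b]))
```

Real teleoperated demonstrations vary in length and speed, so an actual analysis would need an alignment step first (for example, resampling or dynamic time warping), which this sketch omits.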

What would settle it

Retraining and testing the same tasks with demonstrations collected from a lower-quality or noisier teleoperation interface, then measuring whether success rates fall below 80%, would directly test whether the claim holds.
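Deciding whether a measured success rate has really fallen below 80% depends on how many trials were run. A minimal sketch of a one-sided exact binomial test follows; the counts (12 successes in 20 trials) are hypothetical, since the page does not state how many evaluation trials the paper used.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Hypothetical retest: 12 successes out of 20 trials.
    successes, trials, claimed_rate = 12, 20, 0.8
    # Probability of seeing this few successes if the true rate were 0.8.
    p_value = binom_cdf(successes, trials, claimed_rate)
    print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value would suggest the retested success rate is genuinely below the claimed level rather than an unlucky run; with only a handful of trials, even a large observed drop may not be distinguishable from noise.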

Original abstract

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

ACT shows real-world fine bimanual manipulation on cheap hardware with short demos, but the teleop interface's role in demo quality needs clearer separation from the algorithm.

The key thing here is that the paper gets low-cost imprecise arms to handle precise bimanual tasks like battery slotting and cup opening at 80-90% success rates using only 10 minutes of demonstrations and a new method called ACT. This is a practical demonstration that learning can work on accessible hardware without expensive calibration or sensors.

ACT models generative distributions over chunks of actions rather than single steps, which directly targets compounding errors and the non-stationary quality of human data in imitation learning. They collect the data through a custom teleoperation interface on a bimanual setup and run the policies end-to-end with visual feedback in the real world across six tasks. The results are concrete and show the approach scaling beyond simulation.

The work does well by keeping the focus on physical experiments and reporting success on varied tasks that require contact forces and closed-loop control. The method itself is straightforward to describe and implement, which adds to its usefulness.

One soft spot is the custom teleoperation interface. The central results depend on demonstrations that encode the needed precision and coordination, and while the paper states that ACT mitigates non-stationarity, there are no detailed ablations or metrics on demo variance or consistency to isolate how much the interface versus the algorithm drives performance. This does not break the claim, but it leaves some room for the data collection to be carrying more weight than acknowledged.

The paper is aimed at people doing imitation learning for manipulation who care about real-world transfer to low-cost platforms. Readers looking for practical recipes and empirical benchmarks will get value from the system description and task results. It has enough new technical content and grounded experiments to deserve a serious referee.

I would recommend sending it for peer review, with the main request being additional controls that separate the algorithm's contribution from the quality of the collected demonstrations.

Referee Report

2 major / 1 minor

Summary. The paper claims that a low-cost bimanual robot equipped with a custom teleoperation interface for collecting real-world demonstrations, combined with the novel Action Chunking with Transformers (ACT) algorithm, enables end-to-end imitation learning of fine-grained manipulation tasks. ACT models generative distributions over action chunks to mitigate compounding errors and non-stationary demonstrations, allowing 80-90% success rates on six contact-rich tasks (e.g., opening a translucent condiment cup, slotting a battery) using only 10 minutes of data on imprecise hardware.

Significance. If the empirical results hold after verification of demonstration quality and controls, the work would demonstrate that imitation learning with chunked generative policies can achieve high-precision bimanual performance on inexpensive platforms without specialized sensors or calibration. This has clear implications for accessibility in robotics, providing concrete real-world evidence on tasks that typically demand high-end setups.

major comments (2)

[Abstract] The headline result that ACT enables 80-90% success with 10 min of demonstrations rests on the unverified assumption that the custom teleoperation interface supplies high-quality, low-bias demonstrations encoding precise contact forces and closed-loop coordination. No independent metrics (trajectory variance, force profiles, inter-demonstrator consistency) or ablations separating interface quality from policy performance are reported, leaving open the possibility that the interface itself supplies the critical precision rather than the learning algorithm.
[Experiments] Experiments section (inferred from reported success rates): Success rates on the six tasks are presented without baselines, ablations, or statistical tests, as highlighted in the review. This makes it impossible to assess whether the central claim—that ACT on low-cost hardware is responsible for the performance—holds or whether post-hoc tuning or task selection inflates the numbers.

minor comments (1)

[Abstract] The project website link is provided but no supplementary video or code repository is referenced in the abstract; adding these would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful review and the opportunity to clarify our contributions. We address the two major comments below, committing to revisions where they strengthen the manuscript without misrepresenting our results.

Referee: [Abstract] The headline result that ACT enables 80-90% success with 10 min of demonstrations rests on the unverified assumption that the custom teleoperation interface supplies high-quality, low-bias demonstrations encoding precise contact forces and closed-loop coordination. No independent metrics (trajectory variance, force profiles, inter-demonstrator consistency) or ablations separating interface quality from policy performance are reported, leaving open the possibility that the interface itself supplies the critical precision rather than the learning algorithm.

Authors: The teleoperation interface is an integral component of the proposed low-cost system, as it enables collection of usable demonstrations on imprecise hardware without requiring high-end sensors. We acknowledge that the initial submission lacks explicit quantitative metrics on demonstration quality. We will add analysis of trajectory variance and inter-demonstrator consistency in the revised manuscript. Force profiles cannot be reported because the hardware lacks force sensors; the system relies on visual feedback instead. Full ablations isolating the interface from ACT would require new hardware setups, which we will discuss as a limitation rather than perform within this revision.
revision: partial

Referee: [Experiments] Experiments section (inferred from reported success rates): Success rates on the six tasks are presented without baselines, ablations, or statistical tests, as highlighted in the review. This makes it impossible to assess whether the central claim, that ACT on low-cost hardware is responsible for the performance, holds or whether post-hoc tuning or task selection inflates the numbers.

Authors: We agree that the experiments section requires stronger validation. The manuscript already includes comparisons to standard behavior cloning, but we will expand it with additional baselines (e.g., non-chunked policies), architecture ablations, and statistical analysis including the number of evaluation trials, success-rate confidence intervals, and significance tests. These additions will clarify that the reported performance stems from the combination of the interface and ACT rather than task selection or tuning.
revision: yes
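The success-rate confidence intervals mentioned in this response could be computed with a Wilson score interval, which behaves better than the normal approximation at small trial counts. The sketch below is illustrative only: the figure of 9 successes in 10 trials is a hypothetical example, not a number taken from the paper, and `wilson_interval` is a name introduced here.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical evaluation: 9 successes in 10 trials.
    low, high = wilson_interval(9, 10)
    print(f"90% observed success, 95% CI: [{low:.3f}, {high:.3f}]")
```

With so few trials the interval is wide, which is why reporting the number of evaluation trials alongside each success rate matters for interpreting an "80-90%" headline figure.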

standing simulated objections not resolved

Direct force profiles cannot be provided because the low-cost hardware does not include force sensors.

Circularity Check

0 steps flagged

No circularity: empirical results from hardware experiments are independent of any fitted inputs or self-referential definitions.

The paper introduces the ACT algorithm as a novel generative model over action chunks to mitigate compounding errors in imitation learning, then validates it through real-world bimanual tasks on low-cost hardware using custom teleoperation demonstrations. Success rates (80-90%) are measured outcomes from physical rollouts, not quantities derived by construction from the training data or prior self-citations. No equations, uniqueness theorems, or ansatzes are presented that reduce the central claims to tautological inputs; the derivation chain consists of standard imitation learning setup plus a transformer-based policy whose performance is externally falsifiable via hardware metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that teleoperated demonstrations are sufficiently high-quality and that chunked action prediction mitigates compounding errors in contact-rich tasks; no new physical entities are postulated.

free parameters (1)

ACT model hyperparameters
Standard neural network training parameters fitted during learning; not enumerated in abstract.

axioms (1)

domain assumption: Imitation learning from a small number of human demonstrations can generalize to new task instances on physical hardware
Invoked to explain the reported 80-90% success rates across tasks.

invented entities (1)

Action Chunking with Transformers (ACT) · no independent evidence
purpose: Generative model over action sequences to address error compounding and non-stationarity in imitation learning
New method introduced by the paper; no independent evidence outside the reported experiments.

pith-pipeline@v0.9.0 · 5505 in / 1358 out tokens · 43208 ms · 2026-05-11T04:11:19.220360+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
cs.RO 2026-04 conditional novelty 8.0

Open-H-Embodiment is the largest open multi-embodiment medical robotics dataset, used to train GR00T-H, the first open vision-language-action model that achieves end-to-end suturing completion where prior models fail.
Point Tracking Improves World Action Models
cs.RO 2026-05 unverdicted novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.
Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
cs.LG 2026-05 unverdicted novelty 7.0

The paper identifies distinct failure mechanisms: excessive posterior-prior regularization erases mode information in latent policies, while smooth base-to-action maps limit mode coverage in generative policies.
RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 7.0

RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.
DSSP: Diffusion State Space Policy with Full-History Encoding
cs.RO 2026-05 conditional novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size...
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
cs.RO 2026-05 unverdicted novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation
cs.RO 2026-05 conditional novelty 7.0

A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.
Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models
cs.AI 2026-05 unverdicted novelty 7.0

MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
cs.RO 2026-05 unverdicted novelty 7.0

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
cs.RO 2026-05 unverdicted novelty 7.0

ALAM creates algebraically consistent latent action transitions from videos to act as auxiliary generative targets, raising robot policy success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
cs.AI 2026-05 conditional novelty 7.0

A vision-language policy learns state-conditioned commitment depth to Pareto-dominate fixed-depth baselines on long-horizon puzzles, achieving up to 12.5 pp higher solve rate with 25% fewer actions.
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
cs.AI 2026-05 conditional novelty 7.0

State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
cs.CV 2026-05 conditional novelty 7.0

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.
PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN
cs.RO 2026-05 unverdicted novelty 7.0

PhySPRING uses differentiable GNNs to learn hierarchical coarsened spring-mass topologies and parameters from observations, delivering up to 2.3x speedup on PhysTwin benchmarks and comparable robot policy success rate...
BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly
cs.RO 2026-05 unverdicted novelty 7.0

BrickCraft composes reusable visuomotor skills via relative anchoring to partial structures and situated visual manuals to achieve long-horizon interlocking brick assembly from limited demonstrations with generalizati...
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
cs.RO 2026-05 unverdicted novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field
cs.RO 2026-05 unverdicted novelty 7.0

IAGF-SA adds a physically-grounded channel to shared autonomy by modulating robot impedance to convey intent, improving task performance, agreement, and user experience in three scenarios per user studies.
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
cs.RO 2026-04 unverdicted novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
cs.RO 2026-04 unverdicted novelty 7.0

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...
3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 accept novelty 7.0

3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
cs.RO 2026-04 unverdicted novelty 7.0

Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks a...
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
cs.RO 2026-04 unverdicted novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis
cs.RO 2026-04 unverdicted novelty 7.0

VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
FingerEye: Learning Dexterous Manipulation with Continuous Vision-Tactile Sensing
cs.RO 2026-04 unverdicted novelty 7.0

FingerEye delivers continuous vision-tactile sensing via binocular RGB cameras and marker-tracked compliant ring deformation, supporting imitation learning policies that generalize across object variations for tasks l...
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
cs.RO 2026-04 conditional novelty 7.0

BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation
cs.RO 2026-04 unverdicted novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving hi...
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control
cs.RO 2026-03 conditional novelty 7.0

GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector
cs.RO 2026-03 conditional novelty 7.0

Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
cs.LG 2026-02 unverdicted novelty 7.0

QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memor...
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation
cs.RO 2026-02 unverdicted novelty 7.0

PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs
cs.RO 2026-02 unverdicted novelty 7.0

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
cs.RO 2026-01 unverdicted novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation
cs.RO 2025-11 accept novelty 7.0

RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
cs.RO 2025-06 unverdicted novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
Rodrigues Network for Learning Robot Actions
cs.RO 2025-06 unverdicted novelty 7.0

Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
cs.RO 2023-10 conditional novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
TacO: Benchmarking Tactile Sensors for Object Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

The paper provides a task-driven benchmark comparing visual, acoustic, magnetic, and resistive tactile sensors on three manipulation tasks and concludes that sensor utility depends on modality, material friction, and ...
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones
cs.RO 2026-05 conditional novelty 6.0

COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones
cs.RO 2026-05 unverdicted novelty 6.0

COBALT provides scalable cloud infrastructure for crowdsourced robot teleoperation via smartphones, supporting concurrent users with low latency and enabling collection of a 7500+ demonstration dataset validated throu...
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System
cs.RO 2026-05 unverdicted novelty 6.0

DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.
HCLM: A Hierarchical Framework for Cooperative Loco-Manipulation with Dual Quadrupeds
cs.RO 2026-05 unverdicted novelty 6.0

HCLM presents a hierarchical architecture that uses an SE(3)-invariant diffusion policy for coordination and a hybrid whole-body controller with MPC and admittance control for safe closed-chain loco-manipulation on du...
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo
cs.RO 2026-05 conditional novelty 6.0

DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation
cs.CV 2026-05 conditional novelty 6.0

VLA-AD distills 7B VLA teachers into 158M students using offline VLM semantic guidance on task phases and directions, matching teacher performance on LIBERO with 44x size reduction and 3.28x speedup.
Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data
cs.RO 2026-05 conditional novelty 6.0

A simulation-grounded state policy using 3D particle dynamics outperforms an egocentric vision policy by 30.8% in L1 error on unseen rope configurations for bimanual manipulation from limited human data.
FLASH: Efficient Visuomotor Policy via Sparse Sampling
cs.RO 2026-05 unverdicted novelty 6.0

FLASH Policy uses sparse Legendre polynomial trajectory fitting and history-anchored flow matching to enable single-step inference for visuomotor control, reporting 31.4 ms per-episode latency and >=92% success on fiv...
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

SID achieves approximately 90% success on six real-world manipulation tasks with only two demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.
GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization
cs.RO 2026-05 unverdicted novelty 6.0

GuidedVLA improves VLA success rates by manually supervising separate attention heads in the action decoder with auxiliary signals for task-relevant factors.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
cs.RO 2026-05 unverdicted novelty 6.0

Pace-and-Path Correction is a closed-form inference-time operator that decomposes a quadratic cost minimization into orthogonal pace compression and path offset channels to correct dynamics-blindness in chunked-action...
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
cs.RO 2026-05 unverdicted novelty 6.0

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
cs.RO 2026-05 unverdicted novelty 6.0

A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
cs.RO 2026-05 unverdicted novelty 6.0

Retrieve-then-steer stores successful observation-action segments in memory, retrieves relevant chunks, filters them, and uses an elite prior with confidence-adaptive guidance to steer a flow-matching action sampler f...
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

Learns state-conditioned commitment depth in a 7B vision-language policy that jointly predicts actions and replan intervals, outperforming fixed-depth baselines and larger models on Sliding Puzzle and Sokoban while pr...
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
cs.CV 2026-05 unverdicted novelty 6.0

Reducing visual input to one token per frame via adaptive attention pooling and a unified flow-matching objective improves long-horizon performance in VLA policies on MetaWorld, LIBERO, and real-robot tasks.
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
cs.CV 2026-05 unverdicted novelty 6.0

Reducing visual input to one token per frame in world models for vision-language-action policies maintains long-horizon performance while improving success rates on MetaWorld, LIBERO, and real-robot tasks.
When to Trust Imagination: Adaptive Action Execution for World Action Models
cs.RO 2026-05 unverdicted novelty 6.0

Future Forward Dynamics Causal Attention (FFDC) enables World Action Models to adaptively choose action chunk lengths based on prediction-observation consistency, cutting model inferences by 69% and improving real-wor...
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions
cs.RO 2026-05 unverdicted novelty 6.0

DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on fi...
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
cs.AI 2026-05 unverdicted novelty 6.0

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

Adaptive Q-Chunking selects optimal action chunk sizes at each state via normalized advantage comparisons to outperform fixed chunk sizes in offline-to-online RL on robot benchmarks.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · cited by 159 Pith papers · 5 internal anchors

[1]

Viperx 300 robot arm 6dof

Viperx 300 robot arm 6dof. URL https://www.trossenrobotics.com/viperx-300-robot-arm-6dof.aspx

work page
[2]

Widowx 250 robot arm 6dof

Widowx 250 robot arm 6dof. URL https://www.trossenrobotics.com/widowx-250-robot-arm-6dof.aspx

work page
[3]

Highly dexterous manipulation system - capabilities - part 1

Highly dexterous manipulation system - capabilities - part 1, Nov 2014. URL https://www.youtube.com/watch?v=TearcKVj0iY

work page 2014
[4]

Assembly performance metrics and test methods

Assembly performance metrics and test methods, Apr 2022. URL https://www.nist.gov/el/intelligent-systems-division-73500/robotic-grasping-and-manipulation-assembly/assembly

work page 2022
[5]

Teleoperated robots - shadow teleoperation system, Nov

work page
[6]

URL https://www.shadowrobot.com/teleoperation/

work page
[7]

Holo-dex: Teaching dexterity with immersive mixed reality

Sridhar Pandian Arunachalam, Irmak Güzey, Soumith Chintala, and Lerrel Pinto. Holo-dex: Teaching dexterity with immersive mixed reality. arXiv preprint arXiv:2210.06463, 2022

work page arXiv 2022
[8]

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alexander Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J. Joshi, Ryan C. Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, U...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[9]

End-to-end object detection with transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. ArXiv, abs/2005.12872, 2020

work page arXiv 2020
[10]

Towards human-level bimanual dexterous manipulation with reinforcement learning

Yuanpei Chen, Yaodong Yang, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuan Jiang, Stephen McAleer, Hao Dong, Zongqing Lu, and Song-Chun Zhu. Towards human-level bimanual dexterous manipulation with reinforcement learning. ArXiv, abs/2206.08686, 2022

work page arXiv 2022
[11]

Efficient bimanual manipulation using learned task schemas

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, and Abhinav Kumar Gupta. Efficient bimanual manipulation using learned task schemas. 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 1149–1155, 2019

work page 2020
[12]

Transformers for one-shot visual imitation

Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning, 2020

work page 2020
[13]

Causal confusion in imitation learning

Pim de Haan, Dinesh Jayaraman, and Sergey Levine. Causal confusion in imitation learning. In Neural Information Processing Systems, 2019

work page 2019
[14]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. ArXiv, abs/1810.04805, 2019

work page internal anchor Pith review Pith/arXiv arXiv 2019
[15]

One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. ArXiv, abs/1703.07326, 2017

work page Pith review arXiv 2017
[16]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. ArXiv, abs/2109.13396, 2021

work page internal anchor Pith review arXiv 2021
[17]

Self-supervised correspondence in visuomotor policy learning

Peter R. Florence, Lucas Manuelli, and Russ Tedrake. Self-supervised correspondence in visuomotor policy learning. IEEE Robotics and Automation Letters, 5:492–499, 2019

work page 2019
[18]

Implicit behavioral cloning

Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021

work page arXiv 2021
[19]

Learning dense visual correspondences in simulation to smooth and fold real fabrics

Aditya Ganapathi, Priya Sundaresan, Brijen Thananjeyan, Ashwin Balakrishna, Daniel Seita, Jennifer Grannen, Minho Hwang, Ryan Hoque, Joseph Gonzalez, Nawid Jamali, Katsu Yamane, Soshi Iba, and Ken Goldberg. Learning dense visual correspondences in simulation to smooth and fold real fabrics. 2021 IEEE International Conference on Robotics and Automation (IC...

work page 2021
[20]

Untangling dense knots by learning task-relevant keypoints

Jennifer Grannen, Priya Sundaresan, Brijen Thananjeyan, Jeffrey Ichnowski, Ashwin Balakrishna, Minho Hwang, Vainavi Viswanath, Michael Laskey, Joseph Gonzalez, and Ken Goldberg. Untangling dense knots by learning task-relevant keypoints. In Conference on Robot Learning, 2020

work page 2020
[21]

Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding

Huy Ha and Shuran Song. Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding. ArXiv, abs/2105.03655, 2021

work page arXiv 2021
[22]

Dexpilot: Vision-based teleoperation of dexterous robotic hand-arm system

Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan D. Ratliff, and Dieter Fox. Dexpilot: Vision-based teleoperation of dexterous robotic hand-arm system. 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 9164–9170, 2019

work page 2020
[23]

Deep residual learning for image recognition

Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2015

work page 2016
[24]

beta-vae: Learning basic visual concepts with a constrained variational framework

Irina Higgins, Loïc Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, Matthew M. Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2016

work page 2016
[25]

Thriftydagger: Budget-aware novelty and risk gating for interactive imitation learning

Ryan Hoque, Ashwin Balakrishna, Ellen R. Novoseller, Albert Wilcox, Daniel S. Brown, and Ken Goldberg. Thriftydagger: Budget-aware novelty and risk gating for interactive imitation learning. In Conference on Robot Learning, 2021

work page 2021
[26]

Stephen James, Michael Bloesch, and Andrew J. Davison. Task-embedded control networks for few-shot imitation learning. ArXiv, abs/1810.03237, 2018

work page Pith review arXiv 2018
[27]

Bc-z: Zero-shot task generalization with robotic imitation learning

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning, 2022

work page 2022
[28]

Master–slave manipulators and remote maintenance at the oak ridge national laboratory, Jan 1975

R G Jenness and C D Wicker. Master–slave manipulators and remote maintenance at the oak ridge national laboratory, Jan 1975. URL https://www.osti.gov/biblio/4179544

work page arXiv 1975
[29]

Coarse-to-fine imitation learning: Robot manipulation from a single demonstration

Edward Johns. Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613–4619, 2021

work page 2021
[30]

Grasping with chopsticks: Combating covariate shift in model-free imitation learning for fine manipulation

Liyiming Ke, Jingqiang Wang, Tapomayukh Bhattacharjee, Byron Boots, and Siddhartha Srinivasa. Grasping with chopsticks: Combating covariate shift in model-free imitation learning for fine manipulation. In International Conference on Robotics and Automation (ICRA), 2021

work page 2021
[31]

Hg-dagger: Interactive imitation learning with human experts

Michael Kelly, Chelsea Sidrane, K. Driggs-Campbell, and Mykel J. Kochenderfer. Hg-dagger: Interactive imitation learning with human experts. 2019 International Conference on Robotics and Automation (ICRA), pages 8077–8083, 2018

work page 2019
[32]

Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation

Heecheol Kim, Yoshiyuki Ohmura, and Yasuo Kuniyoshi. Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation. IEEE Robotics and Automation Letters, 6:1630–1637, 2021

work page 2021
[33]

Robot peels banana with goal-conditioned dual-action deep imitation learning

Heecheol Kim, Yoshiyuki Ohmura, and Yasuo Kuniyoshi. Robot peels banana with goal-conditioned dual-action deep imitation learning. ArXiv, abs/2203.09749, 2022

work page arXiv 2022
[34]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[35]

Towards learning hierarchical skills for multi-phase manipulation tasks

Oliver Kroemer, Christian Daniel, Gerhard Neumann, Herke van Hoof, and Jan Peters. Towards learning hierarchical skills for multi-phase manipulation tasks. 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 1503–1510, 2015

work page 2015
[36]

Action chunking as policy compression, Sep 2022

Lucy Lai, Ann Z Huang, and Samuel J Gershman. Action chunking as policy compression, Sep 2022. URL psyarxiv.com/z8yrv

work page 2022
[37]

Dart: Noise injection for robust imitation learning

Michael Laskey, Jonathan Lee, Roy Fox, Anca D. Dragan, and Ken Goldberg. Dart: Noise injection for robust imitation learning. In Conference on Robot Learning, 2017

work page 2017
[38]

Learning force-based manipulation of deformable objects from multiple demonstrations

Alex X. Lee, Henry Lu, Abhishek Gupta, Sergey Levine, and P. Abbeel. Learning force-based manipulation of deformable objects from multiple demonstrations. 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 177–184, 2015

work page 2015
[39]

Optimal control for biological movement systems

Weiwei Li. Optimal control for biological movement systems. 2006

work page 2006
[40]

What matters in learning from offline human demonstrations for robot manipulation

Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning, 2021

work page 2021
[41]

Ensembledagger: A bayesian approach to safe imitation learning

Kunal Menda, K. Driggs-Campbell, and Mykel J. Kochenderfer. Ensembledagger: A bayesian approach to safe imitation learning. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5041–5048, 2018

work page 2019
[42]

Intermittent visual servoing: Efficiently learning policies robust to instrument changes for high-precision surgical manipulation

Samuel Paradis, Minho Hwang, Brijen Thananjeyan, Jeffrey Ichnowski, Daniel Seita, Danyal Fer, Thomas Low, Joseph Gonzalez, and Ken Goldberg. Intermittent visual servoing: Efficiently learning policies robust to instrument changes for high-precision surgical manipulation. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 7166–7173, 2020

work page 2021
[43]

The surprising effectiveness of representation learning for visual imitation

Jyothish Pari, Nur Muhammad, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021

work page arXiv 2021
[44]

Learning and generalization of motor skills by learning from demonstration

Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generalization of motor skills by learning from demonstration. 2009 IEEE International Conference on Robotics and Automation, pages 763–768, 2009

work page 2009
[45]

Alvinn: An autonomous land vehicle in a neural network

Dean A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In NIPS, 1988

work page 1988
[46]

From one hand to multiple hands: Imitation learning for dexterous manipulation from single-camera teleoperation

Yuzhe Qin, Hao Su, and Xiaolong Wang. From one hand to multiple hands: Imitation learning for dexterous manipulation from single-camera teleoperation. IEEE Robotics and Automation Letters, 7:10873–10881, 2022

work page 2022
[47]

Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration

Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017

work page 2018
[48]

A reduction of imitation learning and structured prediction to no-regret online learning

Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In International Conference on Artificial Intelligence and Statistics, 2010

work page 2010
[49]

A unified framework for coordinated multi-arm motion planning

Seyed Sina Mirrazavi Salehian, Nadia Figueroa, and Aude Billard. A unified framework for coordinated multi-arm motion planning. The International Journal of Robotics Research, 37:1205–1232, 2018

work page 2018
[50]

Behavior Transformers: Cloning $k$ modes with one stone, October 2022

Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022

work page arXiv 2022
[51]

Sgtm 2.0: Autonomously untangling long cables using interactive perception

Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, and Ken Goldberg. Sgtm 2.0: Autonomously untangling long cables using interactive perception. ArXiv, abs/2209.13706, 2022

work page arXiv 2022
[52]

Cliport: What and where pathways for robotic manipulation

Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. ArXiv, abs/2109.12098, 2021

work page arXiv 2021
[53]

Perceiver-actor: A multi-task transformer for robotic manipulation

Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022

work page arXiv 2022
[54]

Robotic telekinesis: Learning a robotic hand imitator by watching humans on youtube

Aravind Sivakumar, Kenneth Shaw, and Deepak Pathak. Robotic telekinesis: Learning a robotic hand imitator by watching humans on youtube. RSS, 2022

work page 2022
[55]

Dual arm manipulation - a survey

Christian Smith, Yiannis Karayiannidis, Lazaros Nalpantidis, Xavi Gratal, Peng Qi, Dimos V. Dimarogonas, and Danica Kragic. Dual arm manipulation - a survey. Robotics Auton. Syst., 60:1340–1353, 2012

work page 2012
[56]

Learning structured output representation using deep conditional generative models

Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In NIPS, 2015

work page 2015
[57]

Shadow teleoperation system plays jenga, Mar 2021

srcteam. Shadow teleoperation system plays jenga, Mar 2021. URL https://www.youtube.com/watch?v=7K9brH27jvM

work page 2021
[58]

How researchers are using shadow robot’s technology, Jun 2022

srcteam. How researchers are using shadow robot’s technology, Jun 2022. URL https://www.youtube.com/watch?v=p36fYIoTD8M

work page 2022
[59]

Shadow teleoperation system, Jun 2022

srcteam. Shadow teleoperation system, Jun 2022. URL https://www.youtube.com/watch?v=cx8eznfDUJA

work page 2022
[60]

A system for imitation learning of contact-rich bimanual manipulation policies

Simon Stepputtis, Maryam Bandari, Stefan Schaal, and Heni Ben Amor. A system for imitation learning of contact-rich bimanual manipulation policies. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11810–11817, 2022

work page 2022
[61]

Untangling dense non-planar knots by learning manipulation features and recovery policies

Priya Sundaresan, Jennifer Grannen, Brijen Thananjeyan, Ashwin Balakrishna, Jeffrey Ichnowski, Ellen R. Novoseller, Minho Hwang, Michael Laskey, Joseph Gonzalez, and Ken Goldberg. Untangling dense non-planar knots by learning manipulation features and recovery policies. ArXiv, abs/2107.08942, 2021

work page arXiv 2021
[62]

Causal imitation learning under temporally correlated noise

Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, and Zhiwei Steven Wu. Causal imitation learning under temporally correlated noise. In International Conference on Machine Learning, 2022

work page 2022
[63]

Deep learning and the information bottleneck principle

Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. 2015 IEEE Information Theory Workshop (ITW), pages 1–5, 2015

work page 2015
[64]

Mujoco: A physics engine for model-based control

Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012

work page 2012
[65]

Stephen Tu, Alexander Robey, Tingnan Zhang, and N. Matni. On the sample complexity of stability constrained imitation learning. In Conference on Learning for Dynamics & Control, 2021

work page 2021
[66]

Attention Is All You Need

Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. ArXiv, abs/1706.03762, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[67]

interbotix_ros_manipulators

Solomon Wiznitzer, Luke Schmitt, and Matt Trossen. interbotix_ros_manipulators. URL https://github.com/Interbotix/interbotix_ros_manipulators

work page
[68]

Fan Xie, A. M. Masum Bulbul Chowdhury, M. Clara De Paolis Kaluza, Linfeng Zhao, Lawson L. S. Wong, and Rose Yu. Deep imitation learning for bimanual robotic manipulation. ArXiv, abs/2010.05134, 2020

work page arXiv 2020
[69]

Transporter networks: Rearranging the visual world for robotic manipulation

Andy Zeng, Peter R. Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, and Johnny Lee. Transporter networks: Rearranging the visual world for robotic manipulation. In Conference on Robot Learning, 2020

work page 2020
[70]

Tianhao Zhang, Zoe McCarthy, Owen Jow, Dennis Lee, Ken Goldberg, and P. Abbeel. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–8, 2017

work page 2018
[71]

Nerf in the palm of your hand: Corrective augmentation for robotics via novel-view synthesis

Allan Zhou, Moo Jin Kim, Lirui Wang, Peter R. Florence, and Chelsea Finn. Nerf in the palm of your hand: Corrective augmentation for robotics via novel-view synthesis. ArXiv, abs/2301.08556, 2023

work page arXiv 2023
[72]

The measurement of proprioceptive accuracy: A systematic literature review

Áron Horváth, Eszter Ferentzi, Kristóf Schwartz, Nina Jacobs, Pieter Meyns, and Ferenc Köteles. The measurement of proprioceptive accuracy: A systematic literature review. Journal of Sport and Health Science, 2022. ISSN 2095-2546. doi: https://doi.org/10.1016/j.jshs.2022.04

work page doi:10.1016/j.jshs.2022.04 2022
[73]

beer pong

URL https://www.sciencedirect.com/science/article/pii/S2095254622000473.

APPENDIX

A. Comparing ALOHA with Prior Teleoperation Setups

In Figure 9, we include more teleoperated tasks that ALOHA is capable of. We stress that all objects are taken directly from the real world without any modification, to demonstrate ALOHA’s generality in real life settings. A...

work page 1953

[1] [1]

URL https://www

Viperx 300 robot arm 6dof. URL https://www. trossenrobotics.com/viperx-300-robot-arm-6dof.aspx

work page

[2] [2]

URL https://www

Widowx 250 robot arm 6dof. URL https://www. trossenrobotics.com/widowx-250-robot-arm-6dof.aspx

work page

[3] [3]

URL https://www.youtube.com/watch?v= TearcKVj0iY

Highly dexterous manipulation system - capabilities - part 1, Nov 2014. URL https://www.youtube.com/watch?v= TearcKVj0iY

work page 2014

[4] [4]

URL https://www

Assembly performance metrics and test methods, Apr 2022. URL https://www. nist.gov/el/intelligent-systems-division-73500/ robotic-grasping-and-manipulation-assembly/assembly

work page 2022

[5] [5]

Teleoperated robots - shadow teleoperation system, Nov

work page

[6] [6]

URL https://www.shadowrobot.com/teleoperation/

work page

[7] [7]

Holo-dex: Teaching dexterity with immersive mixed reality,

Sridhar Pandian Arunachalam, Irmak Güzey, Soumith Chintala, and Lerrel Pinto. Holo-dex: Teaching dex- terity with immersive mixed reality. arXiv preprint arXiv:2210.06463, 2022

work page arXiv 2022

[8] [8]

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yev- gen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alexander Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J. Joshi, Ryan C. Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang- Huei Lee, Sergey Levine, Yao Lu, U...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[9] [9]

End- to-end object detection with transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nico- las Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. ArXiv, abs/2005.12872, 2020

work page arXiv 2005

[10] [10]

Towards human-level bimanual dexterous manipulation with rein- forcement learning

Yuanpei Chen, Yaodong Yang, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuan Jiang, Stephen McAleer, Hao Dong, Zongqing Lu, and Song-Chun Zhu. Towards human-level bimanual dexterous manipulation with rein- forcement learning. ArXiv, abs/2206.08686, 2022

work page arXiv 2022

[11] [11]

Efﬁcient bimanual manipulation using learned task schemas

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, and Abhinav Kumar Gupta. Efﬁcient bimanual manipulation using learned task schemas. 2020 IEEE International Conference on Robotics and Automation (ICRA) , pages 1149–1155, 2019

work page 2020

[12] [12]

Transformers for one-shot visual imitation

Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning, 2020

work page 2020

[13] [13]

Causal confusion in imitation learning

Pim de Haan, Dinesh Jayaraman, and Sergey Levine. Causal confusion in imitation learning. In Neural Information Processing Systems , 2019

work page 2019

[14] [14]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirec- tional transformers for language understanding. ArXiv, abs/1810.04805, 2019

work page internal anchor Pith review Pith/arXiv arXiv 2019

[15] [15]

One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. ArXiv, abs/1703.07326, 2017

work page Pith review arXiv 2017

[16] [16]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. ArXiv, abs/2109.13396, 2021

work page internal anchor Pith review arXiv 2021

[17] [17]

Florence, Lucas Manuelli, and Russ Tedrake

Peter R. Florence, Lucas Manuelli, and Russ Tedrake. Self- supervised correspondence in visuomotor policy learning. IEEE Robotics and Automation Letters , 5:492–499, 2019

work page 2019

[18] [18]

Implicit

Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021

work page arXiv 2021

[19] [19]

Learning dense visual correspondences in simulation to smooth and fold real fabrics

Aditya Ganapathi, Priya Sundaresan, Brijen Thananjeyan, Ashwin Balakrishna, Daniel Seita, Jennifer Grannen, Minho Hwang, Ryan Hoque, Joseph Gonzalez, Nawid Jamali, Katsu Yamane, Soshi Iba, and Ken Goldberg. Learning dense visual correspondences in simulation to smooth and fold real fabrics. 2021 IEEE International Conference on Robotics and Automation (IC...

work page 2021

[20] [20]

Untangling dense knots by learning task-relevant keypoints

Jennifer Grannen, Priya Sundaresan, Brijen Thananjeyan, Jeffrey Ichnowski, Ashwin Balakrishna, Minho Hwang, Vainavi Viswanath, Michael Laskey, Joseph Gonzalez, and Ken Goldberg. Untangling dense knots by learning task-relevant keypoints. In Conference on Robot Learning, 2020

work page 2020

[21] [21]

Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfold- ing, 2021

Huy Ha and Shuran Song. Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding. ArXiv, abs/2105.03655, 2021

work page arXiv 2021

[22] [22]

Ratliff, and Dieter Fox

Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu- Wei Chao, Qian Wan, Stan Birchﬁeld, Nathan D. Ratliff, and Dieter Fox. Dexpilot: Vision-based teleoperation of dexterous robotic hand-arm system. 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 9164–9170, 2019

work page 2020

[23] [23]

Zhang, Shaoqing Ren, and Jian Sun

Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2015

work page 2016

[24] [24]

Burgess, Xavier Glorot, Matthew M

Irina Higgins, Loïc Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, Matthew M. Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2016

work page 2016

[25] [25]

Novoseller, Albert Wilcox, Daniel S

Ryan Hoque, Ashwin Balakrishna, Ellen R. Novoseller, Albert Wilcox, Daniel S. Brown, and Ken Goldberg. Thriftydagger: Budget-aware novelty and risk gating for interactive imitation learning. In Conference on Robot Learning, 2021

work page 2021

[26] [26]

Stephen James, Michael Bloesch, and Andrew J. Davison. Task-embedded control networks for few-shot imitation learning. ArXiv, abs/1810.03237, 2018

work page Pith review arXiv 2018

[27] [27]

Bc-z: Zero-shot task generalization with robotic imitation learning

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning , 2022

work page 2022

[28] [28]

Master–slave manipulators and remote maintenance at the oak ridge national labora- tory, Jan 1975

R G Jenness and C D Wicker. Master–slave manipulators and remote maintenance at the oak ridge national labora- tory, Jan 1975. URL https://www.osti.gov/biblio/4179544

work page arXiv 1975

[29] [29]

Coarse-to-ﬁne imitation learning: Robot manipulation from a single demonstration

Edward Johns. Coarse-to-ﬁne imitation learning: Robot manipulation from a single demonstration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613–4619, 2021

work page 2021

[30] [30]

Grasping with chopsticks: Combating covariate shift in model-free imitation learning for ﬁne manipulation

Liyiming Ke, Jingqiang Wang, Tapomayukh Bhattachar- jee, Byron Boots, and Siddhartha Srinivasa. Grasping with chopsticks: Combating covariate shift in model-free imitation learning for ﬁne manipulation. In International Conference on Robotics and Automation (ICRA) , 2021

work page 2021

[31] [31]

Driggs-Campbell, and Mykel J

Michael Kelly, Chelsea Sidrane, K. Driggs-Campbell, and Mykel J. Kochenderfer. Hg-dagger: Interactive imitation learning with human experts. 2019 International Conference on Robotics and Automation (ICRA) , pages 8077–8083, 2018

work page 2019

[32] [32]

Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation

Heecheol Kim, Yoshiyuki Ohmura, and Yasuo Kuniyoshi. Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation. IEEE Robotics and Automation Letters , 6:1630–1637, 2021

work page 2021

[33] [33]

Robot peels banana with goal- conditioned dual-action deep imitation learn- ing

Heecheol Kim, Yoshiyuki Ohmura, and Yasuo Kuniyoshi. Robot peels banana with goal-conditioned dual-action deep imitation learning. ArXiv, abs/2203.09749, 2022

work page arXiv 2022

[34] [34]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[35] [35]

Towards learning hierarchical skills for multi-phase manipulation tasks

Oliver Kroemer, Christian Daniel, Gerhard Neumann, Herke van Hoof, and Jan Peters. Towards learning hierarchical skills for multi-phase manipulation tasks. 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 1503–1510, 2015

work page 2015

[36]

Lucy Lai, Ann Z Huang, and Samuel J Gershman. Action chunking as policy compression, Sep 2022. URL psyarxiv.com/z8yrv

[37]

Michael Laskey, Jonathan Lee, Roy Fox, Anca D. Dragan, and Ken Goldberg. Dart: Noise injection for robust imitation learning. In Conference on Robot Learning, 2017

[38]

Alex X. Lee, Henry Lu, Abhishek Gupta, Sergey Levine, and P. Abbeel. Learning force-based manipulation of deformable objects from multiple demonstrations. 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 177–184, 2015

[39]

Weiwei Li. Optimal control for biological movement systems. 2006

[40]

Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning, 2021

[41]

Kunal Menda, K. Driggs-Campbell, and Mykel J. Kochenderfer. Ensembledagger: A bayesian approach to safe imitation learning. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5041–5048, 2018

[42]

Samuel Paradis, Minho Hwang, Brijen Thananjeyan, Jeffrey Ichnowski, Daniel Seita, Danyal Fer, Thomas Low, Joseph Gonzalez, and Ken Goldberg. Intermittent visual servoing: Efficiently learning policies robust to instrument changes for high-precision surgical manipulation. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 7166–7173, 2020

[43]

Jyothish Pari, Nur Muhammad, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021

[44]

Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generalization of motor skills by learning from demonstration. 2009 IEEE International Conference on Robotics and Automation, pages 763–768, 2009

[45]

Dean A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In NIPS, 1988

[46]

Yuzhe Qin, Hao Su, and Xiaolong Wang. From one hand to multiple hands: Imitation learning for dexterous manipulation from single-camera teleoperation. IEEE Robotics and Automation Letters, 7:10873–10881, 2022

[47]

Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017

[48]

Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In International Conference on Artificial Intelligence and Statistics, 2010

[49]

Seyed Sina Mirrazavi Salehian, Nadia Figueroa, and Aude Billard. A unified framework for coordinated multi-arm motion planning. The International Journal of Robotics Research, 37:1205–1232, 2018

[50]

Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022

[51]

Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, and Ken Goldberg. Sgtm 2.0: Autonomously untangling long cables using interactive perception. ArXiv, abs/2209.13706, 2022

[52]

Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. ArXiv, abs/2109.12098, 2021

[53]

Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022

[54]

Aravind Sivakumar, Kenneth Shaw, and Deepak Pathak. Robotic telekinesis: Learning a robotic hand imitator by watching humans on youtube. RSS, 2022

[55]

Christian Smith, Yiannis Karayiannidis, Lazaros Nalpantidis, Xavi Gratal, Peng Qi, Dimos V. Dimarogonas, and Danica Kragic. Dual arm manipulation - a survey. Robotics Auton. Syst., 60:1340–1353, 2012

[56]

Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In NIPS, 2015

[57]

srcteam. Shadow teleoperation system plays jenga, Mar 2021. URL https://www.youtube.com/watch?v=7K9brH27jvM

[58]

srcteam. How researchers are using shadow robot’s technology, Jun 2022. URL https://www.youtube.com/watch?v=p36fYIoTD8M

[59]

srcteam. Shadow teleoperation system, Jun 2022. URL https://www.youtube.com/watch?v=cx8eznfDUJA

[60]

Simon Stepputtis, Maryam Bandari, Stefan Schaal, and Heni Ben Amor. A system for imitation learning of contact-rich bimanual manipulation policies. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11810–11817, 2022

[61]

Priya Sundaresan, Jennifer Grannen, Brijen Thananjeyan, Ashwin Balakrishna, Jeffrey Ichnowski, Ellen R. Novoseller, Minho Hwang, Michael Laskey, Joseph Gonzalez, and Ken Goldberg. Untangling dense non-planar knots by learning manipulation features and recovery policies. ArXiv, abs/2107.08942, 2021

[62]

Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, and Zhiwei Steven Wu. Causal imitation learning under temporally correlated noise. In International Conference on Machine Learning, 2022

[63]

Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. 2015 IEEE Information Theory Workshop (ITW), pages 1–5, 2015

[64]

Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012

[65]

Stephen Tu, Alexander Robey, Tingnan Zhang, and N. Matni. On the sample complexity of stability constrained imitation learning. In Conference on Learning for Dynamics & Control, 2021

[66]

Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. ArXiv, abs/1706.03762, 2017

[67]

Solomon Wiznitzer, Luke Schmitt, and Matt Trossen. interbotix_ros_manipulators. URL https://github.com/Interbotix/interbotix_ros_manipulators

[68]

Fan Xie, A. M. Masum Bulbul Chowdhury, M. Clara De Paolis Kaluza, Linfeng Zhao, Lawson L. S. Wong, and Rose Yu. Deep imitation learning for bimanual robotic manipulation. ArXiv, abs/2010.05134, 2020

[69]

Andy Zeng, Peter R. Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, and Johnny Lee. Transporter networks: Rearranging the visual world for robotic manipulation. In Conference on Robot Learning, 2020

[70]

Tianhao Zhang, Zoe McCarthy, Owen Jow, Dennis Lee, Ken Goldberg, and P. Abbeel. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–8, 2017

[71]

Allan Zhou, Moo Jin Kim, Lirui Wang, Peter R. Florence, and Chelsea Finn. Nerf in the palm of your hand: Corrective augmentation for robotics via novel-view synthesis. ArXiv, abs/2301.08556, 2023

[72]

Áron Horváth, Eszter Ferentzi, Kristóf Schwartz, Nina Jacobs, Pieter Meyns, and Ferenc Köteles. The measurement of proprioceptive accuracy: A systematic literature review. Journal of Sport and Health Science, 2022. ISSN 2095-2546. doi: https://doi.org/10.1016/j.jshs.2022.04

URL https://www.sciencedirect.com/science/article/pii/S2095254622000473.

APPENDIX

A. Comparing ALOHA with Prior Teleoperation Setups

In Figure 9, we include more teleoperated tasks that ALOHA is capable of. We stress that all objects are taken directly from the real world without any modification, to demonstrate ALOHA’s generality in real life settings. A...