pith. machine review for the scientific record.

arxiv: 2304.13705 · v1 · submitted 2023-04-23 · 💻 cs.RO · cs.LG

Recognition: 2 theorem links · Lean Theorem

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn

Pith reviewed 2026-05-11 04:11 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords imitation learning · bimanual manipulation · low-cost hardware · action chunking · transformers · fine manipulation · robot learning

The pith

Action Chunking with Transformers lets low-cost robots learn precise bimanual tasks from ten minutes of demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that imitation learning can enable low-cost and imprecise robots to perform fine manipulation tasks that normally require expensive hardware, accurate sensors, or careful calibration. It introduces Action Chunking with Transformers (ACT) as a method to learn a generative model over sequences of actions, which helps prevent errors from compounding during execution and handles non-stationary human demonstrations. Using a custom teleoperation interface to collect data, the approach trains a bimanual robot to complete six real-world tasks at 80-90% success rates. This includes opening a translucent condiment cup and slotting a battery, all from roughly ten minutes of demonstrations.

Core claim

The central claim is that a low-cost bimanual robot performing end-to-end imitation learning with the ACT algorithm, which learns generative models over action sequences from visual observations, can execute difficult fine-grained tasks such as opening a translucent condiment cup and slotting a battery. In real-world trials it reaches 80-90% success rates after training on only ten minutes of demonstrations collected via a custom teleoperation interface.

What carries the argument

Action Chunking with Transformers (ACT), a transformer model that predicts chunks of future actions to enable stable closed-loop control and reduce compounding errors in high-precision imitation learning.
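
To make the mechanism concrete, here is a minimal sketch of chunked inference with the temporal ensembling the paper describes: the policy is queried at every timestep, each query predicts the next chunk_size actions, and the overlapping predictions for the current step are fused with exponential weights w_i = exp(-m*i), with w_0 on the oldest prediction. The policy and env interfaces are hypothetical stand-ins, not the authors' released code.

```python
import numpy as np

def act_rollout(policy, env, horizon, chunk_size=100, m=0.01):
    """Sketch of ACT-style chunked inference with temporal ensembling.

    Hypothetical interfaces: policy(obs) returns a (chunk_size, action_dim)
    array of predicted future actions; env.reset() and env.step(action)
    return the next observation.
    """
    # predictions[t] accumulates every chunk entry that targets timestep t
    predictions = [[] for _ in range(horizon + chunk_size)]
    obs = env.reset()
    for t in range(horizon):
        chunk = policy(obs)                    # query the policy every step
        for i, action_i in enumerate(chunk):
            predictions[t + i].append(action_i)
        preds = np.stack(predictions[t])       # oldest prediction first
        # exponential weights, w_0 on the oldest prediction;
        # m trades smoothness against reactivity
        w = np.exp(-m * np.arange(len(preds)))
        obs = env.step((w[:, None] * preds).sum(axis=0) / w.sum())
    return obs
```

Averaging many overlapping chunks is what softens the open-loop commitment that executing a full chunk blindly would impose.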

If this is right

  • Precise bimanual manipulation becomes feasible on inexpensive hardware without specialized force sensors or calibration procedures.
  • Imitation learning policies can succeed on long-horizon tasks despite non-stationary human demonstrations when action sequences are modeled generatively.
  • Visual feedback alone suffices for closed-loop control on tasks requiring careful contact forces.
  • Data collection effort drops to short sessions of roughly ten minutes while still yielding high success rates across multiple tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The chunking approach may extend to other robotic control problems that involve predicting extended action sequences.
  • Lowering hardware costs could broaden access to fine manipulation capabilities for non-industrial settings.
  • Combining ACT with additional sensing modalities might further improve reliability on even harder variants of the tasks.

Load-bearing premise

The custom teleoperation interface produces high-quality, consistent demonstrations that capture the necessary precision and force coordination without introducing human-induced biases or noise that the learning algorithm cannot overcome.

What would settle it

Retraining and testing the same tasks with demonstrations collected from a lower-quality or noisier teleoperation interface, then measuring whether success rates fall below 80%, would directly test whether the claim holds.
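
A minimal sketch of that settling experiment, under an assumed corruption model for "noisier teleoperation" (jittered joint targets plus dropped frames); train_act and run_trial are hypothetical placeholders, not functions from the paper.

```python
import numpy as np

def degrade_demonstrations(demos, sigma=0.01, drop_prob=0.02, seed=0):
    """Hypothetical corruption model standing in for a lower-quality
    teleoperation interface. `demos` is a list of (T, action_dim) arrays."""
    rng = np.random.default_rng(seed)
    degraded = []
    for demo in demos:
        keep = rng.random(len(demo)) > drop_prob           # dropped frames
        degraded.append(demo[keep] + rng.normal(0.0, sigma, demo[keep].shape))
    return degraded

# Outline of the experiment (placeholders: train_act trains an ACT policy,
# run_trial returns 1 on task success, 0 otherwise):
#
#   for sigma in (0.0, 0.005, 0.01, 0.02):
#       policy = train_act(degrade_demonstrations(demos, sigma))
#       rate = np.mean([run_trial(policy) for _ in range(50)])
#       # the claim weakens if rate drops below 0.8 at modest sigma
```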

read the original abstract

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that a low-cost bimanual robot equipped with a custom teleoperation interface for collecting real-world demonstrations, combined with the novel Action Chunking with Transformers (ACT) algorithm, enables end-to-end imitation learning of fine-grained manipulation tasks. ACT models generative distributions over action chunks to mitigate compounding errors and non-stationary demonstrations, allowing 80-90% success rates on six contact-rich tasks (e.g., opening a translucent condiment cup, slotting a battery) using only 10 minutes of data on imprecise hardware.

Significance. If the empirical results hold after verification of demonstration quality and controls, the work would demonstrate that imitation learning with chunked generative policies can achieve high-precision bimanual performance on inexpensive platforms without specialized sensors or calibration. This has clear implications for accessibility in robotics, providing concrete real-world evidence on tasks that typically demand high-end setups.

major comments (2)
  1. [Abstract] The headline result that ACT enables 80-90% success with 10 minutes of demonstrations rests on the unverified assumption that the custom teleoperation interface supplies high-quality, low-bias demonstrations encoding precise contact forces and closed-loop coordination. No independent metrics (trajectory variance, force profiles, inter-demonstrator consistency) or ablations separating interface quality from policy performance are reported, leaving open the possibility that the interface itself, rather than the learning algorithm, supplies the critical precision.
  2. [Experiments] In the experiments (inferred from the reported success rates), the six tasks are presented without baselines, ablations, or statistical tests. This makes it impossible to assess whether the central claim (that ACT on low-cost hardware is responsible for the performance) holds, or whether post-hoc tuning or task selection inflates the numbers.
minor comments (1)
  1. [Abstract] The project website link is provided but no supplementary video or code repository is referenced in the abstract; adding these would aid reproducibility.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful review and the opportunity to clarify our contributions. We address the two major comments below, committing to revisions where they strengthen the manuscript without misrepresenting our results.

read point-by-point responses
  1. Referee: [Abstract] The headline result that ACT enables 80-90% success with 10 minutes of demonstrations rests on the unverified assumption that the custom teleoperation interface supplies high-quality, low-bias demonstrations encoding precise contact forces and closed-loop coordination. No independent metrics (trajectory variance, force profiles, inter-demonstrator consistency) or ablations separating interface quality from policy performance are reported, leaving open the possibility that the interface itself, rather than the learning algorithm, supplies the critical precision.

    Authors: The teleoperation interface is an integral component of the proposed low-cost system, as it enables collection of usable demonstrations on imprecise hardware without requiring high-end sensors. We acknowledge that the initial submission lacks explicit quantitative metrics on demonstration quality. We will add analysis of trajectory variance and inter-demonstrator consistency in the revised manuscript; a sketch of such metrics appears after this point-by-point list. Force profiles cannot be reported because the hardware lacks force sensors; the system relies on visual feedback instead. Full ablations isolating the interface from ACT would require new hardware setups, which we will discuss as a limitation rather than perform within this revision. revision: partial

  2. Referee: [Experiments] In the experiments (inferred from the reported success rates), the six tasks are presented without baselines, ablations, or statistical tests. This makes it impossible to assess whether the central claim (that ACT on low-cost hardware is responsible for the performance) holds, or whether post-hoc tuning or task selection inflates the numbers.

    Authors: We agree that the experiments section requires stronger validation. The manuscript already includes comparisons to standard behavior cloning, but we will expand it with additional baselines (e.g., non-chunked policies), architecture ablations, and statistical analysis including the number of evaluation trials, success-rate confidence intervals (see the interval sketch after this list), and significance tests. These additions will clarify that the reported performance stems from the combination of the interface and ACT rather than task selection or tuning. revision: yes
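
For response 1, one plausible instantiation of the promised demonstration-quality metrics, assuming demonstrations of a task are resampled to a common length; these definitions are editorial sketches, not the authors' forthcoming analysis.

```python
import numpy as np

def trajectory_variance(demos):
    """Mean per-timestep variance across demonstrations of one task.
    Assumes demos resampled to a common length: a list of (T, D) arrays."""
    stacked = np.stack(demos)                 # (N, T, D)
    return stacked.var(axis=0).mean(axis=-1)  # (T,): spread at each step

def inter_demonstrator_consistency(demos_by_person):
    """Distance of each demonstrator's mean trajectory from the pooled
    mean; smaller values indicate more consistent demonstrations."""
    means = {p: np.stack(ds).mean(axis=0) for p, ds in demos_by_person.items()}
    pooled = np.mean(list(means.values()), axis=0)
    return {p: float(np.linalg.norm(m - pooled)) for p, m in means.items()}
```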
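
For response 2, a minimal sketch of the per-task confidence intervals the authors commit to reporting. The Wilson score interval is a standard choice for small-sample binomial rates; the paper does not specify which interval will be used.

```python
import math

def wilson_ci(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate.
    Example: wilson_ci(45, 50) -> roughly (0.79, 0.96)."""
    p = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half
```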

standing simulated objections not resolved
  • Direct force profiles cannot be provided because the low-cost hardware does not include force sensors.

Circularity Check

0 steps flagged

No circularity: empirical results from hardware experiments are independent of any fitted inputs or self-referential definitions.

full rationale

The paper introduces the ACT algorithm as a novel generative model over action chunks to mitigate compounding errors in imitation learning, then validates it through real-world bimanual tasks on low-cost hardware using custom teleoperation demonstrations. Success rates (80-90%) are measured outcomes from physical rollouts, not quantities derived by construction from the training data or prior self-citations. No equations, uniqueness theorems, or ansatzes are presented that reduce the central claims to tautological inputs; the derivation chain consists of standard imitation learning setup plus a transformer-based policy whose performance is externally falsifiable via hardware metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that teleoperated demonstrations are sufficiently high-quality and that chunked action prediction mitigates compounding errors in contact-rich tasks; no new physical entities are postulated.

free parameters (1)
  • ACT model hyperparameters
    Standard neural network training parameters fitted during learning; not enumerated in the abstract.
axioms (1)
  • domain assumption: Imitation learning from a small number of human demonstrations can generalize to new task instances on physical hardware
    Invoked to explain the reported 80-90% success rates across tasks.
invented entities (1)
  • Action Chunking with Transformers (ACT) · no independent evidence
    purpose: Generative model over action sequences to address error compounding and non-stationarity in imitation learning
    New method introduced by the paper; no independent evidence outside the reported experiments.

pith-pipeline@v0.9.0 · 5505 in / 1358 out tokens · 43208 ms · 2026-05-11T04:11:19.220360+00:00 · methodology

discussion (0)


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

    cs.RO 2026-04 conditional novelty 8.0

    Open-H-Embodiment is the largest open multi-embodiment medical robotics dataset, used to train GR00T-H, the first open vision-language-action model that achieves end-to-end suturing completion where prior models fail.

  2. Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation

    cs.RO 2026-05 conditional novelty 7.0

    A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.

  3. Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models

    cs.AI 2026-05 unverdicted novelty 7.0

    MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.

  4. ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

    cs.RO 2026-05 unverdicted novelty 7.0

    ALAM creates algebraically consistent latent action transitions from videos to act as auxiliary generative targets, raising robot policy success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.

  5. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 conditional novelty 7.0

    State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...

  6. PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN

    cs.RO 2026-05 unverdicted novelty 7.0

    PhySPRING uses differentiable GNNs to learn hierarchical coarsened spring-mass topologies and parameters from observations, delivering up to 2.3x speedup on PhysTwin benchmarks and comparable robot policy success rate...

  7. BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly

    cs.RO 2026-05 unverdicted novelty 7.0

    BrickCraft composes reusable visuomotor skills via relative anchoring to partial structures and situated visual manuals to achieve long-horizon interlocking brick assembly from limited demonstrations with generalizati...

  8. OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

    cs.RO 2026-05 unverdicted novelty 7.0

    OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

  9. Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field

    cs.RO 2026-05 unverdicted novelty 7.0

    IAGF-SA adds a physically-grounded channel to shared autonomy by modulating robot impedance to convey intent, improving task performance, agreement, and user experience in three scenarios per user studies.

  10. OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

    cs.RO 2026-04 unverdicted novelty 7.0

    A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

  11. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 7.0

    A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...

  12. 3D Generation for Embodied AI and Robotic Simulation: A Survey

    cs.RO 2026-04 accept novelty 7.0

    3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.

  13. DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

    cs.RO 2026-04 unverdicted novelty 7.0

    Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks a...

  14. Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

    cs.RO 2026-04 unverdicted novelty 7.0

    VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...

  15. VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

    cs.RO 2026-04 unverdicted novelty 7.0

    VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.

  16. FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

    cs.RO 2026-04 unverdicted novelty 7.0

    FingerEye delivers continuous vision-tactile sensing via binocular RGB cameras and marker-tracked compliant ring deformation, supporting imitation learning policies that generalize across object variations for tasks l...

  17. BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

    cs.RO 2026-04 conditional novelty 7.0

    BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.

  18. Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

    cs.RO 2026-04 unverdicted novelty 7.0

    ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving hi...

  19. GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization

    cs.RO 2026-05 unverdicted novelty 6.0

    GuidedVLA improves VLA success rates by manually supervising separate attention heads in the action decoder with auxiliary signals for task-relevant factors.

  20. Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

    cs.RO 2026-05 unverdicted novelty 6.0

    Pace-and-Path Correction is a closed-form inference-time operator that decomposes a quadratic cost minimization into orthogonal pace compression and path offset channels to correct dynamics-blindness in chunked-action...

  21. Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

    cs.RO 2026-05 unverdicted novelty 6.0

    Retrieve-then-steer stores successful observation-action segments in memory, retrieves relevant chunks, filters them, and uses an elite prior with confidence-adaptive guidance to steer a flow-matching action sampler f...

  22. Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

    cs.RO 2026-05 unverdicted novelty 6.0

    A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.

  23. One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

    cs.CV 2026-05 unverdicted novelty 6.0

    Reducing visual input to one token per frame in world models for vision-language-action policies maintains long-horizon performance while improving success rates on MetaWorld, LIBERO, and real-robot tasks.

  24. One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

    cs.CV 2026-05 unverdicted novelty 6.0

    Reducing visual input to one token per frame via adaptive attention pooling and a unified flow-matching objective improves long-horizon performance in VLA policies on MetaWorld, LIBERO, and real-robot tasks.

  25. When to Trust Imagination: Adaptive Action Execution for World Action Models

    cs.RO 2026-05 unverdicted novelty 6.0

    Future Forward Dynamics Causal Attention (FFDC) enables World Action Models to adaptively choose action chunk lengths based on prediction-observation consistency, cutting model inferences by 69% and improving real-wor...

  26. When to Trust Imagination: Adaptive Action Execution for World Action Models

    cs.RO 2026-05 unverdicted novelty 6.0

    A verifier called Future Forward Dynamics Causal Attention enables adaptive action execution in World Action Models, reducing model inferences by 69% and improving success rates in robotic tasks.

  27. DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

    cs.RO 2026-05 unverdicted novelty 6.0

    DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on fi...

  28. Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

    cs.AI 2026-05 unverdicted novelty 6.0

    LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

  29. Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

    cs.AI 2026-05 unverdicted novelty 6.0

    LQL stabilizes Q-learning by penalizing violations of n-step action-sequence lower bounds with a hinge loss computed from standard network outputs.

  30. Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    Adaptive Q-Chunking selects optimal action chunk sizes at each state via normalized advantage comparisons to outperform fixed chunk sizes in offline-to-online RL on robot benchmarks.

  31. ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 6.0

    ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.

  32. From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

    cs.RO 2026-05 unverdicted novelty 6.0

    A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discr...

  33. Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

    cs.RO 2026-05 unverdicted novelty 6.0

    A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from s...

  34. BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

    cs.RO 2026-05 unverdicted novelty 6.0

    BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.

  35. OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

    cs.LG 2026-05 unverdicted novelty 6.0

    OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

  36. Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

    cs.CV 2026-05 unverdicted novelty 6.0

    A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks such as 8% on Robotwin 2.0.

  37. An Efficient Metric for Data Quality Measurement in Imitation Learning

    cs.RO 2026-05 unverdicted novelty 6.0

    Power spectral density of trajectories ranks demonstration quality for imitation learning, enabling rollout-free curation that improves fine-tuned policy success.

  38. TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies

    cs.RO 2026-05 unverdicted novelty 6.0

    TAIL-Safe learns a Lipschitz Q-function from digital-twin failure data to identify an empirical control-invariant safe set for imitation learning policies and applies gradient-based recovery to keep actions inside it.

  39. Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

  40. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.

  41. AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation

    cs.RO 2026-04 unverdicted novelty 6.0

    AsyncShield restores VLA geometric intent from latency via kinematic pose mapping and uses PPO-Lagrangian to balance tracking with LiDAR safety constraints in a plug-and-play module.

  42. RL Token: Bootstrapping Online RL with Vision-Language-Action Models

    cs.LG 2026-04 unverdicted novelty 6.0

    RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.

  43. Learning from the Best: Smoothness-Driven Metrics for Data Quality in Imitation Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    RINSE scores robot demonstration trajectories for smoothness via SAL and TED metrics to curate higher-quality data for behavioral cloning, improving success rates with less data on benchmarks and real robots.

  44. GazeVLA: Learning Human Intention for Robotic Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

  45. LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios

    cs.RO 2026-04 unverdicted novelty 6.0

    LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.

  46. Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Wiggle and Go! uses system identification from rope motion observations to predict parameters that enable zero-shot goal-conditioned dynamic manipulation, achieving 3.55 cm accuracy on 3D target striking versus 15.34 ...

  47. FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

    cs.RO 2026-04 conditional novelty 6.0

    FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.

  48. SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces

    cs.RO 2026-04 unverdicted novelty 6.0

    SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.

  49. AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models

    cs.RO 2026-04 unverdicted novelty 6.0

    AnchorRefine factorizes VLA action generation into a trajectory anchor for coarse planning and residual refinement for local corrections, improving success rates by up to 7.8% in simulation and 18% on real robots acro...

  50. From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation

    cs.RO 2026-04 unverdicted novelty 6.0

    Digital Cousins is a generative real-to-sim method that creates diverse high-fidelity simulation scenes from real panoramas to improve generalization in robot learning and evaluation.

  51. Long-Term Memory for VLA-based Agents in Open-World Task Execution

    cs.RO 2026-04 unverdicted novelty 6.0

    ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.

  52. UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

    cs.RO 2026-04 unverdicted novelty 6.0

    UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.

  53. Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

    cs.RO 2026-04 unverdicted novelty 6.0

    Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.

  54. WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

    cs.RO 2026-04 unverdicted novelty 6.0

    WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match tele...

  55. AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

    cs.RO 2026-04 unverdicted novelty 6.0

    AnySlot decouples language grounding from low-level control by inserting an explicit visual goal image, yielding better zero-shot performance on precise slot placement tasks than flat VLA policies.

  56. MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks

    cs.RO 2026-04 unverdicted novelty 6.0

    MoRI dynamically mixes RL and IL experts with variance-based switching and IL regularization to reach 97.5% success in four real-world robotic tasks while cutting human intervention by 85.8%.

  57. ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration

    cs.RO 2026-04 unverdicted novelty 6.0

    ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.

  58. Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model

    cs.RO 2026-04 conditional novelty 6.0

    MV-VDP jointly predicts multi-view RGB and heatmap videos via diffusion to achieve data-efficient, robust robotic manipulation policies.

  59. Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

    cs.RO 2026-04 unverdicted novelty 6.0

    SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.

  60. Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

    cs.AI 2026-01 conditional novelty 6.0

    Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.
