pith. sign in

arxiv: 2606.10244 · v1 · pith:3O6PGVLXnew · submitted 2026-06-08 · 💻 cs.RO · cs.AI

YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale

Pith reviewed 2026-06-27 15:58 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords bimanual dexterous manipulationgripper designdata collection interfacepolicy transferrobotic learningmanipulation datasethuman demonstration
0
0 comments X

The pith

YUBI gripper collects 8434 hours of bimanual data that trains one policy transferable to UR, Franka, and ELEY robots by simple mounting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents YUBI as a finger-aligned gripper that uses yielding actuation to map human finger motion directly to jaw movement for collecting manipulation trajectories. This design replaces bulkier pistol-grip systems and supports VR-tracked data capture at large scale. The resulting dataset covers 119 tasks and demonstrates that a single trained policy can be deployed on different robot bases without retraining when the gripper is mounted on each. The work supplies open hardware, software, and data to support broader collection efforts for dexterous policies.

Core claim

YUBI is a yielding finger-driven gripper that directly maps human finger movements to gripper jaw motion and integrates with VR-based 6-DoF tracking. Using this interface the authors assembled a dataset of 8434 hours across 1.20 million episodes and 119 tasks. Experiments show the gripper improves versatility on complex bimanual tasks, dexterity, and efficiency relative to prior pistol-grip designs. A policy trained on the full YUBI dataset executes successfully on UR, Franka, and ELEY platforms simply by attaching the gripper, confirming that the recorded trajectories serve directly as supervision.

What carries the argument

Yielding finger-driven actuation that maps human finger movements directly to gripper jaw motion while enabling VR-tracked 6-DoF trajectory recording.

If this is right

  • A single policy trained on the YUBI dataset executes on UR, Franka, and ELEY bases after mounting the gripper.
  • YUBI trajectories support higher success rates on complex bimanual tasks than data collected with pistol-grip grippers.
  • The released hardware, software, and 1.20 million episodes provide a reproducible route to large-scale bimanual data collection.
  • Data collection efficiency and dexterity increase when operators use finger-aligned yielding actuation instead of pistol grips.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Robot-specific retraining may become unnecessary for gripper-mounted bimanual policies once large YUBI-style datasets exist.
  • The same interface could support human demonstration collection for tasks beyond the 119 tasks reported.
  • Open release of the integrated stack lowers the barrier for other groups to contribute compatible trajectory data.

Load-bearing premise

Finger-driven yielding actuation produces higher-fidelity and more ergonomic trajectories than pistol-grip designs for fine bimanual tasks.

What would settle it

A policy trained on the YUBI dataset fails to produce successful bimanual behavior when the gripper is mounted on a second robot platform.

read the original abstract

We introduce Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data collection for bimanual dexterous manipulation. While handheld data collection systems such as Universal Manipulation Interface (UMI) enable affordable data collection, their bulky pistol-grip designs can pose ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this, YUBI presents a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion. Using the YUBI devices, we set up a data collection system with integrated VR-based 6 DoF tracking of the gripper, ensuring high-fidelity trajectory data acquisition. We curate a UMI-based dataset of unprecedented scale: 8,434 hours across 1.20M episodes and 119 tasks. Experiments show that YUBI offers advantages over the UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. A single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform, confirming that the collected data are directly executable as policy supervision. We release the gripper hardware, data-collection software, and dataset as one integrated stack, offering the open community a reproducible path to large-scale data acquisition for advancing robotic foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper using yielding, finger-driven actuation for ergonomic bimanual data collection, contrasting with pistol-grip designs like UMI. It describes a VR-based 6 DoF tracking setup for high-fidelity trajectories and presents a large dataset of 8,434 hours across 1.20M episodes and 119 tasks. The paper claims advantages over UMI in versatility for complex bimanual tasks, dexterity, and operational efficiency, and asserts that a single policy trained on the YUBI dataset transfers across UR, Franka, and ELEY robots simply by mounting the gripper, confirming the data are directly executable as policy supervision. The gripper hardware, data-collection software, and dataset are released as an integrated open stack.

Significance. If the policy-transfer result and quantitative advantages hold, the work would be significant for enabling scalable, reproducible data collection toward robotic foundation models in dexterous bimanual manipulation. The explicit release of hardware designs, software, and the full dataset is a concrete strength that supports community adoption and verification.

major comments (2)
  1. [Abstract] Abstract: the claim that 'a single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform' is load-bearing for the assertion that the collected data are 'directly executable as policy supervision,' yet no information is supplied on the policy observation or action representation, whether identical weights were used without modification, or whether any robot-specific calibration or adaptation occurred. This detail is required to evaluate the assumption that gripper mounting alone equalizes the control interface.
  2. [Abstract] Abstract: the assertions of advantages 'in versatility for complex bimanual tasks, dexterity, and operational efficiency' over the UMI gripper form a core part of the contribution, but the abstract supplies no quantitative metrics, baselines, error bars, or experimental protocol to support them. These claims cannot be assessed without the supporting evidence.
minor comments (1)
  1. [Abstract] Abstract: the descriptor 'unprecedented scale' is subjective; a direct numerical comparison to prior bimanual datasets would improve precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback focused on the abstract. We agree that the abstract should be more self-contained regarding the policy transfer details and the quantitative support for the claimed advantages. We will revise the abstract accordingly while preserving its length constraints. Point-by-point responses follow.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'a single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform' is load-bearing for the assertion that the collected data are 'directly executable as policy supervision,' yet no information is supplied on the policy observation or action representation, whether identical weights were used without modification, or whether any robot-specific calibration or adaptation occurred. This detail is required to evaluate the assumption that gripper mounting alone equalizes the control interface.

    Authors: We agree the abstract should briefly address these points for clarity. The full manuscript (Section 4.3 and Experiments) specifies that the policy uses a shared observation space of 6-DoF gripper poses plus finger joint angles from the YUBI device and actions as target joint positions; the identical trained weights are deployed on all three robot platforms with no additional calibration or adaptation beyond physical mounting of the gripper. We will revise the abstract to include a concise clause noting the use of identical weights and observation/action representations across platforms. revision: yes

  2. Referee: [Abstract] Abstract: the assertions of advantages 'in versatility for complex bimanual tasks, dexterity, and operational efficiency' over the UMI gripper form a core part of the contribution, but the abstract supplies no quantitative metrics, baselines, error bars, or experimental protocol to support them. These claims cannot be assessed without the supporting evidence.

    Authors: The abstract is a high-level summary; the supporting quantitative results (success rates, task completion times, user-study metrics on dexterity and fatigue, with baselines and error bars) appear in Section 5 and Tables 2–4. We acknowledge that the abstract would benefit from referencing these concrete findings. We will revise the abstract to incorporate one or two key quantitative results (e.g., success-rate deltas and efficiency metrics) while remaining within length limits. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical hardware and data claims with no derivations or fitted predictions

full rationale

The paper describes a new gripper design, data collection setup, and experimental results on task performance and cross-robot transfer. No equations, parameter fitting, or mathematical derivations appear in the abstract or described content. The transfer claim is presented as an empirical observation from training and testing a policy, not as a derived result that reduces to its inputs by construction. All load-bearing statements are direct descriptions of hardware, dataset scale, or observed outcomes rather than self-referential definitions or renamed fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

This is an applied hardware and data-collection contribution with no mathematical model; the central claim rests on the effectiveness of the new gripper design and the quality of the collected trajectories.

invented entities (1)
  • YUBI gripper no independent evidence
    purpose: Yielding finger-aligned actuation to enable ergonomic, high-fidelity bimanual data collection
    New device introduced to overcome ergonomic limitations of prior pistol-grip designs.

pith-pipeline@v0.9.1-grok · 5885 in / 1221 out tokens · 26999 ms · 2026-06-27T15:58:26.587481+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references

  1. [1]

    Move the part to the specific area of the center box

    Grasp a part in the left box 2. Move the part to the specific area of the center box

  2. [2]

    Grasp a part in the center box

  3. [3]

    Move the part to the specific area of the right box Pick J/B protector & place it a specific area in the box T op View Left View Right View 0:00 0:10 0:20 0:30 0:40 0:50

  4. [10]

    Pick up a cord on the table and put it into the box

  5. [11]

    Four representative YUBI episodes, each visualized as a timeline of segmented sub-actions with wrist-camera snapshots of the corresponding task progress

    Close the box with the flap tucked inside Untangle braided cords and pack into the box Figure 13 | Action segmentation examples (1/3). Four representative YUBI episodes, each visualized as a timeline of segmented sub-actions with wrist-camera snapshots of the corresponding task progress. 18 YUBI: Yielding Universal Bidigital Interface T op View Left View ...

  6. [12]

    Pick another bolt

    Place the bolt into a deeper hole 3. Pick another bolt

  7. [13]

    Insert the bolt into a hexagonal hole

  8. [14]

    Push it to fit in the hole 7

    Put the grommet to cover the bolt inside the hole. Push it to fit in the hole 7. Pick a shock bush 8. Put the shock bush over the bolt Insert bolts and fit rubber caps T op View Left View Right View 0:00 0:05 0:10 0:15 0:20 0:25 0:30 0:35 0:40

  9. [18]

    Pick a brick of color [A] and place into right box

  10. [19]

    Pick a brick of color [A] and place into right box. 6. Pick a brick of color [B] and place into center box

  11. [20]

    Pick a brick of color [B] and place into center box

  12. [21]

    Pick a brick of color [B] and place into center box. 9. Pick a brick of color [B] and place into center box. 10. Pick a brick of color [B] and place into center box. 11. Pick a brick of color [C] and place into left box. 12. Pick a brick of color [C] and place into left box

  13. [22]

    Pick a brick of color [C] and place into left box. 14. Pick a brick of color [C] and place into left box

  14. [23]

    Sort 3 colored Domino bricks T op View Left View Right View 0:00 0:05 0:10 0:15 0:20

    Pick a brick of color [C] and place into left box. Sort 3 colored Domino bricks T op View Left View Right View 0:00 0:05 0:10 0:15 0:20

  15. [24]

    Pick up the bottom half and fold it in half

  16. [25]

    Unfold it horizontally

    Pick up the right half and fold it in half again 3. Unfold it horizontally

  17. [26]

    Continued from Figure 13

    Unfold it vertically Fold and unfold handkerchief Figure 14 | Action segmentation examples (2/3). Continued from Figure 13. 19 YUBI: Yielding Universal Bidigital Interface T op View Left View Right View 0:00 0:10 0:20 0:30 0:40 0:50

  18. [27]

    Remove the left support

  19. [28]

    Disassemble the left support into two pieces

  20. [29]

    Remove the right support. 4. Disassemble the right support into two pieces

  21. [30]

    Reassemble the left support

  22. [31]

    Reassemble the right support

  23. [32]

    Attach the left support to the bridge

  24. [33]

    Attach the right support to the bridge Disassemble and Reassemble a Bridge-Shaped LEGO Structure T op View Left View Right View 0:00 0:05 0:10 0:15 0:20 0:25 0:30 0:35

  25. [34]

    Pick up one end of the cable

  26. [35]

    Pick up the power bank

  27. [36]

    Place the power bank on the table

    Insert the connector into the power bank 4. Place the power bank on the table

  28. [37]

    Place the phone on the stand

    Pick up the phone 6. Place the phone on the stand

  29. [38]

    Pick up the other end of the cable

  30. [39]

    Insert the connector into the phone Charge the phone with the power bank T op View Left View Right View 0:00 0:05 0:10 0:15 0:20 0:25 0:30 0:35

  31. [40]

    Pick a plate and place it on another plate

  32. [41]

    Pick a plate and place it on two stacked plates

  33. [42]

    Pick a plate and place it on three stacked plates

  34. [43]

    Pick the top plate from four stacked plates and place it on the table

  35. [44]

    Pick the top plate from three stacked plates and place it on the table

  36. [45]

    ball in basket

    Pick the top plate from two stacked plates and place it on the table Stack & unstack plates Figure 15 | Action segmentation examples (3/3). Continued from Figure 14. 20 YUBI: Yielding Universal Bidigital Interface RGB Videos from Cameras Task Instruction Status of YUBI Device 3D Tracking Visualization Figure 16 | Task UI. A rig-mounted laptop displays the...