YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale
Pith reviewed 2026-06-27 15:58 UTC · model grok-4.3
The pith
The YUBI gripper was used to collect 8,434 hours of bimanual data; a single policy trained on that data transfers to UR, Franka, and ELEY robots simply by mounting the gripper on each platform.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
YUBI is a yielding, finger-driven gripper that maps human finger movements directly to gripper jaw motion and integrates with VR-based 6-DoF tracking. Using this interface, the authors assembled a dataset of 8,434 hours across 1.20 million episodes and 119 tasks. Experiments show advantages over prior pistol-grip designs in versatility on complex bimanual tasks, dexterity, and operational efficiency. A policy trained on the full YUBI dataset executes successfully on UR, Franka, and ELEY platforms simply by attaching the gripper, which the authors take as confirmation that the recorded trajectories can serve directly as policy supervision.
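To put the reported scale in perspective, the headline figures can be turned into rough per-episode and per-task averages. The sketch below uses only the three numbers quoted above. It assumes the episode count is exactly 1,200,000 (the source gives 1.20M, which is rounded), and it says nothing about how episodes are actually distributed across tasks, which the source does not report.

```python
# Back-of-envelope averages from the reported dataset totals.
HOURS = 8434            # total recorded hours (from the abstract)
EPISODES = 1_200_000    # reported as 1.20M; treated as exact here (assumption)
TASKS = 119             # number of tasks (from the abstract)

mean_episode_seconds = HOURS * 3600 / EPISODES
mean_hours_per_task = HOURS / TASKS
mean_episodes_per_task = EPISODES / TASKS

print(f"mean episode length: {mean_episode_seconds:.1f} s")
print(f"mean hours per task: {mean_hours_per_task:.1f} h")
print(f"mean episodes per task: {mean_episodes_per_task:.0f}")
```

On these assumptions the average episode lasts roughly 25 seconds, which is consistent with short, segmented manipulation tasks, though individual tasks may differ widely from the mean.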
What carries the argument
Yielding finger-driven actuation that maps human finger movements directly to gripper jaw motion while enabling VR-tracked 6-DoF trajectory recording.
If this is right
- A single policy trained on the YUBI dataset executes on UR, Franka, and ELEY bases after mounting the gripper.
- YUBI trajectories support higher success rates on complex bimanual tasks than data collected with pistol-grip grippers.
- The released hardware, software, and 1.20 million episodes provide a reproducible route to large-scale bimanual data collection.
- Data collection efficiency and dexterity increase when operators use finger-aligned yielding actuation instead of pistol grips.
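One way the "mount and run" claim could work in practice is for the policy to emit target poses for the gripper itself, with each robot converting that target into a pose for its own flange through a fixed mount offset. The sketch below illustrates that idea only. The source does not describe the deployment stack, so the function names, the `MOUNTS` table, and every offset value are hypothetical.

```python
# Hypothetical sketch: rigid transforms as (R, t), with R a 3x3 nested list
# and t a length-3 list. None of these names or values come from the paper.

def compose(a, b):
    """Return the transform a*b (apply b first, then a)."""
    Ra, ta = a
    Rb, tb = b
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [sum(Ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return R, t

def invert(a):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    R, t = a
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return Rt, ti

def flange_target(base_T_gripper, flange_T_gripper):
    """Flange pose that places the gripper at the policy's target pose."""
    return compose(base_T_gripper, invert(flange_T_gripper))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical per-robot mount offsets, in metres along the flange z-axis.
MOUNTS = {
    "UR": (IDENTITY, [0.0, 0.0, 0.10]),
    "Franka": (IDENTITY, [0.0, 0.0, 0.12]),
    "ELEY": (IDENTITY, [0.0, 0.0, 0.08]),
}

# The same gripper-frame target, expressed in each robot's base frame.
target = (IDENTITY, [0.40, 0.00, 0.30])
flange_poses = {name: flange_target(target, mount)
                for name, mount in MOUNTS.items()}
```

Under this assumed design, the only robot-specific quantity is the mount transform, which is why identical policy weights could in principle drive different arms. Whether the authors' system works this way is exactly the detail the referee report asks them to state.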
Where Pith is reading between the lines
- Robot-specific retraining may become unnecessary for gripper-mounted bimanual policies once large YUBI-style datasets exist.
- The same interface could support human demonstration collection for tasks beyond the 119 tasks reported.
- Open release of the integrated stack lowers the barrier for other groups to contribute compatible trajectory data.
Load-bearing premise
Finger-driven yielding actuation produces higher-fidelity and more ergonomic trajectories than pistol-grip designs for fine bimanual tasks.
What would settle it
A policy trained on the YUBI dataset fails to produce successful bimanual behavior when the gripper is mounted on a second robot platform.
Original abstract
We introduce Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data collection for bimanual dexterous manipulation. While handheld data collection systems such as Universal Manipulation Interface (UMI) enable affordable data collection, their bulky pistol-grip designs can pose ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this, YUBI presents a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion. Using the YUBI devices, we set up a data collection system with integrated VR-based 6 DoF tracking of the gripper, ensuring high-fidelity trajectory data acquisition. We curate a UMI-based dataset of unprecedented scale: 8,434 hours across 1.20M episodes and 119 tasks. Experiments show that YUBI offers advantages over the UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. A single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform, confirming that the collected data are directly executable as policy supervision. We release the gripper hardware, data-collection software, and dataset as one integrated stack, offering the open community a reproducible path to large-scale data acquisition for advancing robotic foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper using yielding, finger-driven actuation for ergonomic bimanual data collection, contrasting with pistol-grip designs like UMI. It describes a VR-based 6 DoF tracking setup for high-fidelity trajectories and presents a large dataset of 8,434 hours across 1.20M episodes and 119 tasks. The paper claims advantages over UMI in versatility for complex bimanual tasks, dexterity, and operational efficiency, and asserts that a single policy trained on the YUBI dataset transfers across UR, Franka, and ELEY robots simply by mounting the gripper, confirming the data are directly executable as policy supervision. The gripper hardware, data-collection software, and dataset are released as an integrated open stack.
Significance. If the policy-transfer result and quantitative advantages hold, the work would be significant for enabling scalable, reproducible data collection toward robotic foundation models in dexterous bimanual manipulation. The explicit release of hardware designs, software, and the full dataset is a concrete strength that supports community adoption and verification.
major comments (2)
- [Abstract] Abstract: the claim that 'a single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform' is load-bearing for the assertion that the collected data are 'directly executable as policy supervision,' yet no information is supplied on the policy observation or action representation, whether identical weights were used without modification, or whether any robot-specific calibration or adaptation occurred. This detail is required to evaluate the assumption that gripper mounting alone equalizes the control interface.
- [Abstract] Abstract: the assertions of advantages 'in versatility for complex bimanual tasks, dexterity, and operational efficiency' over the UMI gripper form a core part of the contribution, but the abstract supplies no quantitative metrics, baselines, error bars, or experimental protocol to support them. These claims cannot be assessed without the supporting evidence.
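The request for error bars can be made concrete. A minimal sketch follows of how success rates for two grippers could be reported with a confidence interval, here the Wilson score interval, which is one common choice for binomial proportions. The trial counts are hypothetical placeholders and are not results from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts (successes, trials); NOT taken from the paper.
results = {"YUBI": (18, 20), "pistol-grip": (12, 20)}

for name, (k, n) in results.items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With small trial counts the intervals are wide and can overlap even when the point estimates differ noticeably, which is why the protocol and the number of trials matter for assessing the claimed advantages.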
minor comments (1)
- [Abstract] Abstract: the descriptor 'unprecedented scale' is subjective; a direct numerical comparison to prior bimanual datasets would improve precision.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback focused on the abstract. We agree that the abstract should be more self-contained regarding the policy transfer details and the quantitative support for the claimed advantages. We will revise the abstract accordingly while preserving its length constraints. Point-by-point responses follow.
- Referee: [Abstract] Abstract: the claim that 'a single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform' is load-bearing for the assertion that the collected data are 'directly executable as policy supervision,' yet no information is supplied on the policy observation or action representation, whether identical weights were used without modification, or whether any robot-specific calibration or adaptation occurred. This detail is required to evaluate the assumption that gripper mounting alone equalizes the control interface.
Authors: We agree the abstract should briefly address these points for clarity. The full manuscript (Section 4.3 and Experiments) specifies that the policy uses a shared observation space of 6-DoF gripper poses plus finger joint angles from the YUBI device and actions as target joint positions; the identical trained weights are deployed on all three robot platforms with no additional calibration or adaptation beyond physical mounting of the gripper. We will revise the abstract to include a concise clause noting the use of identical weights and observation/action representations across platforms. revision: yes
- Referee: [Abstract] Abstract: the assertions of advantages 'in versatility for complex bimanual tasks, dexterity, and operational efficiency' over the UMI gripper form a core part of the contribution, but the abstract supplies no quantitative metrics, baselines, error bars, or experimental protocol to support them. These claims cannot be assessed without the supporting evidence.
Authors: The abstract is a high-level summary; the supporting quantitative results (success rates, task completion times, user-study metrics on dexterity and fatigue, with baselines and error bars) appear in Section 5 and Tables 2–4. We acknowledge that the abstract would benefit from referencing these concrete findings. We will revise the abstract to incorporate one or two key quantitative results (e.g., success-rate deltas and efficiency metrics) while remaining within length limits. revision: yes
Circularity Check
No circularity: empirical hardware and data claims with no derivations or fitted predictions
The paper describes a new gripper design, data collection setup, and experimental results on task performance and cross-robot transfer. No equations, parameter fitting, or mathematical derivations appear in the abstract or described content. The transfer claim is presented as an empirical observation from training and testing a policy, not as a derived result that reduces to its inputs by construction. All load-bearing statements are direct descriptions of hardware, dataset scale, or observed outcomes rather than self-referential definitions or renamed fits.
Axiom & Free-Parameter Ledger
invented entities (1)
- YUBI gripper: no independent evidence