Learning Controlled Separation of Small Objects Between Two Fingers with a Tactile Skin

Berthold Bäuml; Ulf Kasolowsky

arxiv: 2605.31486 · v1 · pith:AO2CSIYOnew · submitted 2026-05-29 · 💻 cs.RO


Pith reviewed 2026-06-28 22:15 UTC · model grok-4.3

classification 💻 cs.RO

keywords: tactile sensing, robotic manipulation, reinforcement learning, sim-to-real transfer, object separation, fingertip sensor, multi-fingered hand, pellet handling


The pith

A robotic hand separates small objects to a precise count between two fingers using only tactile skin feedback without vision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the novel task of controlled separation of 6 mm pellets can be solved purely with tactile sensing from a spatially-resolved skin on one fingertip of a multi-fingered robotic hand. A reinforcement learning policy is trained in simulation with a sparse reward that simply checks whether the desired number of objects remains between the fingers after grasping from a box. Exhaustive simulation experiments show that an ideal high-resolution tactile sensor solves the task almost perfectly, while a 4x4 taxel sensor still yields an improvement of up to 20 percent over joint-position sensing alone. For this analysis, an auxiliary estimator is trained alongside the policy to predict ground-truth contact positions. The policy transfers successfully to the physical DLR-Hand II equipped with the tactile skin.

Core claim

The central claim is that a reinforcement learning policy trained in simulation using only spatially-resolved tactile feedback from a fingertip sensor enables controlled dropping of small objects until exactly the desired number remains between the fingers, and that this policy transfers directly to the real DLR-Hand II without additional fine-tuning.

What carries the argument

Spatially-resolved tactile skin on the fingertip that supplies contact-position data to a reinforcement-learning policy trained with a sparse count-based reward.
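
To make the sparse count-based reward concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name, the 0/1 reward values, and the choice to evaluate only at the end of an episode are assumptions made for illustration, based on the abstract's description of a reward that "basically checks if the desired number of objects is reached".

```python
def sparse_count_reward(pellets_between_fingers: int,
                        desired_count: int,
                        episode_done: bool) -> float:
    """Hypothetical sparse reward: 1.0 only if the episode has ended and
    exactly the desired number of pellets remains between the fingers."""
    if not episode_done:
        # No intermediate shaping: every non-terminal step yields zero.
        return 0.0
    return 1.0 if pellets_between_fingers == desired_count else 0.0


if __name__ == "__main__":
    # Target Pd = 1: success only when exactly one pellet is left at the end.
    print(sparse_count_reward(1, 1, episode_done=True))
    print(sparse_count_reward(2, 1, episode_done=True))
    print(sparse_count_reward(1, 1, episode_done=False))
```

A reward of this form gives the policy no gradient toward partial progress, which is why the quality of the tactile observation matters so much for learning.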

If this is right

An ideal high-resolution tactile sensor solves the separation task almost perfectly.
A 4x4 taxel sensor improves success by up to 20 percent over joint sensors alone.
Training an estimator alongside the policy allows prediction of ground-truth contact positions from tactile readings.
The policy achieves successful sim-to-real transfer on the DLR-Hand II without real-world fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Tactile-only control of this kind could extend to other fine-manipulation tasks such as sorting or assembly of small parts where vision is occluded.
Varying pellet size or friction in simulation could reveal the minimum sensor resolution needed for reliable performance.
Combining the tactile policy with simple force thresholds might increase robustness when objects have varying weights.

Load-bearing premise

The physics simulation matches real-world contact dynamics, friction, and tactile sensor responses closely enough for direct transfer of the learned policy to hardware.

What would settle it

Running the transferred policy on the real DLR-Hand II and measuring whether it consistently leaves exactly the target number of pellets between the fingers across repeated trials.

Figures

Figures reproduced from arXiv: 2605.31486 by Berthold Bäuml, Ulf Kasolowsky.

**Figure 1.** Left: Grasping procedure. The two fingers are in an open configuration and "dive" into the box with the pellets. After closing the fingers, the box is removed and multiple pellets remain between the fingertips. Right: Manipulation procedure. The policy moves the fingers so that pellets drop until only the desired number, in this case one (Pd = 1), remains between the fingers (see also the accompanying vide…

**Figure 2.** Top: Hardware setup. We use the DLR-Hand II together with a tactile skin. Onto the tactile skin we add a small layer of rubber (visible in blue on the left) for better friction. Bottom: Tactile sensor simulation. The red circles indicate the pellets on the fingertip. On the left, the simulation of the real tactile sensor with 4 × 4 taxels is shown. The shown heatmap visualizes the normalized simulated pres…

**Figure 3.** Top: Grasping procedure in simulation. Pellets get spawned between the opened fingers in a random initial configuration and are held in place via a constraint. The fingers then close and the constraint is removed. Only pellets that are stably grasped remain between the fingertips, the other ones drop due to gravity. Bottom: Three samples of initial pellet configurations. The distribution approximates how p…

**Figure 4.** Control architecture. On the left side, the controlled system is depicted, which can either be the real robot or the simulation. It includes the hand…

**Figure 5.** Results of learning in simulation. Success rate…

**Figure 7.** Example run on the real system for Pd = 3 with tactile feedback. On the left, a sequence of the manipulation is shown over time. On the right, the corresponding tactile images are visualized (black: no activation, white: high activation). Also see the accompanying video to get the best impression.

**Figure 8.** Results of the sim-to-real transfer of the tactile policy. Success…
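
The Figure 2 caption describes a simulated 4 × 4 taxel image with normalized pressure values. The sketch below shows one way such an image could be produced from contact positions. It is purely illustrative: the Gaussian spreading model, the function name `simulate_taxels`, the pad size, and the `sigma` value are all assumptions and are not taken from the paper.

```python
import numpy as np


def simulate_taxels(contacts_xy, forces, grid=4, pad_size=1.0, sigma=0.15):
    """Hypothetical taxel model: spread each contact force over a grid of
    taxel centers with a Gaussian kernel, then normalize to [0, 1].

    contacts_xy: (N, 2) contact positions in [0, pad_size]^2
    forces:      (N,) non-negative contact forces
    """
    contacts_xy = np.asarray(contacts_xy, dtype=float).reshape(-1, 2)
    forces = np.asarray(forces, dtype=float).reshape(-1)

    # Taxel centers along one axis; rows index y, columns index x.
    centers = (np.arange(grid) + 0.5) * pad_size / grid
    cx, cy = np.meshgrid(centers, centers, indexing="xy")

    image = np.zeros((grid, grid))
    for (x, y), f in zip(contacts_xy, forces):
        dist_sq = (cx - x) ** 2 + (cy - y) ** 2
        image += f * np.exp(-dist_sq / (2.0 * sigma ** 2))

    peak = image.max()
    # Normalize by the peak activation; an empty pad stays all zeros.
    return image / peak if peak > 0 else image


if __name__ == "__main__":
    # Two pellets touching the pad at different locations.
    print(simulate_taxels([[0.125, 0.125], [0.875, 0.625]], [1.0, 0.5]).round(2))
```

With only 16 taxels, two nearby pellets can blur into a single activation blob under a model like this, which is consistent with the reported gap between the 4 × 4 sensor and the ideal high-resolution sensor.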

Original abstract

We introduce and solve the novel task of controlled separation of small objects with two fingers of a multi-purpose robotic hand: after grasping into a box of small objects, the task is to drop as many of them until a desired number remains between the fingers. The objects are small compared to the width of the fingers but also in absolute terms. In our case little pellets with a diameter of only 6mm are handled. We show that the task can be performed purely tactile (no vision) using a spatially-resolved tactile skin on a fingertip. The separation policy is trained in simulation via reinforcement learning using a straightforward sparse reward, which basically checks if the desired number of objects is reached. In simulation experiments, we provide an exhaustive analysis of the benefits of using spatially-resolved tactile feedback: while an ideal (high-resolution) tactile sensor allows solving the task almost perfectly, a sensor with lower spatial resolution (here 4x4 taxels) still leads to an improvement of up to 20% compared to using only the fingers' joint sensors. For this analysis, we further train an estimator alongside the policy that predicts the ground truth contact positions. Finally, we demonstrate the successful sim-to-real transfer for the DLR-Hand II equipped with a tactile skin.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New task of tactile-only pellet separation shows resolution benefits in sim and a real transfer, but the sim-to-real modeling gets little visible support.


The paper's main contribution is defining and solving a controlled separation task: grasp a pile of 6 mm pellets with two fingers, then release until a target count remains between them, using only tactile feedback. They train a policy via RL in simulation with a sparse reward and run an exhaustive breakdown of sensor resolution. A 4x4 taxel skin improves success by up to 20% over joint sensing alone, while ideal high-resolution sensing gets close to perfect. They also train a contact-position estimator alongside the policy. The real-robot result on the DLR-Hand II is presented as successful zero-shot transfer.

The sim analysis is the strongest part. It gives a direct, quantitative comparison of what spatial tactile information adds versus proprioception, which is useful for anyone working on tactile dexterous manipulation. The estimator is a reasonable addition that lets them inspect what the policy is using.

The soft spot is the sim-to-real claim. The abstract states successful transfer but gives no numbers on how well the simulated taxel responses or contact dynamics match hardware, no domain-randomization details, and no ablation showing what happens when those are deliberately mismatched. If the full paper has raw sensor comparisons or failure-mode breakdowns, that would shore it up; otherwise the transfer result rests on an untested modeling assumption.

This is for robotics groups focused on tactile sensing and in-hand manipulation. The sim results are solid enough to be worth citing in that niche. It deserves peer review because the task is new and the resolution study is concrete, even if the transfer section will likely need more evidence.

Referee Report

1 major / 2 minor

Summary. The paper introduces the task of controlled separation of 6mm pellets grasped between two fingers of a multi-fingered hand, where the goal is to release objects until a target number remains. The approach uses reinforcement learning in simulation with a sparse reward based solely on the final object count, relying on spatially-resolved tactile skin (ideal or 4x4 taxels) without vision. It analyzes performance gains from tactile feedback (up to 20% over joint sensors), trains a contact-position estimator alongside the policy, and reports successful zero-shot sim-to-real transfer on the DLR-Hand II hardware.

Significance. If the sim-to-real modeling assumptions hold, the work would demonstrate a practical advance in tactile-only dexterous manipulation for small objects where vision is unreliable due to occlusion or scale. The systematic comparison of tactile resolutions and the accompanying estimator provide useful insights into sensor requirements. Credit is due for the hardware validation on the DLR-Hand II and the use of a simple sparse reward that still enables learning.

major comments (1)

[Abstract and real-robot experiments section] The central claim of successful zero-shot sim-to-real transfer rests on the unvalidated assumption that the simulator's contact dynamics, friction, and 4x4 tactile responses match hardware for 6 mm pellets. The manuscript provides no quantitative sim-vs-real comparison of raw taxel values, no domain randomization details for sensor noise or pellet properties, and no ablation showing transfer failure under mismatched parameters. This makes it impossible to assess whether the policy exploits simulation-specific cues.

minor comments (2)

The abstract states quantitative gains (up to 20% with 4x4 tactile) but omits error bars, trial counts, and failure-mode analysis; these details are needed to interpret the simulation results.
Notation for the tactile sensor resolutions and the estimator architecture could be clarified with an explicit diagram or table relating taxel count to input dimensionality.
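
To illustrate why trial counts matter for the first minor comment, the sketch below computes a Wilson score interval for a success rate. The trial numbers in the usage example are hypothetical and are not taken from the paper; the function name is also an assumption.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical example: the same 80% success rate at two trial counts.
    for n in (20, 500):
        low, high = wilson_interval(int(0.8 * n), n)
        print(f"n={n}: [{low:.3f}, {high:.3f}]")
```

The interval narrows considerably as the number of trials grows, so a reported gain of "up to 20%" is hard to interpret without knowing how many episodes stand behind each success rate.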

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the need for stronger validation of the sim-to-real transfer. We address the single major comment point-by-point below.


Referee: [Abstract and real-robot experiments section] The central claim of successful zero-shot sim-to-real transfer rests on the unvalidated assumption that the simulator's contact dynamics, friction, and 4x4 tactile responses match hardware for 6 mm pellets. The manuscript provides no quantitative sim-vs-real comparison of raw taxel values, no domain randomization details for sensor noise or pellet properties, and no ablation showing transfer failure under mismatched parameters. This makes it impossible to assess whether the policy exploits simulation-specific cues.

Authors: We agree that the current manuscript lacks quantitative sim-vs-real comparisons of raw taxel values, explicit domain randomization details, and ablations on mismatched parameters. The zero-shot hardware success on the DLR-Hand II is offered as empirical evidence that the simulation model was adequate for the task, but we acknowledge this does not fully address the referee's concern about potential simulation-specific cues. In revision we will add a dedicated subsection to the real-robot experiments section that includes (1) side-by-side quantitative plots of 4x4 taxel activation patterns for matched contact scenarios in simulation and on hardware, (2) a description of how contact dynamics and friction parameters were calibrated from hardware measurements, and (3) clarification that no domain randomization was applied because the simulator was tuned to the specific hardware and pellet properties. An ablation on deliberately mismatched parameters was not performed within the scope of this work; we will note this limitation explicitly. These additions will allow readers to better evaluate the transfer.

Revision: yes

Circularity Check

0 steps flagged

No circularity in RL training or sim-to-real claims


The paper's core chain consists of RL policy training in simulation (sparse reward on object count) plus an auxiliary contact estimator, followed by zero-shot transfer to DLR-Hand II hardware. No equations define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no self-citations or imported uniqueness theorems carry the central result. The reported outcomes (simulation success rates and hardware transfer) are generated by independent training runs and physical experiments rather than by algebraic or definitional reduction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; sim-to-real transfer implicitly assumes unstated modeling fidelity.

pith-pipeline@v0.9.1-grok · 5757 in / 1030 out tokens · 21645 ms · 2026-06-28T22:15:50.964669+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 1 canonical work page · 1 internal anchor

[1] C. Zhao, Z. Tong, J. Rojas, and J. Seo, “Learning to pick by digging: Data-driven dig-grasping for bin picking from clutter,” in IEEE Int. Conf. on Robotics and Automation, 2022.

[2] M. Ohara, K. Koyama, and K. Harada, “Vision-sensorless bin-picking system using compliant fingers with proximity sensors,” in IEEE/SICE Int. Symposium on System Integration, 2025.

[3] Y. Zhou, P. Zhou, S. Wang, and Y. She, “In-hand singulation and scooping manipulation with a 5 DOF tactile gripper,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2024.

[4] Y. Zhou, P. Zhou, S. Wang, and Y. She, “In-hand singulation, scooping, and cable untangling with a 5-DOF tactile-reactive gripper,” Advanced Robotics Research, 2025.

[5] W. K. Do, B. Aumann, C. Chungyoun, and M. Kennedy, “Inter-finger small object manipulation with DenseTact optical tactile sensor,” IEEE Robotics and Automation Letters, vol. 9, no. 1, 2024.

[6] M. Ishige et al., “Blind bin picking of small screws through in-finger manipulation with compliant robotic fingers,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2020.

[7] J. Butterfaß, M. Grebenstein, H. Liu, and G. Hirzinger, “DLR-Hand II: Next generation of a dextrous robot hand,” in IEEE Int. Conf. on Robotics and Automation, 2001.

[8] B. Bäuml et al., “Agile Justin: An upgraded member of DLR’s family of lightweight and torque controlled humanoids,” in IEEE Int. Conf. on Robotics and Automation, 2014.

[9] L. Röstel et al., “Composing dextrous grasping and in-hand manipulation via scoring with a reinforcement learning critic,” in IEEE Int. Conf. on Robotics and Automation, 2025.

[10] U. Kasolowsky and B. Bäuml, “Fine manipulation using a tactile skin: Learning in simulation and sim-to-real transfer,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2024.

[11] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2012.

[12] L. Sievers, J. Pitz, and B. Bäuml, “Learning purely tactile in-hand manipulation with a torque-controlled hand,” in IEEE Int. Conf. on Robotics and Automation, 2022.

[13] J. Schulman et al., “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[14] G. Brockman et al., “OpenAI Gym,” arXiv preprint, 2016.

[15] A. Raffin et al., “Stable-Baselines3: Reliable reinforcement learning implementations,” The Journal of Machine Learning Research, vol. 22, no. 1, 2021.

[16] I. Akkaya et al., “Solving Rubik’s cube with a robot hand,” arXiv preprint, 2019.

[17] M. Andrychowicz et al., “Learning dexterous in-hand manipulation,” The International Journal of Robotics Research, vol. 39, no. 1, 2020.
