pith. machine review for the scientific record

arxiv: 2604.08528 · v1 · submitted 2026-04-09 · 💻 cs.RO


A-SLIP: Acoustic Sensing for Continuous In-hand Slip Estimation


Pith reviewed 2026-05-10 17:26 UTC · model grok-4.3

classification 💻 cs.RO
keywords acoustic sensing · slip estimation · in-hand manipulation · robotic gripper · tactile sensing · multi-channel audio · convolutional network · vibration capture

The pith

Multi-channel acoustic sensing estimates continuous in-hand slip direction with 14.1 degrees mean absolute error

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that placing multiple piezoelectric microphones behind a textured silicone pad on a parallel-jaw gripper captures contact-induced vibrations from slip events. These synchronized audio channels are converted to log-mel spectrograms and processed by a lightweight convolutional network to jointly predict slip presence, direction in the grasp plane, and magnitude. A reader would care because existing tactile methods trade off size, durability, and the ability to measure both direction and magnitude at once, limiting reliable robotic grasping. The multi-channel design cuts directional error by 64 percent and magnitude error by 68 percent relative to single-microphone versions, and the system supports closed-loop control.
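To make the pipeline concrete, here is a minimal sketch of the multi-channel log-mel front end. Only the 200 ms analysis window comes from the paper (Figure 4 caption); the sample rate, FFT size, hop length, and mel-bin count below are illustrative assumptions, not the authors' published settings.

```python
import torch
import torchaudio

# 200 ms windows come from the paper (Fig. 4); sample rate, FFT size,
# hop length, and mel-bin count are illustrative guesses.
SAMPLE_RATE = 16_000
WINDOW = int(0.2 * SAMPLE_RATE)  # samples per 200 ms slice

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=512, hop_length=128, n_mels=64
)

def to_log_mel(waveforms: torch.Tensor) -> torch.Tensor:
    """waveforms: (4, WINDOW) synchronized microphone channels.
    Returns a (4, n_mels, frames) stack of log-mel spectrograms."""
    return torch.log(mel(waveforms) + 1e-6)  # eps keeps the log finite

x = torch.randn(4, WINDOW)   # one slice of four-channel contact audio
print(to_log_mel(x).shape)   # torch.Size([4, 64, 26])
```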

Core claim

A-SLIP integrates four piezoelectric microphones behind a textured silicone contact pad to capture structured contact-induced vibrations as multi-channel audio. The signals are processed as synchronized log-mel spectrograms by a convolutional network that outputs predictions for slip presence, direction, and magnitude. In robot- and externally induced slip experiments, the fine-tuned four-microphone configuration reaches a mean absolute directional error of 14.1 degrees, outperforms baselines by up to 12 percent in detection accuracy, and reduces directional error by 32 percent overall.
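The paper's exact error formula is not reproduced here, so the sketch below uses the standard definition of mean absolute angular error, wrapping differences into (-180°, 180°] so that near-identical directions on either side of the 0°/360° seam are not penalized:

```python
import numpy as np

def mean_abs_directional_error(pred_deg: np.ndarray, true_deg: np.ndarray) -> float:
    """Mean absolute angular error in degrees, with differences wrapped
    into (-180, 180] so 359 deg vs 1 deg scores 2 deg, not 358 deg."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))

# toy check: a 10 deg miss and a 2 deg miss across the wrap average to 6 deg
print(mean_abs_directional_error(np.array([100.0, 359.0]),
                                 np.array([90.0, 1.0])))  # 6.0
```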

What carries the argument

The multi-channel piezoelectric microphone array behind the textured silicone pad, whose synchronized audio is transformed into log-mel spectrograms and fed to a lightweight convolutional network for joint slip prediction

Load-bearing premise

Structured vibrations from slip on the textured pad remain distinct and repeatable enough across object materials, textures, grasp forces, and background noise for the CNN to generalize reliably

What would settle it

Directional error rising well above 14 degrees in tests with previously unseen materials, textures, or higher noise levels would show the predictions do not hold outside the reported conditions
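A minimal harness for that falsification test might look like the following; `load_slices`, `predict_direction`, and the material list are hypothetical stand-ins, not artifacts from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def load_slices(material: str):
    """Hypothetical loader for held-out audio slices of one material:
    returns placeholder spectrograms and ground-truth directions (deg)."""
    n = 50
    return rng.normal(size=(n, 4, 64, 26)), rng.uniform(0.0, 360.0, size=n)

def predict_direction(spectrograms: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained model's direction head."""
    return rng.uniform(0.0, 360.0, size=len(spectrograms))

for material in ["glass", "cardboard", "foam"]:  # illustrative materials
    specs, true_dirs = load_slices(material)
    diff = (predict_direction(specs) - true_dirs + 180.0) % 360.0 - 180.0
    # errors persistently far above the reported 14.1 deg would falsify
    # the generalization claim
    print(f"{material}: mean abs directional error = {np.abs(diff).mean():.1f} deg")
```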

Figures

Figures reproduced from arXiv: 2604.08528 by Jean Oh, Jeffrey Ichnowski, Uksang Yoo, Yuemin Mao.

Figure 1: Overview of A-SLIP. Piezoelectric microphones embedded behind textured silicone contact pads capture structure-borne vibrations during slip. Multi-channel log-mel spectrograms are processed by a convolutional network with channel and temporal attention to jointly estimate slip presence, magnitude, and direction as v_t ∈ ℝ².

Figure 3: A-SLIP Model Architecture. Log-mel spectrograms from synchronized microphones are normalized and fused via a learned channel attention module. The fused representation passes through 2D convolutional layers preserving temporal resolution, then 1D temporal convolutions modeling slip dynamics. A temporal attention pooling module aggregates features into a latent vector passed to three heads: a slip classifier and regression heads for slip direction and magnitude.

Figure 4: A-SLIP System. We mount A-SLIP sensors on an XArm gripper attached to an XArm7 robot. To obtain ground-truth in-hand slip, we use an OptiTrack Trio to track poses of the left finger and the object, each with reflective markers attached.

Figure 5: A-SLIP Dataset Distribution. (Top) Robot-induced slip data collected automatically via randomized robot motions sweeping the gripper across a stationary probe, with labels derived from robot states. (Bottom) Externally induced slip data collected by manually perturbing grasped objects, with labels from OptiTrack-tracked rigid-body motion. Counts indicate the number of audio slices in the dataset.

Figure 6: Qualitative Evaluation of A-SLIP. Each row shows a different object; each column shows a sample evaluation frame with predicted (gray) and ground-truth (green) slip vectors overlaid on the contact image alongside per-channel log-mel spectrograms. A-SLIP accurately estimates slip direction and magnitude across objects with varying geometry and surface material, even under impulsive externally induced slip.

Figure 7: Reactive Control. A-SLIP predicts slip direction and magnitude in real time, enabling rapid robot responses to in-hand slip. (Left) Task 1: the robot pushes an object against a wall and stops automatically upon in-hand slip detection. (Right) Task 2: as an experimenter induces slip, the robot follows the model-predicted slip vector to maintain a stable grasp.
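Figure 3's caption fixes the dataflow even though the paper's hyperparameters are not given here. The sketch below mirrors that block ordering (channel-attention fusion, time-preserving 2D convolutions, 1D temporal convolutions, temporal attention pooling, three output heads); layer widths, kernel sizes, and the exact attention forms are guesses.

```python
import torch
import torch.nn as nn

class ASLIPSketch(nn.Module):
    """A minimal sketch of the Figure 3 topology; widths and kernels are
    assumptions, not the authors' published hyperparameters."""

    def __init__(self, n_mics=4, n_mels=64):
        super().__init__()
        # learned channel attention: per-microphone weights from global stats
        self.chan_attn = nn.Sequential(nn.Linear(n_mics, n_mics), nn.Softmax(dim=-1))
        # 2D convs over (mel, time), striding only the mel axis to keep time
        self.conv2d = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=(2, 1), padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=(2, 1), padding=1), nn.ReLU(),
        )
        # 1D temporal convs over the flattened feature axis
        self.conv1d = nn.Sequential(
            nn.Conv1d(64 * (n_mels // 4), 128, 3, padding=1), nn.ReLU()
        )
        # temporal attention pooling into one latent vector
        self.t_attn = nn.Linear(128, 1)
        # three heads: slip presence, 2-D direction (Fig. 1's v_t), magnitude
        self.cls, self.dir, self.mag = nn.Linear(128, 1), nn.Linear(128, 2), nn.Linear(128, 1)

    def forward(self, x):                                 # x: (B, mics, mels, time)
        w = self.chan_attn(x.mean(dim=(2, 3)))            # (B, mics)
        fused = (x * w[:, :, None, None]).sum(1, keepdim=True)
        h = self.conv2d(fused)                            # (B, 64, mels/4, time)
        h = self.conv1d(h.flatten(1, 2))                  # (B, 128, time)
        a = torch.softmax(self.t_attn(h.transpose(1, 2)), dim=1)  # (B, time, 1)
        z = (h.transpose(1, 2) * a).sum(1)                # (B, 128)
        return self.cls(z), self.dir(z), self.mag(z)

logit, direction, magnitude = ASLIPSketch()(torch.randn(8, 4, 64, 26))
```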
Original abstract

Reliable in-hand manipulation requires accurate real-time estimation of slip between a gripper and a grasped object. Existing tactile sensing approaches based on vision, capacitance, or force-torque measurements face fundamental trade-offs in form factor, durability, and their ability to jointly estimate slip direction and magnitude. We present A-SLIP, a multi-channel acoustic sensing system integrated into a parallel-jaw gripper for estimating continuous slip in the grasp plane. The A-SLIP sensor consists of piezoelectric microphones positioned behind a textured silicone contact pad to capture structured contact-induced vibrations. The A-SLIP model processes synchronized multi-channel audio as log-mel spectrograms using a lightweight convolutional network, jointly predicting the presence, direction, and magnitude of slip. Across experiments with robot- and externally induced slip conditions, the fine-tuned four-microphone configuration achieves a mean absolute directional error of 14.1 degrees, outperforms baselines by up to 12 percent in detection accuracy, and reduces directional error by 32 percent. Compared with single-microphone configurations, the multi-channel design reduces directional error by 64 percent and magnitude error by 68 percent, underscoring the importance of spatial acoustic sensing in resolving slip direction ambiguity. We further evaluate A-SLIP in closed-loop reactive control and find that it enables reliable, low-cost, real-time estimation of in-hand slip. Project videos and additional details are available at https://a-slip.github.io.
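The closed-loop evaluation (Figure 7) implies a simple reactive pattern: estimate slip on each audio slice, then stop or servo along the predicted vector. A toy loop follows, with `estimate_slip` as a hypothetical stand-in for the trained model and a made-up threshold and gain:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_slip(window: np.ndarray):
    """Hypothetical stand-in for the A-SLIP model on one 200 ms slice:
    returns (slip probability, planar slip vector v in R^2)."""
    return rng.uniform(), rng.normal(size=2)

SLIP_THRESHOLD = 0.5   # illustrative decision threshold
GAIN = 0.8             # illustrative proportional gain

for tick in range(10):                    # a real system streams audio
    audio = rng.normal(size=(4, 3200))    # 4 mics x 200 ms at 16 kHz
    p_slip, v = estimate_slip(audio)
    if p_slip > SLIP_THRESHOLD:
        # Task 1 (Fig. 7, left): stop the arm on detected slip.
        # Task 2 (Fig. 7, right): servo along the predicted slip vector
        # to cancel the relative motion and keep the grasp stable.
        command = GAIN * v
        print(f"tick {tick}: slip detected, commanding {command}")
```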

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces A-SLIP, a multi-channel acoustic sensing system for continuous in-hand slip estimation in a parallel-jaw gripper. Piezoelectric microphones behind a textured silicone pad capture contact-induced vibrations, which are processed as synchronized log-mel spectrograms by a lightweight CNN to jointly predict slip presence, direction, and magnitude. Experiments with robot- and externally-induced slip report a 14.1° mean absolute directional error for the four-microphone configuration, up to 12% better detection accuracy than baselines, 32% directional error reduction, and 64%/68% error reductions versus single-microphone setups, plus successful closed-loop reactive control.

Significance. If the empirical results hold under broader conditions, A-SLIP offers a low-cost, compact, and durable alternative to vision- or force-based tactile sensing for real-time in-hand manipulation. The multi-channel spatial acoustic approach addresses directional ambiguity in a way that could enable reliable reactive grasping without heavy hardware.

major comments (2)
  1. [Abstract] The headline performance metrics (14.1° directional error, 64% and 68% reductions vs. single-microphone, up to 12% detection improvement) are presented without any information on dataset size, number of objects/materials, grasp force ranges, background noise conditions, train/validation/test splits, or statistical significance testing. This absence directly limits assessment of whether the CNN predictions generalize beyond the tested conditions.
  2. [Abstract] Abstract and experimental evaluation: The central assumption that structured vibrations behind the textured pad yield slip-specific patterns invariant to object material, surface texture, and grasp force is load-bearing for the multi-channel advantage and closed-loop claims, yet no cross-material or cross-force ablation results are described to test it.
minor comments (1)
  1. [Abstract] The abstract mentions 'fine-tuned four-microphone configuration' without clarifying what fine-tuning entails or how it differs from the baseline training procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our paper. We address each of the major comments in detail below and indicate the changes made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The headline performance metrics (14.1° directional error, 64% and 68% reductions vs. single-microphone, up to 12% detection improvement) are presented without any information on dataset size, number of objects/materials, grasp force ranges, background noise conditions, train/validation/test splits, or statistical significance testing. This absence directly limits assessment of whether the CNN predictions generalize beyond the tested conditions.

    Authors: We agree with the referee that the abstract would benefit from additional details to allow readers to better assess the generalizability of our results. In the revised manuscript, we have expanded the abstract to include information on the dataset size, the number of objects and materials tested, the grasp force ranges used, background noise conditions, the train/validation/test splits, and the statistical significance testing. These additions are based on the details already present in the experimental sections of the paper. revision: yes

  2. Referee: [Abstract] Abstract and experimental evaluation: The central assumption that structured vibrations behind the textured pad yield slip-specific patterns invariant to object material, surface texture, and grasp force is load-bearing for the multi-channel advantage and closed-loop claims, yet no cross-material or cross-force ablation results are described to test it.

    Authors: We acknowledge that dedicated cross-material and cross-force ablation studies are not explicitly described in the original manuscript. Our experimental evaluation does include a variety of objects with different materials and surface textures, as well as different grasp forces in the robot-induced slip experiments. The performance improvements of the multi-channel system over single-channel baselines are consistent across these conditions, which provides evidence for the robustness of the approach. In the revised manuscript, we have added a paragraph in the discussion section to explicitly discuss the invariance to material and force based on the existing results. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical sensor + CNN evaluation

Full rationale

The paper presents an acoustic hardware design, data collection protocol, and lightweight CNN trained on log-mel spectrograms to regress slip presence/direction/magnitude. All reported metrics (14.1° directional error, 64% and 68% reductions vs. single-mic baselines, closed-loop control success) are obtained from physical experiments and supervised training on held-out test splits. No equations, normalizations, uniqueness theorems, or first-principles derivations are invoked that could reduce to fitted parameters or self-citations by construction. The central claims rest on external experimental falsifiability rather than internal re-labeling of inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the domain assumption that contact vibrations encode usable slip information and on a trained neural network whose parameters are fitted to experimental data; no new physical entities are postulated.

free parameters (1)
  • CNN weights and biases
    The lightweight convolutional network is trained on collected audio data to map spectrograms to slip outputs.
axioms (1)
  • domain assumption: Contact-induced vibrations captured by microphones positioned behind a textured silicone pad contain distinguishable information about slip presence, direction, and magnitude.
    This is the core premise enabling the acoustic sensing approach described in the abstract.

pith-pipeline@v0.9.0 · 5561 in / 1342 out tokens · 49993 ms · 2026-05-10T17:26:00.289907+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Versatile Humanoid Manipulation with Touch Dreaming

    cs.RO · 2026-04 · conditional novelty 5.0

    HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-r...

Reference graph

Works this paper leans on

40 extracted references · 5 canonical work pages · cited by 1 Pith paper

[1] W. Yuan, R. Li, M. A. Srinivasan, and E. H. Adelson, "Measurement of shear and slip with a GelSight tactile sensor," in 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 304–311, IEEE, 2015.
[2] W. Yuan, S. Dong, and E. H. Adelson, "GelSight: High-resolution robot tactile sensors for estimating geometry and force," Sensors, vol. 17, no. 12, 2017.
[3] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, et al., "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3838–3845, 2020.
[4] Y. Mao, U. Yoo, Y. Yao, S. N. Syed, L. Bondi, J. Francis, J. Oh, and J. Ichnowski, "Visuo-acoustic hand pose and contact estimation," arXiv preprint arXiv:2508.00852, 2025.
[5] M. Lee, U. Yoo, J. Oh, J. Ichnowski, G. Kantor, and O. Kroemer, "SonicBoom: Contact localization using array of microphones," IEEE Robotics and Automation Letters, 2025.
[6] G. Zöller, V. Wall, and O. Brock, "Active acoustic contact sensing for soft pneumatic actuators," in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 7966–7972, IEEE, 2020.
[7] Y. Mao, B. P. Duisterhof, M. Lee, and J. Ichnowski, "Hearing the slide: Acoustic-guided constraint learning for fast non-prehensile transport," 2025.
[8] M. Y. Cao, S. Laws, and F. R. y Baena, "Six-axis force/torque sensors for robotics applications: A review," IEEE Sensors Journal, vol. 21, no. 24, pp. 27238–27251, 2021.
[9] A. Shademan, R. S. Decker, J. D. Opfermann, S. Leonard, A. Krieger, and P. C. Kim, "Supervised autonomous robotic soft tissue surgery," Science Translational Medicine, vol. 8, no. 337, 2016.
[10] P. Nadeau, M. Giamou, and J. Kelly, "Fast object inertial parameter identification for collaborative robots," in 2022 International Conference on Robotics and Automation (ICRA), pp. 3560–3566, IEEE, 2022.
[11] S. Suresh, M. Bauza, K.-T. Yu, J. G. Mangelson, A. Rodriguez, and M. Kaess, "Tactile SLAM: Real-time inference of shape and pose from planar pushing," in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 11322–11328, IEEE, 2021.
[12] Y. She, S. Wang, S. Dong, N. Sunil, A. Rodriguez, and E. Adelson, "Cable manipulation with a tactile-reactive gripper," The International Journal of Robotics Research, vol. 40, no. 12-14, pp. 1385–1401, 2021.
[13] A. Alspach, K. Hashimoto, N. Kuppuswamy, and R. Tedrake, "Soft-bubble: A highly compliant dense geometry tactile sensor for robot manipulation," in 2019 2nd IEEE International Conference on Soft Robotics (RoboSoft), pp. 597–604, IEEE, 2019.
[14] M. Oller, M. P. i Lisbona, D. Berenson, and N. Fazeli, "Manipulation via membranes: High-resolution and highly deformable tactile sensing and control," in Conference on Robot Learning, pp. 1850–1859, PMLR, 2023.
[15] T. Hellebrekers, N. Chang, K. Chin, M. J. Ford, O. Kroemer, and C. Majidi, "Soft magnetic tactile skin for continuous force and location estimation using neural networks," IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3892–3898, 2020.
[16] Y. Wi, J. Yin, E. Xiang, A. Sharma, J. Malik, M. Mukadam, N. Fazeli, and T. Hellebrekers, "Tactalign: Human-to-robot policy transfer via tactile alignment," arXiv preprint arXiv:2602.13579, 2026.
[17] X. Liu, W. Yang, F. Meng, and T. Sun, "Material recognition using robotic hand with capacitive tactile sensor array and machine learning," IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1–9, 2024.
[18] T. Yao, X. Guo, C. Li, H. Qi, H. Lin, L. Liu, Y. Dai, L. Qu, Z. Huang, P. Liu, et al., "Highly sensitive capacitive flexible 3D-force tactile sensors for robotic grasping and manipulation," Journal of Physics D: Applied Physics, vol. 53, no. 44, p. 445109, 2020.
[19] Q. Mao, Z. Liao, J. Yuan, and R. Zhu, "Multimodal tactile sensing fused with vision for dexterous robotic housekeeping," Nature Communications, vol. 15, no. 1, p. 6871, 2024.
[20] C. Higuera, A. Sharma, T. Fan, C. K. Bodduluri, B. Boots, M. Kaess, M. Lambeta, T. Wu, Z. Liu, F. R. Hogan, et al., "Tactile beyond pixels: Multisensory touch representations for robot manipulation," in Conference on Robot Learning, pp. 105–123, PMLR, 2025.
[21] H.-J. Huang, M. A. Mirzaee, M. Kaess, and W. Yuan, "GelSLAM: A real-time, high-fidelity, and robust 3D tactile SLAM system," arXiv preprint arXiv:2508.15990, 2025.
[22] J. Han, S. Yao, and K. Hauser, "Estimating high-resolution neural stiffness fields using visuotactile sensors," in 2025 IEEE International Conference on Robotics and Automation (ICRA), pp. 2255–2261, IEEE, 2025.
[23] J. Wang, Y. Yuan, H. Che, H. Qi, Y. Ma, J. Malik, and X. Wang, "Lessons from learning to spin 'pens'," in Conference on Robot Learning, pp. 3124–3138, PMLR, 2025.
[24] R. A. Romeo and L. Zollo, "Methods and sensors for slip detection in robotics: A survey," IEEE Access, vol. 8, pp. 73027–73050, 2020.
[25] M. Stachowsky, T. Hummel, M. Moussa, and H. A. Abdullah, "A slip detection and correction strategy for precision robot grasping," IEEE/ASME Transactions on Mechatronics, vol. 21, no. 5, pp. 2214–2226, 2016.
[26] B. Huang, Y. Wang, X. Yang, Y. Luo, and Y. Li, "3D-ViTac: Learning fine-grained manipulation with visuo-tactile sensing," in Conference on Robot Learning, pp. 2557–2578, PMLR, 2025.
[27] W. Yuan, S. Wang, S. Dong, and E. Adelson, "Connecting look and feel: Associating the visual and tactile properties of physical materials," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5580–5588, 2017.
[28] C. Higuera, A. Sharma, C. K. Bodduluri, T. Fan, P. Lancaster, M. Kalakrishnan, M. Kaess, B. Boots, M. Lambeta, T. Wu, et al., "Sparsh: Self-supervised touch representations for vision-based tactile sensing," in Conference on Robot Learning, pp. 885–915, PMLR, 2025.
[29] S. Rangwala, F. Forouhar, and D. Dornfeld, "Application of acoustic emission sensing to slip detection in robot grippers," International Journal of Machine Tools and Manufacture, vol. 28, no. 3, pp. 207–215, 1988.
[30] W. Chen, H. Khamis, I. Birznieks, N. F. Lepora, and S. J. Redmond, "Tactile sensors for friction estimation and incipient slip detection—toward dexterous robotic manipulation: A review," IEEE Sensors Journal, vol. 18, no. 22, pp. 9049–9064, 2018.
[31] S. Lu and H. Culbertson, "Active acoustic sensing for robot manipulation," in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3161–3168, IEEE, 2023.
[32] H. Liang, S. Li, X. Ma, N. Hendrich, T. Gerkmann, F. Sun, and J. Zhang, "Making sense of audio vibration for liquid height estimation in robotic pouring," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5333–5339, IEEE, Nov. 2019.
[33] I. Andrussow, J. Solano, B. A. Richardson, G. Martius, and K. J. Kuchenbecker, "Adding internal audio sensing to internal vision enables human-like in-hand fabric recognition with soft robotic fingertips," in 2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids), pp. 1–8, IEEE, 2025.
[34] U. Yoo, Z. Lopez, J. Ichnowski, and J. Oh, "PoE: Acoustic soft robotic proprioception for omnidirectional end-effectors," in 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 14980–14987, IEEE, 2024.
[35] K. Zhang, D.-G. Kim, E. T. Chang, H.-H. Liang, Z. He, K. Lampo, P. Wu, I. Kymissis, and M. Ciocarlie, "VibeCheck: Using active acoustic tactile sensing for contact-rich manipulation," in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 12278–12285, IEEE, 2025.
[36] Z. Liu, C. Chi, E. Cousineau, N. Kuppuswamy, B. Burchfiel, and S. Song, "ManiWAV: Learning robot manipulation from in-the-wild audio-visual data," arXiv preprint arXiv:2406.19464, 2024.
[37] K. Dai, X. Wang, A. M. Rojas, E. Harber, Y. Tian, N. Paiva, J. Gnehm, E. Schindewolf, H. Choset, V. A. Webster-Wood, et al., "Design of a biomimetic tactile sensor for material classification," in 2022 International Conference on Robotics and Automation (ICRA), pp. 10774–10780, IEEE, 2022.
[38] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, "The YCB object and model set: Towards common benchmarks for manipulation research," in 2015 International Conference on Advanced Robotics (ICAR), pp. 510–517, IEEE, 2015.
[39] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A simple data augmentation method for automatic speech recognition," arXiv preprint arXiv:1904.08779, 2019.
[40] K. Zhang, C. Chang, S. Aggarwal, M. Veloso, F. Temel, and O. Kroemer, "Vibrotactile sensing for detecting misalignments in precision manufacturing," pp. 10408–10415, 2025.