pith. machine review for the scientific record.

arxiv: 2604.19677 · v1 · submitted 2026-04-21 · 💻 cs.RO · cs.AI · cs.LG

Recognition: unknown

Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty

Geoffrey Hollinger, Hunter L. Brown, Stefan Lee

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 02:15 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords hybrid control · reinforcement learning · contact manipulation · peg-in-hole · sim-to-real · force control · mode selection · uncertainty

The pith

Hybrid position-force policies with MATCH training achieve up to 10 percent higher success and five times fewer breaks than pose-only policies in uncertain peg-in-hole tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard reinforcement learning policies for manipulation output pose changes but lack direct force limits, which risks damage during delicate in-contact work like inserting fragile connectors when state estimates are noisy. This paper develops hybrid policies that choose position or force control independently per dimension and introduces MATCH, a training adjustment that aligns the policy's action probabilities with those mode choices to make learning tractable. Tested on fragile peg-in-hole insertion under large localization errors, the method yields policies that succeed more often, break fewer pegs, and apply less force while remaining data-efficient. A reader would care because many real assembly tasks involve unavoidable contact uncertainty and cannot tolerate either failure or hardware damage.

Core claim

The authors show that hybrid position-force control policies, trained via Mode-Aware Training for Contact Handling (MATCH) to mirror intended mode selection behavior, solve high-precision peg-in-hole tasks more reliably than pose-only policies when localization uncertainty is present. In simulation and over 1600 sim-to-real trials on a Franka FR3, MATCH policies reach up to 10 percent higher success rates, produce five times fewer peg breaks, succeed twice as often in high-noise regimes, and apply roughly 30 percent less average force than variable-impedance baselines, all while matching the data efficiency of simpler pose policies despite the larger action space.

What carries the argument

Mode-Aware Training for Contact Handling (MATCH), a modification to policy training that explicitly adjusts action probabilities to reproduce the mode-selection logic of hybrid position-force control.
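To make the machinery concrete, here is a minimal sketch of a per-dimension hybrid action head with a MATCH-style adjustment to the mode probabilities. Everything in it is an assumption for exposition: the network shape, the Bernoulli mode distribution, and the `contact_prior` log-odds shift standing in for the paper's (unspecified here) probability adjustment.

```python
# Illustrative sketch only: not the paper's architecture or the exact MATCH rule.
import torch
import torch.nn as nn

class HybridPolicyHead(nn.Module):
    """Per-dimension hybrid position/force action head (hypothetical)."""

    def __init__(self, obs_dim: int, n_dims: int = 6, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mode_logits = nn.Linear(hidden, n_dims)    # position vs. force per dim
        self.pos_target = nn.Linear(hidden, n_dims)     # pose-delta command
        self.force_target = nn.Linear(hidden, n_dims)   # force-setpoint command

    def forward(self, obs: torch.Tensor, contact_prior: torch.Tensor):
        h = self.trunk(obs)
        # MATCH-style adjustment (assumed form): shift the mode log-odds so
        # sampled modes mirror hybrid control's selection logic, e.g. favoring
        # force control in dimensions where contact is expected.
        dist = torch.distributions.Bernoulli(logits=self.mode_logits(h) + contact_prior)
        modes = dist.sample()                           # 1 = force, 0 = position
        action = modes * self.force_target(h) + (1 - modes) * self.pos_target(h)
        return action, modes, dist.log_prob(modes).sum(-1)
```

The point of sampling the mode inside the policy is that the per-dimension choice becomes part of the learned action distribution, which is what lets a mode-aware adjustment act on probabilities rather than on a fixed switching schedule.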

If this is right

  • MATCH policies solve the same tasks with up to 10 percent higher success under common state-estimation errors.
  • They produce five times fewer peg breaks than pose-only policies.
  • In high-noise settings they succeed twice as often while applying about 30 percent less force than variable-impedance baselines.
  • They retain data efficiency comparable to pose policies despite operating in a larger hybrid action space.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same hybrid-mode approach could be applied to other contact-rich tasks such as screw driving or surface finishing where force limits matter.
  • If the mode-selection benefit holds, robots could operate with cheaper or less precise sensors without sacrificing reliability.
  • Testing MATCH inside different reinforcement-learning algorithms would reveal whether the training adjustment is broadly useful or specific to the current setup.

Load-bearing premise

The simulation must faithfully reproduce real contact forces, friction, and uncertainty distributions so that policies trained inside it transfer without causing damage on physical hardware.
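One way to make that premise auditable is to report fidelity numbers directly, as the rebuttal below also proposes. A minimal sketch, assuming time-aligned force traces and scalar noise samples (the metric choices and the histogram KL estimate are illustrative, not the authors' pipeline):

```python
# Illustrative sim-to-real fidelity metrics (assumed, not the authors' code).
import numpy as np

def rms_force_error(f_sim: np.ndarray, f_real: np.ndarray) -> float:
    """RMS error between time-aligned sim and real force traces (N x 3)."""
    return float(np.sqrt(np.mean((f_sim - f_real) ** 2)))

def kl_histogram(samples_p: np.ndarray, samples_q: np.ndarray, bins: int = 50) -> float:
    """Histogram estimate of KL(P || Q) between two 1-D noise distributions."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi))
    p = (p + 1e-12) / (p.sum() + bins * 1e-12)   # smooth empty bins
    q = (q + 1e-12) / (q.sum() + bins * 1e-12)
    return float(np.sum(p * np.log(p / q)))
```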

What would settle it

A set of real-robot trials under the same high-noise localization conditions in which MATCH policies show equal or lower success rates and equal or higher average contact forces than pose-only or impedance policies.
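Whether such an outcome would be statistically decisive over roughly 1600 trials can be checked with a standard two-proportion test; a minimal sketch, with an assumed even split between conditions (the counts below are illustrative, not the paper's per-condition tallies):

```python
# Two-proportion z-test on success rates (illustrative counts).
from statsmodels.stats.proportion import proportions_ztest

successes = [544, 264]   # e.g., 68% of 800 MATCH trials vs. 33% of 800 baseline trials
trials = [800, 800]
z, p_value = proportions_ztest(successes, trials)
print(f"z = {z:.2f}, p = {p_value:.2e}")   # small p: the rates genuinely differ
```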

Figures

Figures reproduced from arXiv: 2604.19677 by Geoffrey Hollinger, Hunter L. Brown, Stefan Lee.

Figure 1. a) Using hybrid position-force control allows for directional …
Figure 2. Learning efficiency. Each method was trained for 3M steps using …
Figure 3. Force control selection probability for MATCH. Each trajectory, across 500 evaluations, is segmented by phase, time-normalized, and aggregated. MATCH learns a force-selection policy that uses position control in free space, regulates force during contact, then returns to pose control for insertion, consistent with common analytical approaches [5], [17], [19]. …
Figure 4. Success rate (top) and break rate (bottom) under uniform hole …
Figure 5. Real robot results. Five simulation-trained policies were evaluated with 30 noiseless trials and the highest-success-rate policy was kept. The policy …
Figure 6. Success rate as hole localization noise increases with an unbreakable …
read the original abstract

Reinforcement learning-based control policies have been frequently demonstrated to be more effective than analytical techniques for many manipulation tasks. Commonly, these methods learn neural control policies that predict end-effector pose changes directly from observed state information. For tasks like inserting delicate connectors which induce force constraints, pose-based policies have limited explicit control over force and rely on carefully tuned low-level controllers to avoid executing damaging actions. In this work, we present hybrid position-force control policies that learn to dynamically select when to use force or position control in each control dimension. To improve learning efficiency of these policies, we introduce Mode-Aware Training for Contact Handling (MATCH) which adjusts policy action probabilities to explicitly mirror the mode selection behavior in hybrid control. We validate MATCH's learned policy effectiveness using fragile peg-in-hole tasks under extreme localization uncertainty. We find MATCH substantially outperforms pose-control policies -- solving these tasks with up to 10% higher success rates and 5x fewer peg breaks than pose-only policies under common types of state estimation error. MATCH also demonstrates data efficiency equal to pose-control policies, despite learning in a larger and more complex action space. In over 1600 sim-to-real experiments, we find MATCH succeeds twice as often as pose policies in high noise settings (33% vs. 68%) and applies ~30% less force on average compared to variable impedance policies on a Franka FR3 in laboratory conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Mode-Aware Training for Contact Handling (MATCH), a reinforcement learning approach to train hybrid position-force control policies that dynamically select control modes per dimension. It claims these policies outperform pose-only baselines (higher success rates, up to 5x fewer peg breaks under state estimation error) and variable-impedance baselines (∼30% lower average force) on fragile peg-in-hole tasks, validated via over 1600 sim-to-real trials on a Franka FR3 under injected localization uncertainty.

Significance. If the sim-to-real transfer holds, the result would be significant for contact-rich manipulation under uncertainty: it shows that explicit hybrid mode selection, when trained with mode-aware adjustments, can improve both task success and safety metrics compared to standard pose or impedance policies while maintaining data efficiency.

major comments (2)
  1. [Abstract / Experimental Results] The central performance claims (68% vs. 33% success in high-noise settings, ∼30% lower force, 5x fewer breaks) rest on sim-to-real transfer, yet no quantitative metrics are given for simulator fidelity (force/torque profile matching, friction calibration, or uncertainty-distribution alignment between sim and real). This is load-bearing because any mismatch would allow policies to exploit simulator-specific artifacts.
  2. [Methods] The description of the MATCH training procedure lacks sufficient detail on the exact probability scaling, reward shaping for mode consistency, or an ablation isolating the mode-aware component from standard RL training; without this, it is unclear whether the reported gains are attributable to MATCH or to the larger hybrid action space itself.
minor comments (2)
  1. [Abstract] The statement 'up to 10% higher success rates' appears inconsistent with the later specific numbers (33% vs. 68%); clarify whether the 10% figure refers to a different noise regime or baseline comparison.
  2. [Preliminaries / Methods] Notation: The hybrid control formulation would benefit from an explicit equation defining the mode-selection action space and how it composes with the low-level controller.
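For reference, one conventional way to make that explicit, following classical hybrid position/force control in the style of Raibert and Craig [4] (a candidate formalization only; the paper's own notation may differ):

```latex
% Candidate formalization (assumed, not the paper's notation).
% Per-step hybrid action: a binary mode vector plus continuous targets.
a_t = (s_t,\ \Delta x_t,\ f_t), \qquad s_t \in \{0,1\}^6
% Composition with the low-level controller via the selection matrix S_t:
u_t = S_t \, u^{\mathrm{force}}(f_t) + (I - S_t)\, u^{\mathrm{pos}}(\Delta x_t),
\qquad S_t = \operatorname{diag}(s_t)
```

Here $s_t$ is the per-dimension mode choice (1 selects force control), $\Delta x_t$ the commanded pose change, $f_t$ the force setpoint, and $u^{\mathrm{pos}}, u^{\mathrm{force}}$ the low-level position and force control laws.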

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important areas for strengthening the presentation of our sim-to-real validation and the MATCH training details. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract / Experimental Results] The central performance claims (68% vs. 33% success in high-noise settings, ∼30% lower force, 5x fewer breaks) rest on sim-to-real transfer, yet no quantitative metrics are given for simulator fidelity (force/torque profile matching, friction calibration, or uncertainty-distribution alignment between sim and real). This is load-bearing because any mismatch would allow policies to exploit simulator-specific artifacts.

    Authors: We agree that explicit quantitative metrics for simulator fidelity would better support the sim-to-real claims. The current manuscript reports aggregate success rates and force metrics across 1600+ trials but does not include direct comparisons such as force/torque profile matching or calibrated friction parameters. In the revision we will add a dedicated subsection under Experimental Results that reports these metrics (e.g., RMS force error between sim and real, friction coefficient calibration, and KL divergence on injected uncertainty distributions) drawn from our experimental logs. revision: yes

  2. Referee: [Methods] The description of the MATCH training procedure lacks sufficient detail on the exact probability scaling, reward shaping for mode consistency, or an ablation isolating the mode-aware component from standard RL training; without this, it is unclear whether the reported gains are attributable to MATCH or to the larger hybrid action space itself.

    Authors: We acknowledge that the Methods section provides only a high-level overview of the probability adjustment and does not include the precise scaling formula, explicit reward-shaping terms, or an ablation isolating the mode-aware component. In the revised manuscript we will expand the MATCH description with the exact probability scaling equation, the mode-consistency reward terms, and results from an ablation study that trains a standard RL policy in the same hybrid action space. This will clarify that the performance gains arise from the mode-aware adjustments rather than the action space alone. revision: yes
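Read together, the promised ablation reduces to a three-condition grid that isolates the mode-aware adjustment from the action space; a sketch of the conditions (names are illustrative, not the authors' code):

```python
# Hypothetical ablation grid implied by the rebuttal.
ABLATIONS = {
    "pose_only":       {"action_space": "pose",   "match_adjustment": False},
    "hybrid_no_match": {"action_space": "hybrid", "match_adjustment": False},
    "hybrid_match":    {"action_space": "hybrid", "match_adjustment": True},
}
# Comparing hybrid_no_match vs. hybrid_match attributes gains to MATCH itself;
# comparing pose_only vs. hybrid_no_match attributes gains to the action space.
```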

Circularity Check

0 steps flagged

No circularity: empirical RL results from direct experiments

full rationale

This paper reports an empirical RL study of hybrid position-force control policies for peg-in-hole tasks. Central claims rest on experimental comparisons (success rates, peg breaks, force application) across >1600 sim-to-real trials on a Franka FR3, with MATCH outperforming pose-only and variable-impedance baselines under injected state-estimation noise. No mathematical derivations, equations, or first-principles predictions appear; results are not obtained by fitting parameters to a subset and renaming the fit as a prediction, nor by self-definitional loops or load-bearing self-citations. The claims are therefore grounded in external measurement, with all performance numbers arising from direct experiment rather than any reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on empirical validation of an RL-based hybrid control method rather than new theoretical derivations; standard RL assumptions about reward optimization and sim-to-real transfer are invoked without explicit new axioms.

axioms (1)
  • domain assumption: Reinforcement learning can learn effective mode selection for hybrid position-force control from reward signals in simulation.
    The paper assumes the RL agent can discover when to switch control modes without explicit mode labels or supervision.

pith-pipeline@v0.9.0 · 5554 in / 1199 out tokens · 32333 ms · 2026-05-10T02:15:01.190730+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 10 canonical work pages · 1 internal anchor

  [1] Y. Jiang, Z. Huang, B. Yang, and W. Yang, "A review of robotic assembly strategies for the full operation procedure: planning, execution and evaluation," Robotics and Computer-Integrated Manufacturing, vol. 78, p. 102366, Dec. 2022.

  [2] F. Nauert and P. Kampmann, "Inspection and maintenance of industrial infrastructure with autonomous underwater robots," Frontiers in Robotics and AI, vol. 10, Aug. 2023.

  [3] B. S. Peters, P. R. Armijo, C. Krause, S. A. Choudhury, and D. Oleynikov, "Review of emerging surgical robotic technology," Surgical Endoscopy, vol. 32, no. 4, pp. 1636–1655, Apr. 2018.

  [4] M. H. Raibert and J. J. Craig, "Hybrid Position/Force Control of Manipulators," Journal of Dynamic Systems, Measurement, and Control, vol. 103, no. 2, pp. 126–133, June 1981.

  [5] S. Chhatpar and M. Branicky, "Search strategies for peg-in-hole assemblies with position uncertainty," in IEEE International Conference on Intelligent Robots and Systems, vol. 3, Feb. 2001.

  [6] F. J. Abu-Dakka, B. Nemec, A. Kramberger, A. G. Buch, N. Krüger, and A. Ude, "Solving peg-in-hole tasks by human demonstration and exception strategies," Industrial Robot: An International Journal, vol. 41, no. 6, pp. 575–584, Oct. 2014.

  [7] J. Xu, Z. Hou, Z. Liu, and H. Qiao, "Compare Contact Model-based Control and Contact Model-free Learning: A Survey of Robotic Peg-in-hole Assembly Strategies," Apr. 2019, arXiv:1904.05240 [cs].

  [8] J. Kober and J. Peters, "Policy Search for Motor Primitives in Robotics," in Advances in Neural Information Processing Systems, vol. 21. Curran Associates, Inc., 2008.

  [9] L. Yang, Z. Huang, F. Lei, Y. Zhong, Y. Yang, C. Fang, S. Wen, B. Zhou, and Z. Lin, "Policy Representation via Diffusion Probability Model for Reinforcement Learning," May 2023, arXiv:2305.13122.

  [10] M. Noseworthy, B. Tang, B. Wen, A. Handa, C. Kessens, N. Roy, D. Fox, F. Ramos, Y. Narang, and I. Akinola, "FORGE: Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty," Jan. 2025, arXiv:2408.04587 [cs].

  [11] J. Luo, Z. Hu, C. Xu, Y. L. Tan, J. Berg, A. Sharma, S. Schaal, C. Finn, A. Gupta, and S. Levine, "SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning," in 2024 IEEE International Conference on Robotics and Automation (ICRA), May 2024, pp. 16961–16969.

  [12] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, "Learning variable impedance control," The International Journal of Robotics Research, vol. 30, no. 7, pp. 820–833, June 2011.

  [13] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, "Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 1010–1017.

  [14] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, T. Nishi, S. Kikuchi, T. Matsubara, and K. Harada, "Learning Force Control for Contact-Rich Manipulation Tasks With Rigid Position-Controlled Robots," IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 5709–5716, Oct. 2020.

  [15] A. S. Anand, M. Hagen Myrestrand, and J. T. Gravdahl, "Evaluation of Variable Impedance- and Hybrid Force/Motion Controllers for Learning Force Tracking Skills," in 2022 IEEE/SICE International Symposium on System Integration (SII), Jan. 2022, pp. 83–89.

  [16] M. Neunert, A. Abdolmaleki, M. Wulfmeier, T. Lampe, T. Springenberg, R. Hafner, F. Romano, J. Buchli, N. Heess, and M. Riedmiller, "Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics," in Proceedings of the Conference on Robot Learning. PMLR, May 2020, pp. 735–751, ISSN: 2640-3498.

  [17] H. Bruyninckx and J. De Schutter, "Specification of force-controlled actions in the 'task frame formalism': a synthesis," IEEE Transactions on Robotics and Automation, vol. 12, no. 4, pp. 581–589, Aug. 1996.

  [18] B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y. Narang, "IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality," May 2023, arXiv:2305.17110 [cs].

  [19] H. Kang, Y. Zang, X. Wang, and Y. Chen, "Uncertainty-Driven Spiral Trajectory for Robotic Peg-in-Hole Assembly," IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6661–6668, July 2022.

  [20] R. Diaz, A. Imdieke, V. Veeriah, and K. Desingh, "AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks," in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2025, pp. 18504–18511.

  [21] J. Luo, O. Sushkov, R. Pevceviciute, W. Lian, C. Su, M. Vecerik, N. Ye, S. Schaal, and J. Scholz, "Robust Multi-Modal Policies for Industrial Assembly via Reinforcement Learning and Demonstrations: A Large-Scale Study," July 2021, arXiv:2103.11512 [cs].

  [22] S. H. Huang, M. Zambelli, J. Kay, M. F. Martins, Y. Tassa, P. M. Pilarski, and R. Hadsell, "Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning," Mar. 2019, arXiv:1903.08542 [cs].

  [23] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg, "Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks," in 2019 International Conference on Robotics and Automation (ICRA), May 2019, pp. 8943–8950, ISSN: 2577-087X.

  [24] X. Zhang, C. Wang, L. Sun, Z. Wu, X. Zhu, and M. Tomizuka, "Efficient Sim-to-real Transfer of Contact-Rich Manipulation Skills with Online Admittance Residual Learning," in Proceedings of The 7th Conference on Robot Learning, Dec. 2023, pp. 1621–1639.

  [25] M. Suomalainen, Y. Karayiannidis, and V. Kyrki, "A survey of robot manipulation in contact," Robotics and Autonomous Systems, vol. 156, p. 104224, Oct. 2022.

  [26] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., ser. Adaptive Computation and Machine Learning Series. Cambridge, Massachusetts: The MIT Press, 2018.

  [27] Y. Narang, K. Storey, I. Akinola, M. Macklin, P. Reist, L. Wawrzyniak, Y. Guo, A. Moravanszky, G. State, M. Lu, A. Handa, and D. Fox, "Factory: Fast Contact for Robotic Assembly," May 2022, arXiv:2205.03532 [cs].

  [28] B. Siciliano, O. Khatib, and T. Kröger, Springer Handbook of Robotics. Springer, 2008, vol. 200.

  [29] C. J. Bester, S. D. James, and G. D. Konidaris, "Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces," May 2019, arXiv:1905.04388 [cs].

  [30] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel, "Asymmetric Actor Critic for Image-Based Robot Learning," Oct. 2017, arXiv:1710.06542 [cs].

  [31] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," Aug. 2017, arXiv:1707.06347 [cs].

  [32] H. Lee, D. Hwang, D. Kim, H. Kim, J. J. Tai, K. Subramanian, P. R. Wurman, J. Choo, P. Stone, and T. Seno, "SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning," in International Conference on Learning Representations (ICLR), 2025.