Recognition: 2 theorem links · Lean Theorem
HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies
Pith reviewed 2026-05-15 11:45 UTC · model grok-4.3
The pith
HandelBot adapts a simulation policy in two stages to let a dexterous robot play piano accurately after 30 minutes of real data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HandelBot shows that a simulation-trained policy can be turned into precise bimanual piano playing through a two-stage pipeline: a structured refinement stage that corrects lateral finger joint positions from physical rollouts, followed by residual reinforcement learning that acquires fine corrective actions autonomously, achieving successful performance on five songs with only 30 minutes of real interaction data and a 1.8x improvement over direct simulation deployment.
What carries the argument
The two-stage adaptation pipeline: structured refinement of lateral finger joints from physical rollouts to fix spatial alignments, followed by residual reinforcement learning for fine corrective actions.
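To make the shape of that pipeline concrete, a minimal sketch follows; the callables, the even budget split, and all names are illustrative assumptions rather than details from the HandelBot implementation.

```python
def adapt_policy(sim_policy, collect_rollouts, refine_lateral_joints,
                 train_residual, budget_minutes=30, split=0.5):
    """Two-stage adaptation loop (sketch). Every callable here is a hypothetical
    stand-in for hardware interaction, structured refinement, or residual RL
    training; the 50/50 budget split is an assumption, not a figure from the paper."""
    # Stage 1: spend part of the real-interaction budget measuring key-press
    # errors and correcting the lateral finger-joint targets of the sim policy.
    rollouts = collect_rollouts(sim_policy, minutes=budget_minutes * split)
    refined_policy = refine_lateral_joints(sim_policy, rollouts)
    # Stage 2: spend the rest learning small residual corrective actions on top
    # of the frozen refined policy.
    residual = train_residual(refined_policy, minutes=budget_minutes * (1 - split))
    return refined_policy, residual
```

The design choice the sketch tries to capture is that the base policy is only adjusted, never retrained from scratch, so the limited real data is spent on corrections rather than on relearning the skill.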
If this is right
- The robot successfully plays five recognized songs on real hardware with millimeter precision.
- Performance improves by a factor of 1.8 compared with deploying the simulation policy without adaptation.
- Only 30 minutes of physical interaction data are needed for the full adaptation process.
- Spatial misalignments are corrected to millimeter scale while bimanual coordination remains stable.
Where Pith is reading between the lines
- The same two-stage pattern could shorten adaptation time for other contact-rich tasks such as tool use or object insertion.
- If the refinement stage scales to longer sequences, the approach might support continuous play of full musical pieces rather than short excerpts.
- Combining explicit joint adjustment with residual learning may reduce the risk of instability when moving policies between simulation and hardware in other multi-fingered robots.
- Further tests on varied piano sizes or slight changes in hand mounting could show how robust the alignment correction remains outside the original setup.
Load-bearing premise
The simulation-trained policy starts close enough to real dynamics that limited physical rollouts can fix millimeter-scale misalignments without creating new coordination problems between the two hands.
What would settle it
Run the 30-minute adaptation on the same hardware setup and measure whether finger positioning error stays under 2 mm and song completion accuracy exceeds 80 percent across the five test pieces; failure on either metric would falsify the claim.
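A minimal sketch of that check, assuming the measurements come back as a list of per-song completion accuracies and a flat list of per-press finger errors in millimetres; the thresholds follow the criterion above, and reading "exceeds 80 percent across the five test pieces" as holding for every song is an interpretation.

```python
def claim_survives(per_song_accuracy, finger_errors_mm,
                   accuracy_threshold=0.80, error_threshold_mm=2.0):
    """Falsification check (sketch): the claim survives only if every test piece
    exceeds the accuracy threshold and every measured finger positioning error
    stays under the error threshold. The data layout is assumed, not specified."""
    accuracy_ok = all(acc > accuracy_threshold for acc in per_song_accuracy)
    precision_ok = all(err < error_threshold_mm for err in finger_errors_mm)
    return accuracy_ok and precision_ok

# Invented numbers for illustration only:
print(claim_survives([0.92, 0.85, 0.88, 0.81, 0.90], [1.2, 0.8, 1.9, 1.5]))  # True
```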
Original abstract
Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite its potential, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning and simulation-to-real-world transfer offer a promising alternative, the transferred policies often fail for tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot, a framework that combines a simulation policy and rapid adaptation through a two-stage pipeline. Starting from a simulation-trained policy, we first apply a structured refinement stage to correct spatial alignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five recognized songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8x and requires only 30 minutes of physical interaction data.
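The residual-learning idea in the abstract amounts to adding a small learned correction to the transferred policy's action at every control step. A minimal sketch, assuming a bounded residual and symmetric joint limits; the scale and limits are placeholders, not values from the paper.

```python
import numpy as np

def compose_action(base_action, residual, residual_scale=0.1,
                   joint_low=-1.0, joint_high=1.0):
    """Residual control step (sketch): the simulation policy proposes
    base_action, a learned residual nudges it, and the result is clipped to
    joint limits so corrections stay fine-grained."""
    return np.clip(base_action + residual_scale * residual, joint_low, joint_high)
```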
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HandelBot, a two-stage framework for adapting simulation-trained policies to real-world bimanual piano playing. The first stage applies structured refinement of lateral finger joints from physical rollouts to correct spatial alignments; the second applies residual reinforcement learning for fine-grained corrections. Hardware experiments on five songs claim successful precise playing, with a 1.8x improvement over direct simulation deployment using only 30 minutes of physical interaction data.
Significance. If the experimental results hold with proper quantitative support, this would be a meaningful advance in sim-to-real transfer for high-precision dexterous manipulation, showing that limited real-world data can bridge gaps in tasks like bimanual piano playing that demand millimeter accuracy.
major comments (3)
- [Abstract] The headline claims of 1.8x outperformance and successful mm-precision bimanual playing on five songs are unsupported by any metrics, success rates per song, error bars, or quantitative comparisons to direct sim deployment.
- [Experiments] No pre/post-refinement error distributions, no ablation removing the structured adjustment stage, and no stability analysis for bimanual timing/force control are provided, leaving the central assumption that 30 minutes of rollouts suffice untested.
- [Methods] The structured refinement stage is described only at a high level; without details on how lateral joint adjustments achieve mm precision or prevent new coordination instabilities, the pipeline's load-bearing mechanism cannot be evaluated.
minor comments (2)
- [Abstract] Specify the five songs by name and provide song-specific success rates to enable reproducibility and assessment of task difficulty variation.
- [Abstract] Clarify the exact definition of the 1.8x metric (e.g., success rate, completion time, or error) and the baseline direct simulation deployment protocol.
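Because the manuscript does not pin the metric down, one plausible reading of the 1.8x figure is a ratio of success rates between the adapted policy and direct simulation deployment; a toy illustration of that reading, with invented numbers:

```python
# Hypothetical values, used only to illustrate one candidate definition:
adapted_success_rate = 0.81     # adapted policy, averaged over songs
baseline_success_rate = 0.45    # direct simulation deployment
improvement_factor = adapted_success_rate / baseline_success_rate  # = 1.8
```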
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us identify areas to strengthen our manuscript. We provide point-by-point responses below and will incorporate the suggested revisions in the next version.
Point-by-point responses
-
Referee: [Abstract] The headline claims of 1.8x outperformance and successful mm-precision bimanual playing on five songs are unsupported by any metrics, success rates per song, error bars, or quantitative comparisons to direct sim deployment.
Authors: We agree that the abstract should be more self-contained with quantitative support. The Experiments section of the manuscript presents success rates for each of the five songs, along with comparisons showing the 1.8x improvement over direct sim deployment, including error bars from multiple trials. To address this, we will revise the abstract to explicitly include key metrics such as average success rate and the precise definition of the 1.8x factor based on our quantitative results. revision: yes
-
Referee: [Experiments] No pre/post-refinement error distributions, no ablation removing the structured adjustment stage, and no stability analysis for bimanual timing/force control are provided, leaving the central assumption that 30 minutes of rollouts suffice untested.
Authors: This is a valid point. While the current manuscript demonstrates the overall performance with 30 minutes of data, we did not include the requested analyses. In the revised manuscript, we will add pre- and post-refinement error distributions to show the impact of the structured stage, an ablation study comparing the full pipeline to one without structured adjustment, and analysis of bimanual stability in terms of timing synchronization and force application. These additions will better validate the data efficiency claim. revision: yes
-
Referee: [Methods] The structured refinement stage is described only at a high level; without details on how lateral joint adjustments achieve mm precision or prevent new coordination instabilities, the pipeline's load-bearing mechanism cannot be evaluated.
Authors: We appreciate this feedback on clarity. The structured refinement involves computing lateral adjustments from observed key press errors in physical rollouts to align fingers precisely. To improve the description, we will expand the Methods section with algorithmic details, including the adjustment computation formula, how it targets mm-scale corrections without introducing instabilities (e.g., by constraining adjustments to small increments and preserving bimanual coordination), and any empirical validation of stability. revision: yes
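A minimal sketch of what such an adjustment rule could look like, assuming the rollouts yield a per-finger array of lateral key-press errors; the averaging step and the 0.5 mm increment bound are assumptions standing in for the authors' formula.

```python
import numpy as np

def lateral_adjustment(key_press_errors_mm, max_increment_mm=0.5):
    """Structured refinement update (sketch): average each finger's lateral
    key-press error over the rollouts and apply the opposite offset, clipped to
    a small increment so no single update can move a finger far enough to
    disturb bimanual coordination."""
    mean_error = key_press_errors_mm.mean(axis=0)   # per-finger mean lateral miss (mm)
    return np.clip(-mean_error, -max_increment_mm, max_increment_mm)

# Two rollouts, five fingers, invented errors in millimetres:
errors = np.array([[1.4, -0.6, 0.2, 2.1, -0.3],
                   [1.1, -0.8, 0.4, 1.8, -0.1]])
print(lateral_adjustment(errors))   # bounded per-finger corrections
```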
Circularity Check
No circularity: empirical hardware results independent of derivation
Full rationale
The paper's central claims rest on physical hardware experiments across five songs, measuring 1.8x outperformance versus direct sim transfer and success with 30 minutes of real data. The two-stage pipeline (structured lateral-joint adjustment followed by residual RL) is presented as an algorithmic procedure whose efficacy is validated externally by rollouts rather than by any self-referential equation, fitted parameter renamed as prediction, or self-citation chain. No load-bearing step reduces to its own inputs by construction; the results are falsifiable against the reported hardware metrics and therefore self-contained.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
We first apply a structured refinement stage to correct spatial alignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions.
-
IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
HandelBot consistently achieves the highest F1 scores across all evaluated musical pieces... outperforms direct simulation deployment by a factor of 1.8x
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.