arxiv: 2606.05873 · v1 · pith:TGW6Y2V6new · submitted 2026-06-04 · 💻 cs.RO · cs.AI · cs.CV · cs.LG

LadderMan: Learning Humanoid Perceptive Ladder Climbing

Pith reviewed 2026-06-28 01:20 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.CV · cs.LG
keywords ladder climbing, humanoid robots, visuomotor policy, reinforcement learning, imitation learning, sim-to-real transfer, manipulation

The pith

A two-stage pipeline distills multiple climbing experts into one depth-based policy that lets humanoid robots climb diverse ladders and manipulate objects with zero-shot hardware transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that humanoid robots can handle ladder climbing, a task defined by sparse holds and whole-body coordination demands, by first using hybrid motion tracking to extract multiple experts from one reference motion and then distilling them into a single depth-based visuomotor policy through combined imitation and reinforcement learning. The approach matters because success would let robots operate in human-scale settings where ladders appear, such as maintenance or construction. Vision foundation models close the depth perception gap for real-world use. A separate dual-agent policy is then trained on top of the climbing controller to enable manipulation while on the ladder. Experiments test the full system on varied ladder geometries in simulation and on physical hardware.

Core claim

LadderMan is a unified system whose climbing policy is produced by hybrid motion tracking that yields multiple experts from a single reference motion, followed by hybrid imitation and reinforcement learning that distills them into one depth-based visuomotor policy; vision foundation models bridge the sim-to-real depth gap, and a dual-agent formulation adds stable on-ladder manipulation.

What carries the argument

Hybrid motion tracking that produces multiple climbing experts from one reference motion, distilled via hybrid imitation and reinforcement learning into a unified depth-based visuomotor policy.
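
The distillation step can be pictured as a weighted sum of an imitation term and a reinforcement learning term. The following is a minimal sketch under stated assumptions, not the paper's implementation: the mean-squared-error imitation term, the PPO-style clipped surrogate, the function names, and the `bc_weight` mixing coefficient are all hypothetical choices made for illustration.

```python
import math


def bc_loss(student_actions, expert_actions):
    """Imitation term: mean squared error between student and expert actions.

    Both arguments are lists of action vectors (lists of floats) collected on
    states visited by the student, as in DAgger-style training.
    """
    total = 0.0
    count = 0
    for student, expert in zip(student_actions, expert_actions):
        for s_i, e_i in zip(student, expert):
            total += (s_i - e_i) ** 2
            count += 1
    return total / count


def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """RL term: negative clipped surrogate objective, averaged over samples.

    Using a PPO-style objective here is an assumption of this sketch.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def hybrid_loss(student_actions, expert_actions,
                log_probs_new, log_probs_old, advantages, bc_weight=0.5):
    """Weighted sum of the imitation and RL terms (bc_weight is hypothetical)."""
    imitation = bc_loss(student_actions, expert_actions)
    rl = ppo_clip_loss(log_probs_new, log_probs_old, advantages)
    return bc_weight * imitation + (1.0 - bc_weight) * rl
```

In a sketch like this, a larger `bc_weight` keeps the student close to the experts, while a smaller one lets the reward signal shape behavior where the experts disagree or are absent.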

If this is right

  • The policy produces robust climbing on a wide range of ladder geometries.
  • The same policy transfers to real hardware without additional training.
  • A dual-agent manipulation policy built on the climbing controller supports teleoperated tasks while on the ladder.
  • The overall system works under the constrained footholds and handholds typical of ladders.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The expert-distillation step may reduce the need for many separate reference motions when learning other multi-contact locomotion skills.
  • Decoupling perception via off-the-shelf foundation models could let the same control pipeline accept improved depth estimators as they appear.
  • The dual-agent manipulation layer suggests that climbing and manipulation can be trained sequentially rather than jointly in high-dimensional spaces.

Load-bearing premise

Hybrid motion tracking creates multiple experts from one motion that can be distilled into a single policy whose performance holds after vision foundation models supply real depth estimates.

What would settle it

A ladder geometry outside the training set on which the distilled policy consistently fails to reach the top, or a zero-shot hardware trial that requires retraining or fails outright.
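
A check of this kind could be organized as below. This is a hedged sketch only: it assumes each trial is logged with a geometry pair written (θ, s), following the notation in the Figure 10 text, and the reading of that pair as inclination in degrees and spacing in centimetres, the function names, the 0.5 threshold, and the sample data in the usage note are all hypothetical.

```python
from collections import defaultdict


def success_by_geometry(trials):
    """Return the success rate for each ladder geometry.

    trials: iterable of (theta_deg, spacing_cm, reached_top) tuples, where
    reached_top is a bool. The tuple layout is an assumption of this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # geometry -> [successes, attempts]
    for theta, spacing, reached_top in trials:
        counts[(theta, spacing)][1] += 1
        if reached_top:
            counts[(theta, spacing)][0] += 1
    return {geom: succ / n for geom, (succ, n) in counts.items()}


def held_out_failures(trials, train_geometries, threshold=0.5):
    """List geometries outside the training set whose success rate is below threshold."""
    rates = success_by_geometry(trials)
    return sorted(
        geom for geom, rate in rates.items()
        if geom not in train_geometries and rate < threshold
    )
```

With made-up logs such as `[(55, 20, True), (70, 30, False), (70, 30, False)]` and a training set of `{(55, 20)}`, `held_out_failures` would flag `(70, 30)` as a held-out geometry on which the policy does not reach the top.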

Figures

Figures reproduced from arXiv: 2606.05873 by C. Karen Liu, Guanya Shi, Koushil Sreenath, Pieter Abbeel, Rocky Duan, Siheng Zhao, Yuanhang Zhang, Yue Wang, Ziqi Lu.

Figure 1. LadderMan enables a Unitree G1 humanoid to robustly climb diverse ladders using a single perceptive policy via zero-shot sim-to-real transfer. Building on the learned climbing capability, LadderMan further supports stable whole-body on-ladder manipulation via teleoperation, including adjusting paintings, replacing light bulbs, and box handover.

Figure 2. Overview of LadderMan. Starting from a single reference motion, we first learn multiple expert climbing policies for different ladder geometries using hybrid motion tracking. These experts are then distilled into a unified visuomotor policy through hybrid DAgger and RL. Building on the climbing policy, we further train a dual-agent manipulation policy for stable on-ladder teleoperation. To bridge the sim-t…

Figure 3. We bridge the visual gap between simulation and real world by applying rung-focused…

Figure 4. Real-world results of LadderMan. The humanoid performs zero-shot sim-to-real ladder climbing across diverse real-world ladders (A, B, and C) and executes various on-ladder manipulation tasks while maintaining stable balance. Additional video results are available on our website. Directly applying an off-the-shelf teleoperation policy often leads to unstable whole…

Figure 5. (a) Success rates of LadderMan and the blind motion tracking baseline across ladder con…

Figure 6. Failure case of the off-the-shelf teleop…

Figure 7. Side-by-side comparison between a human and LadderMan performing the same ladder…

Figure 8. Comparison of Unitree G1 collision geome…

Figure 9. Comparison between raw depth observations and depth predictions generated by Fast…

Figure 10. Comparison between the reference motion and the learned climbing behavior on a ladder different from that used during motion capture. Hybrid Motion Tracking. We provide a side-by-side comparison between the collected reference motion and the learned policy execution on a ladder configuration with (θ, s) = (55°, 20 cm), as shown in…

Original abstract

Humanoid robots hold great promise for operating in human-centered environments, yet ladder climbing remains one of the most challenging tasks due to sparse footholds and handholds, complex whole-body coordination, and sensitivity to perception and control errors. We present LadderMan, a unified system that enables humanoid robots to robustly climb diverse ladders and perform manipulation under such constrained conditions. Our climbing policy is built on a scalable two-stage learning pipeline, where we use hybrid motion tracking to learn multiple climbing experts from a single reference motion, and distill these experts into a unified depth-based visuomotor climbing policy via hybrid imitation and reinforcement learning. To enable real-world deployment, we leverage vision foundation models to bridge the sim-to-real gap in depth perception. Building on the learned climbing policy, we further train a separate manipulation policy using a dual-agent formulation, allowing stable on-ladder manipulation via teleoperation. Experiments demonstrate that LadderMan achieves robust ladder climbing across a wide range of geometries, successfully transfers to real-world hardware in a zero-shot manner, and supports various manipulation tasks under challenging ladder constraints. Video results are available at https://ladderman-robot.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents LadderMan, a unified system for humanoid ladder climbing and on-ladder manipulation. It proposes a scalable two-stage learning pipeline that first uses hybrid motion tracking to derive multiple climbing experts from a single reference motion, then distills them into a unified depth-based visuomotor policy via hybrid imitation and reinforcement learning. Vision foundation models are employed to bridge the sim-to-real gap for depth perception. A separate dual-agent manipulation policy is trained to enable stable teleoperated tasks under ladder constraints. The central claims are that the system achieves robust climbing across diverse ladder geometries, zero-shot transfer to real hardware, and support for manipulation tasks.

Significance. If the empirical results hold with rigorous validation, the work would advance humanoid robotics by addressing a high-difficulty task involving sparse contacts, whole-body coordination, and perception sensitivity. The hybrid tracking-to-distillation pipeline and use of off-the-shelf vision models for sim-to-real depth bridging represent potentially reusable techniques for complex locomotion. Credit is due for framing the problem as both perceptive climbing and constrained manipulation, which aligns with practical deployment needs.

major comments (1)
  1. [Abstract] The central claims of 'robust ladder climbing across a wide range of geometries' and 'successfully transfers to real-world hardware in a zero-shot manner' are presented without any quantitative metrics (e.g., success rates, traversal times, failure modes), ablation studies, or hardware specifications. Without them, it is impossible to evaluate whether the hybrid motion tracking and distillation pipeline actually delivers the reported outcomes, so the experimental demonstration is load-bearing but unsupported.
minor comments (1)
  1. [Abstract] The abstract references a project website for video results but provides no summary of what the videos demonstrate (e.g., specific ladder variations or failure recoveries), which would aid assessment of the claims.
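
One way such success rates could be reported is with an interval rather than a bare percentage, since hardware trial counts are usually small. The sketch below computes a Wilson score interval for a binomial success rate; it is an illustration of the kind of metric the comment asks for, not something taken from the paper, and the function name and the trial counts in the usage note are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval. Returns (low, high).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)
    ) / denom
    return center - half_width, center + half_width
```

For example, a hypothetical run of 10 successful climbs out of 10 would give a lower bound of roughly 0.72 rather than 1.0, which makes clear how much a small number of hardware trials can support.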

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of the work's significance and for the constructive comment on the abstract. We respond to the major comment below.

  1. Referee: [Abstract] The central claims of 'robust ladder climbing across a wide range of geometries' and 'successfully transfers to real-world hardware in a zero-shot manner' are presented without any quantitative metrics (e.g., success rates, traversal times, failure modes), ablation studies, or hardware specifications. This absence makes it impossible to evaluate whether the hybrid motion tracking and distillation pipeline actually delivers the reported outcomes, rendering the experimental demonstration load-bearing but unsupported.

    Authors: We agree that the abstract would benefit from including quantitative metrics to support the stated claims and make the summary self-contained. The body of the manuscript reports the relevant experimental results, including success rates across ladder geometries, traversal times, failure mode analysis, ablation studies on the hybrid tracking and distillation components, and hardware specifications for the zero-shot transfer. We will revise the abstract to incorporate key quantitative results from these experiments.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical two-stage learning pipeline (hybrid motion tracking to produce climbing experts from one reference motion, followed by distillation into a unified depth-based policy via imitation and RL, plus vision foundation models for sim-to-real) and a separate dual-agent manipulation policy. No equations, fitted parameters, or mathematical derivations appear in the provided abstract or described claims. Central results are framed as experimental demonstrations of robustness, zero-shot transfer, and manipulation capability rather than predictions that reduce to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way. The reader's assessment assigned a score of 2.0; the score here is set to 0 because no circular step of any enumerated kind is identifiable from the text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that reference motion data and vision foundation models provide sufficient grounding.

pith-pipeline@v0.9.1-grok · 5766 in / 1033 out tokens · 21558 ms · 2026-06-28T01:20:00.079034+00:00 · methodology

