LadderMan: Learning Humanoid Perceptive Ladder Climbing

C. Karen Liu; Guanya Shi; Koushil Sreenath; Pieter Abbeel; Rocky Duan; Siheng Zhao; Yuanhang Zhang; Yue Wang; Ziqi Lu

arxiv: 2606.05873 · v1 · pith:TGW6Y2V6new · submitted 2026-06-04 · 💻 cs.RO · cs.AI · cs.CV · cs.LG

Pith reviewed 2026-06-28 01:20 UTC · model grok-4.3

keywords: ladder climbing · humanoid robots · visuomotor policy · reinforcement learning · imitation learning · sim-to-real transfer · manipulation

The pith

A two-stage pipeline distills multiple climbing experts into one depth-based policy that lets humanoid robots climb diverse ladders and manipulate objects with zero-shot hardware transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that humanoid robots can handle ladder climbing, a task defined by sparse holds and whole-body coordination demands, by first using hybrid motion tracking to extract multiple experts from one reference motion and then distilling them into a single depth-based visuomotor policy through combined imitation and reinforcement learning. The approach matters because success would let robots operate in human-scale settings where ladders appear, such as maintenance or construction. Vision foundation models close the depth perception gap for real-world use. A separate dual-agent policy is then trained on top of the climbing controller to enable manipulation while on the ladder. Experiments test the full system on varied ladder geometries in simulation and on physical hardware.

Core claim

LadderMan is a unified system whose climbing policy is produced by hybrid motion tracking that yields multiple experts from a single reference motion, followed by hybrid imitation and reinforcement learning that distills them into one depth-based visuomotor policy; vision foundation models bridge the sim-to-real depth gap, and a dual-agent formulation adds stable on-ladder manipulation.

What carries the argument

Hybrid motion tracking that produces multiple climbing experts from one reference motion, distilled via hybrid imitation and reinforcement learning into a unified depth-based visuomotor policy.
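
As a rough illustration of what a hybrid imitation and reinforcement learning objective can look like, the sketch below blends a DAgger-style action-matching term with a clipped policy-gradient surrogate. The summary here does not give the paper's actual loss, weighting, or RL algorithm, so the PPO-style clipping, the mean-squared-error imitation term, the linear annealing schedule, and every function name are assumptions made for illustration only.

```python
from typing import List, Sequence


def imitation_loss(student_actions: Sequence[Sequence[float]],
                   expert_actions: Sequence[Sequence[float]]) -> float:
    """Mean squared error between student actions and expert labels.

    In a DAgger-style setup the states are visited by the student and the
    expert only supplies the target action for each visited state.
    """
    total, count = 0.0, 0
    for student, expert in zip(student_actions, expert_actions):
        for s, e in zip(student, expert):
            total += (s - e) ** 2
            count += 1
    return total / count


def ppo_clip_loss(ratios: Sequence[float],
                  advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """Negative clipped surrogate objective (assumed PPO-style RL term)."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(ratios)


def imitation_weight(step: int, total_steps: int,
                     start: float = 1.0, end: float = 0.1) -> float:
    """Hypothetical schedule: linearly anneal the imitation weight."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def hybrid_loss(student_actions: List[List[float]],
                expert_actions: List[List[float]],
                ratios: List[float],
                advantages: List[float],
                step: int,
                total_steps: int) -> float:
    """Weighted sum of the imitation term and the RL term."""
    lam = imitation_weight(step, total_steps)
    return (lam * imitation_loss(student_actions, expert_actions)
            + ppo_clip_loss(ratios, advantages))
```

Early in training the imitation term dominates and keeps the student close to whichever expert labels the visited state; as the weight decays, the RL term is free to adapt the policy to its own depth-based observations. Whether the paper uses a schedule of this kind is not stated in this summary.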

If this is right

The policy produces robust climbing on a wide range of ladder geometries.
The same policy transfers to real hardware without additional training.
A dual-agent manipulation policy built on the climbing controller supports teleoperated tasks while on the ladder.
The overall system works under the constrained footholds and handholds typical of ladders.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The expert-distillation step may reduce the need for many separate reference motions when learning other multi-contact locomotion skills.
Decoupling perception via off-the-shelf foundation models could let the same control pipeline accept improved depth estimators as they appear.
The dual-agent manipulation layer suggests that climbing and manipulation can be trained sequentially rather than jointly in high-dimensional spaces.
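
The decoupling idea in the second point can be sketched as a narrow interface between the depth estimator and the controller. Everything below is hypothetical: the `DepthEstimator` protocol, the `ConstantDepth` stand-in, the clipping range, and the `ClimbingController` wrapper are illustrative names and values, not the paper's code or any library's API.

```python
from typing import Callable, List, Protocol, Sequence

DepthMap = List[List[float]]  # row-major depth values, assumed to be in metres


class DepthEstimator(Protocol):
    """Anything that turns a stereo pair into a depth map."""

    def estimate(self, left: Sequence, right: Sequence) -> DepthMap:
        ...


class ConstantDepth:
    """Stand-in estimator for testing: returns a flat depth map."""

    def __init__(self, value: float, height: int = 2, width: int = 3) -> None:
        self.value = value
        self.height = height
        self.width = width

    def estimate(self, left: Sequence, right: Sequence) -> DepthMap:
        return [[self.value] * self.width for _ in range(self.height)]


def normalize_depth(depth: DepthMap, near: float = 0.1, far: float = 3.0) -> DepthMap:
    """Clip to [near, far] and rescale to [0, 1].

    The range is an assumed placeholder; the point is that every estimator
    feeds the policy values in the same format.
    """
    span = far - near
    return [[(min(max(d, near), far) - near) / span for d in row] for row in depth]


class ClimbingController:
    """Wraps a policy so the depth source can be swapped without retraining code changes."""

    def __init__(self,
                 estimator: DepthEstimator,
                 policy: Callable[[DepthMap, Sequence[float]], List[float]]) -> None:
        self.estimator = estimator
        self.policy = policy

    def step(self, left: Sequence, right: Sequence,
             proprio: Sequence[float]) -> List[float]:
        depth = normalize_depth(self.estimator.estimate(left, right))
        return self.policy(depth, proprio)
```

Because the controller only depends on the `estimate` method, replacing one depth model with another is a one-line change at construction time. Whether the policy's performance actually survives such a swap is an empirical question this sketch does not answer.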

Load-bearing premise

Hybrid motion tracking creates multiple experts from one motion that can be distilled into a single policy whose performance holds after vision foundation models supply real depth estimates.

What would settle it

A ladder geometry outside the training set on which the distilled policy consistently fails to reach the top, or a zero-shot hardware trial that requires retraining or fails outright.
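
One way to make "consistently fails" concrete is to put a confidence interval on the per-geometry success rate and flag geometries whose upper bound stays below a chosen threshold. The sketch below uses the Wilson score interval; the 0.5 threshold, the function names, and the reading of a geometry as an (inclination in degrees, rung spacing in cm) pair are assumptions, not definitions taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Assumed meaning: (inclination in degrees, rung spacing in cm).
Geometry = Tuple[float, float]


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def consistently_failing(results: Dict[Geometry, List[bool]],
                         threshold: float = 0.5) -> List[Geometry]:
    """Return geometries whose success-rate upper bound is below the threshold."""
    flagged = []
    for geometry, outcomes in sorted(results.items()):
        _, upper = wilson_interval(sum(outcomes), len(outcomes))
        if upper < threshold:
            flagged.append(geometry)
    return flagged
```

Using the upper bound rather than the raw rate avoids declaring failure from a handful of unlucky trials; a geometry is only flagged once enough attempts have accumulated to rule out a success rate at or above the threshold.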

Figures

Figures reproduced from arXiv: 2606.05873 by C. Karen Liu, Guanya Shi, Koushil Sreenath, Pieter Abbeel, Rocky Duan, Siheng Zhao, Yuanhang Zhang, Yue Wang, Ziqi Lu.

**Figure 1.** LadderMan enables a Unitree G1 humanoid to robustly climb diverse ladders using a single perceptive policy via zero-shot sim-to-real transfer. Building on the learned climbing capability, LadderMan further supports stable whole-body on-ladder manipulation via teleoperation, including adjusting paintings, replacing light bulbs, and box handover.

**Figure 2.** Overview of LadderMan. Starting from a single reference motion, we first learn multiple expert climbing policies for different ladder geometries using hybrid motion tracking. These experts are then distilled into a unified visuomotor policy through hybrid DAgger and RL. Building on the climbing policy, we further train a dual-agent manipulation policy for stable on-ladder teleoperation. To bridge the sim-t…

**Figure 3.** We bridge the visual gap between simulation and real world by applying rung-focused …

**Figure 4.** Real-world results of LadderMan. The humanoid performs zero-shot sim-to-real ladder climbing across diverse real-world ladders (A, B, and C) and executes various on-ladder manipulation tasks while maintaining stable balance. Additional video results are available on our website.

**Figure 5.** (a) Success rates of LadderMan and the blind motion tracking baseline across ladder con…

**Figure 6.** Failure case of the off-the-shelf teleop…

**Figure 7.** Side-by-side comparison between a human and LadderMan performing the same ladder…

**Figure 8.** Comparison of Unitree G1 collision geome…

**Figure 9.** Comparison between raw depth observations and depth predictions generated by Fast…

**Figure 10.** Comparison between the reference motion and the learned climbing behavior on a ladder different from that used during motion capture. Hybrid Motion Tracking. We provide a side-by-side comparison between the collected reference motion and the learned policy execution on a ladder configuration with (θ, s) = (55°, 20 cm), as shown in …

Original abstract

Humanoid robots hold great promise for operating in human-centered environments, yet ladder climbing remains one of the most challenging tasks due to sparse footholds and handholds, complex whole-body coordination, and sensitivity to perception and control errors. We present LadderMan, a unified system that enables humanoid robots to robustly climb diverse ladders and perform manipulation under such constrained conditions. Our climbing policy is built on a scalable two-stage learning pipeline, where we use hybrid motion tracking to learn multiple climbing experts from a single reference motion, and distill these experts into a unified depth-based visuomotor climbing policy via hybrid imitation and reinforcement learning. To enable real-world deployment, we leverage vision foundation models to bridge the sim-to-real gap in depth perception. Building on the learned climbing policy, we further train a separate manipulation policy using a dual-agent formulation, allowing stable on-ladder manipulation via teleoperation. Experiments demonstrate that LadderMan achieves robust ladder climbing across a wide range of geometries, successfully transfers to real-world hardware in a zero-shot manner, and supports various manipulation tasks under challenging ladder constraints. Video results are available at https://ladderman-robot.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LadderMan gives a concrete two-stage pipeline for humanoid ladder climbing plus on-ladder manipulation with claimed zero-shot depth transfer, but the abstract leaves performance numbers and ablations out.

The paper's main deliverable is a scalable learning setup that first tracks multiple climbing experts from one reference motion via hybrid motion tracking, then distills them into a single depth-based visuomotor policy using imitation plus RL. A separate dual-agent policy handles manipulation while the robot stays on the ladder. Vision foundation models are used to close the depth sim-to-real gap, and the authors report successful hardware transfer plus manipulation under ladder constraints.

What stands out is the pragmatic combination: the hybrid tracking step to generate diversity from limited reference data, the distillation route to a unified policy, and the dual-agent split for the manipulation sub-task. These choices directly address the sparse-contact and coordination issues that make ladder climbing hard for humanoids. The decision to lean on off-the-shelf vision models rather than custom perception training is also sensible for deployment.

The soft spot is the lack of any numbers, ablations, or failure cases in the abstract. Without those it is hard to judge how much the hybrid tracking actually improves robustness or how often the zero-shot transfer fails on real ladders. The full text presumably supplies the metrics, but the current write-up leaves the central claims resting on demonstration rather than quantified evidence.

This is useful reading for groups already running humanoid locomotion or contact-rich manipulation experiments. A reader who needs a working ladder-climbing baseline or wants to see how dual-agent ideas extend to constrained environments will find the pipeline details worth extracting. It is worth sending to referees because the task is practically relevant and the methods build on established techniques without obvious internal contradictions.

Referee Report

1 major / 1 minor

Summary. The paper presents LadderMan, a unified system for humanoid ladder climbing and on-ladder manipulation. It proposes a scalable two-stage learning pipeline that first uses hybrid motion tracking to derive multiple climbing experts from a single reference motion, then distills them into a unified depth-based visuomotor policy via hybrid imitation and reinforcement learning. Vision foundation models are employed to bridge the sim-to-real gap for depth perception. A separate dual-agent manipulation policy is trained to enable stable teleoperated tasks under ladder constraints. The central claims are that the system achieves robust climbing across diverse ladder geometries, zero-shot transfer to real hardware, and support for manipulation tasks.

Significance. If the empirical results hold with rigorous validation, the work would advance humanoid robotics by addressing a high-difficulty task involving sparse contacts, whole-body coordination, and perception sensitivity. The hybrid tracking-to-distillation pipeline and use of off-the-shelf vision models for sim-to-real depth bridging represent potentially reusable techniques for complex locomotion. Credit is due for framing the problem as both perceptive climbing and constrained manipulation, which aligns with practical deployment needs.

major comments (1)

[Abstract] The central claims of 'robust ladder climbing across a wide range of geometries' and 'successfully transfers to real-world hardware in a zero-shot manner' are presented without any quantitative metrics (e.g., success rates, traversal times, failure modes), ablation studies, or hardware specifications. This absence makes it impossible to evaluate whether the hybrid motion tracking and distillation pipeline actually delivers the reported outcomes, rendering the experimental demonstration load-bearing but unsupported.

minor comments (1)

[Abstract] The abstract references a project website for video results but provides no summary of what the videos demonstrate (e.g., specific ladder variations or failure recoveries), which would aid assessment of the claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of the work's significance and for the constructive comment on the abstract. We respond to the major comment below.

Referee: [Abstract] The central claims of 'robust ladder climbing across a wide range of geometries' and 'successfully transfers to real-world hardware in a zero-shot manner' are presented without any quantitative metrics (e.g., success rates, traversal times, failure modes), ablation studies, or hardware specifications. This absence makes it impossible to evaluate whether the hybrid motion tracking and distillation pipeline actually delivers the reported outcomes, rendering the experimental demonstration load-bearing but unsupported.

Authors: We agree that the abstract would benefit from including quantitative metrics to support the stated claims and make the summary self-contained. The body of the manuscript reports the relevant experimental results, including success rates across ladder geometries, traversal times, failure mode analysis, ablation studies on the hybrid tracking and distillation components, and hardware specifications for the zero-shot transfer. We will revise the abstract to incorporate key quantitative results from these experiments.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents an empirical two-stage learning pipeline (hybrid motion tracking to produce climbing experts from one reference motion, followed by distillation into a unified depth-based policy via imitation+RL, plus vision foundation models for sim-to-real) and a separate dual-agent manipulation policy. No equations, fitted parameters, or mathematical derivations appear in the provided abstract or described claims. Central results are framed as experimental demonstrations of robustness, zero-shot transfer, and manipulation capability rather than predictions that reduce to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way. This matches the reader's assessment of score 2.0 but is set to 0 because no circular steps of any enumerated kind are identifiable from the text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that reference motion data and vision foundation models provide sufficient grounding.

pith-pipeline@v0.9.1-grok · 5766 in / 1033 out tokens · 21558 ms · 2026-06-28T01:20:00.079034+00:00 · methodology

Reference graph

Works this paper leans on

42 extracted references · 6 linked inside Pith

[1] W. Hammer and U. Schmalz. Human behaviour when climbing ladders with varying inclinations. Safety Science, 15(1):21–38, 1992.
[2] Y. Zhang, Y. Seo, J. Chen, Y. Yuan, K. Sreenath, P. Abbeel, C. Sferrazza, K. Liu, R. Duan, and G. Shi. Rpl: Learning robust humanoid perceptive locomotion on challenging terrains. arXiv preprint arXiv:2602.03002, 2026.
[3] Z. Zhuang, S. Yao, and H. Zhao. Humanoid parkour learning. arXiv preprint arXiv:2406.10759, 2024.
[4] H. Wang, Z. Wang, J. Ren, Q. Ben, T. Huang, W. Zhang, and J. Pang. Beamdojo: Learning agile humanoid locomotion on sparse footholds. In Robotics: Science and Systems (RSS), 2025.
[5] H. Yoneda, K. Sekiyama, Y. Hasegawa, and T. Fukuda. Vertical ladder climbing motion with posture control for multi-locomotion robot. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3579–3584, 2008.
[6] J. Luo, Y. Zhang, K. Hauser, H. A. Park, M. Paldhe, C. S. G. Lee, M. Grey, M. Stilman, J. H. Oh, J. Lee, I. Kim, and P. Oh. Robust ladder-climbing with a humanoid robot with application to the darpa robotics challenge. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 2792–2798, 2014.
[7] J. Vaillant, A. Kheddar, H. Audren, F. Keith, S. Brossette, A. Escande, K. Kaneko, M. Morisawa, P. Gergondet, E. Yoshida, S. Kajita, and F. Kanehiro. Multi-contact vertical ladder climbing with an hrp-2 humanoid. Autonomous Robots, 2016.
[8] T. Yoshiike, M. Kuroda, R. Ujino, H. Kaneko, H. Higuchi, S. Iwasaki, Y. Kanemoto, M. Asatani, and T. Koshiishi. Development of experimental legged robot for inspection and disaster response in plants. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4869–4876, 2017.
[9] X. Sun, K. Hashimoto, T. Teramachi, T. Matsuzawa, S. Kimura, N. Sakai, S. Hayashi, Y. Yoshida, and A. Takanishi. Planning and control of stable ladder climbing motion for the four-limbed robot “warec-1”. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6547–6554, 2017.
[10] A. A. Saputra, Y. Toda, N. Takesue, and N. Kubota. A novel capabilities of quadruped robot moving through vertical ladder without handrail support. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1448–1453, 2019.
[11] X. Sun, K. Hashimoto, S. Hayashi, M. Okawara, T. Mastuzawa, and A. Takanishi. Stable vertical ladder climbing with rung recognition for a four-limbed robot. Journal of Bionic Engineering, 2021.
[12] H. Weng, Y. Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. Hdmi: Learning interactive humanoid whole-body control from human videos. arXiv preprint arXiv:2509.16757, 2025.
[13] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning. arXiv preprint arXiv:2510.05070, 2025.
[14] B. Wen, S. Dewan, and S. Birchfield. Fast-FoundationStereo: Real-time zero-shot stereo matching. CVPR, 2026.
[15] Y. Zhang, Y. Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A.-a. Agha-mohammadi, M. Vazquez-Chanlatte, L. Pedersen, T. He, and G. Shi. Falcon: Learning force-adaptive humanoid loco-manipulation. arXiv preprint arXiv:2505.06776, 2025.
[16] Y. Ze, S. Zhao, W. Wang, A. Kanazawa, R. Duan, P. Abbeel, G. Shi, J. Wu, and C. K. Liu. Twist2: Scalable, portable, and holistic humanoid data collection system. arXiv preprint arXiv:2511.02832, 2025.
[17] Z. Luo, Y. Yuan, T. Wang, C. Li, S. Chen, F. Castañeda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, X. Da, R. Ding, C. Hogg, L. Song, E. Lim, E. Jeong, T. He, H. Xue, W. Xiao, Z. Wang, S. Yuen, J. Kautz, Y. Chang, U. Iqbal, L. Fan, and Y. Zhu. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025. (Pith/arXiv)
[18] Defense Advanced Research Projects Agency (DARPA). Darpa robotics challenge (drc). https://www.darpa.mil/research/programs/darpa-robotics-challenge, 2026.
[19] D. Vogel, R. Baines, J. Church, J. Lotzer, K. Werner, and M. Hutter. Robust ladder climbing with a quadrupedal robot. arXiv preprint arXiv:2409.17731, 2025.
[20] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022.
[21] J. Long, J. Ren, M. Shi, Z. Wang, T. Huang, P. Luo, and J. Pang. Learning humanoid locomotion with perceptive internal model. arXiv preprint arXiv:2411.14386, 2024.
[22] D. Hoeller, N. Rudin, D. Sako, and M. Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88):eadi7566, 2024.
[23] J. He, C. Zhang, F. Jenelten, R. Grandia, M. Bächer, and M. Hutter. Attention-based map encoding for learning generalized legged locomotion. Science Robotics, 10(105):eadv3604, 2025.
[24] P. Fankhauser, M. Bloesch, and M. Hutter. Probabilistic terrain mapping for mobile robots with uncertain localization. IEEE Robotics and Automation Letters, 3(4):3019–3026, 2018.
[25] Q. Ben, B. Xu, K. Li, F. Jia, W. Zhang, J. Wang, J. Wang, D. Lin, and J. Pang. Gallant: Voxel grid-based humanoid locomotion and local-navigation across 3d constrained terrains. arXiv preprint arXiv:2511.14625, 2025.
[26] A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged locomotion in challenging terrains using egocentric vision. In Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 403–415. PMLR, 14–18 Dec 2023.
[27] X. Cheng, K. Shi, A. Agarwal, and D. Pathak. Extreme parkour with legged robots. arXiv preprint arXiv:2309.14341, 2023.
[28] Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao. Robot parkour learning. arXiv preprint arXiv:2309.05665, 2023.
[29] R. Yang, M. Zhang, N. Hansen, H. Xu, and X. Wang. Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers. arXiv preprint arXiv:2107.03996, 2021.
[30] Z. Wu, X. Huang, L. Yang, Y. Zhang, X. Chen, P. Abbeel, R. Duan, A. Kanazawa, C. Sferrazza, G. Shi, and C. K. Liu. Perceptive humanoid parkour: Chaining dynamic human skills via motion matching. arXiv preprint arXiv:2602.15827, 2026. (Pith/arXiv)
[31] Z. Zhang, K. Wen, M. Xu, J. He, C. Li, T. Miki, C. Schwarke, C. Zhang, X. B. Peng, and M. Hutter. Learning whole-body humanoid locomotion via motion generation and motion tracking. arXiv preprint arXiv:2604.17335, 2026. (Pith/arXiv)
[32] X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph., 37(4):143:1–143:14, 2018.
[33] Q. Liao, T. E. Truong, X. Huang, Y. Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025. (Pith/arXiv)
[34] S. Ross, G. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 627–635. PMLR, 2011.
[35] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. (Pith/arXiv)
[36] M. Macklin. Warp: A high-performance python framework for gpu simulation and graphics, Mar. 2022. URL https://github.com/NVIDIA/warp. NVIDIA GPU Technology Conference (GTC).
[37] S. Zhu, Z. Zhuang, M. Zhao, K.-Y. Lee, and H. Zhao. Hiking in the wild: A scalable perceptive parkour framework for humanoids. arXiv preprint arXiv:2601.07718, 2026.
[38] Z. Zhuang, S. Zhu, M. Zhao, and H. Zhao. Deep whole-body parkour. arXiv preprint arXiv:2601.07701, 2026.
[39] N. Rudin, J. He, J. Aurand, and M. Hutter. Parkour in the wild: Learning a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning. arXiv preprint arXiv:2505.11164, 2025.
[40] N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll, and M. J. Black. AMASS: Archive of motion capture as surface shapes. In International Conference on Computer Vision, pages 5442–5451, 2019.
[41] NVIDIA. Isaac Sim. URL https://github.com/isaac-sim/IsaacSim.
[42] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025. (Pith/arXiv)
