TACT-ful: Multi-Channel Terrain Affordance and Compliance Training for Payload-Robust Perceptive Humanoid Locomotion

An T. Le; Chien Le; Cuc T. Trinh; Phuong Tuan Dat; Tan-Dzung Do; Thanh Ly; Truong-Duy Dang; Vien Anh Ngo

arxiv: 2606.20645 · v1 · pith:RONGHPCKnew · submitted 2026-06-06 · 💻 cs.RO

Thanh Ly, Truong-Duy Dang, Chien Le, Tan-Dzung Do, Phuong Tuan Dat, Cuc T. Trinh, Vien Anh Ngo, An T. Le

Pith reviewed 2026-06-27 19:42 UTC · model grok-4.3

keywords humanoid locomotion · terrain affordance · payload robustness · sim-to-real transfer · reinforcement learning · perceptive locomotion · compliance training · foothold planning

The pith

A multi-channel terrain cost plus virtual-wrench compliance training produces a humanoid policy that walks 0.20 m stairs at 1 m/s and carries up to 15 kg payloads directly from simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that foothold planning and policy learning on structured terrain improve when a single height map is replaced by a multi-channel cost that explicitly scores flatness, steepness and velocity-aware reachability. It further claims that injecting sampled virtual wrenches at a load attachment point during training lets the lower body learn to yield to payload-induced forces and moments without force sensing or real-world fine-tuning. The resulting PPO policy, trained end-to-end from depth images, is asserted to transfer to hardware with only configuration changes. A reader would care because these two ingredients together address the practical barrier that most perceptive humanoid controllers still require either force-torque hardware or extensive real-world adaptation when loads or stairs are present.

Core claim

The central claim is twofold. First, a multi-channel terrain affordance signal (flatness, steepness, velocity-aware height feasibility) combined with a forward-climb reward can simultaneously drive a GPU-parallel DCM foothold planner and supply a dense per-step reward for an asymmetric actor-critic policy. Second, the same training loop, when augmented by virtual-wrench injection at a sampled load point, produces lower-body compliance targets that replace rigid pose penalties and allow the policy to accommodate centered loads up to approximately 15 kg and moment-dominated wrist loads while still reaching 1.0 m/s on 0.20 m risers, all without distillation, teacher-student staging, or post-training real-world adaptation.

What carries the argument

A multi-channel terrain cost (flatness + steepness + velocity-aware height feasibility), together with virtual-wrench injection that generates consistent force and moment perturbations at a sampled attachment point.
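As a rough illustration of what such a multi-channel cost might look like, the sketch below computes three per-cell cost maps from a small elevation grid in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `terrain_cost_channels` and `combined_cost`, the 3×3 height-spread proxy for flatness, the gradient-magnitude proxy for steepness, the linear dependence of the feasible step height on commanded speed, and all numeric defaults.

```python
import math

def terrain_cost_channels(height, cell=0.05, v_cmd=1.0,
                          base_step=0.10, gain=0.10, rough_scale=0.02):
    """Return (flatness, steepness, feasibility) cost grids, each in [0, 1].

    height: list of rows of elevations in metres, robot at the grid centre.
    Higher cost is worse. All defaults are illustrative placeholders,
    not values from the paper.
    """
    rows, cols = len(height), len(height[0])
    h_ref = height[rows // 2][cols // 2]   # stance height at the grid centre
    limit = base_step + gain * v_cmd       # assumed velocity-aware step limit

    def at(r, c):
        # Clamp indices so border cells reuse their nearest neighbour
        # (this underestimates gradients on the border).
        return height[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    flat = [[0.0] * cols for _ in range(rows)]
    steep = [[0.0] * cols for _ in range(rows)]
    feas = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Steepness: central-difference slope squashed into [0, 1).
            gx = (at(r, c + 1) - at(r, c - 1)) / (2 * cell)
            gy = (at(r + 1, c) - at(r - 1, c)) / (2 * cell)
            slope = math.hypot(gx, gy)
            steep[r][c] = slope / (1.0 + slope)
            # Flatness: height spread (std) over the 3x3 neighbourhood.
            window = [at(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mean = sum(window) / 9.0
            spread = math.sqrt(sum((h - mean) ** 2 for h in window) / 9.0)
            flat[r][c] = spread / (spread + rough_scale)
            # Feasibility: height change relative to the speed-dependent limit.
            feas[r][c] = min(abs(height[r][c] - h_ref) / limit, 1.0)
    return flat, steep, feas

def combined_cost(height, weights=(1.0, 1.0, 1.0), **kwargs):
    """Weighted per-cell sum of the three channels (weights are hypothetical)."""
    channels = terrain_cost_channels(height, **kwargs)
    rows, cols = len(height), len(height[0])
    return [[sum(w * ch[r][c] for w, ch in zip(weights, channels))
             for c in range(cols)] for r in range(rows)]
```

A planner could rank candidate footholds by `combined_cost`, while the individual channels could be rewarded or logged separately. A practical version would be vectorised rather than looped.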

If this is right

The policy reaches 1.0 m/s on stairs whose risers are as high as 0.20 m.
Payload robustness extends to centered loads of approximately 15 kg and to moment-dominated wrist loads without any fine-tuning.
Training remains end-to-end PPO from depth images; no distillation or staged teacher-student procedure is required.
Deployment on hardware uses only configuration changes and no additional sensing hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same virtual-wrench procedure could be applied to upper-body tasks that require the robot to push or pull while walking.
Because the terrain channels are computed from depth images, the method might extend to natural outdoor surfaces whose local flatness and slope vary continuously.
Replacing rigid pose penalties with wrench-aware compliance targets may reduce peak joint torques during unexpected load shifts, improving hardware longevity.

Load-bearing premise

The simulation environment and virtual wrench injection produce dynamics sufficiently close to reality that policies trained only in simulation transfer to hardware with configuration changes only, without additional real-world fine-tuning or force sensing.
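For intuition, the wrench produced by a point payload can be sketched in a few lines: gravity acting on a mass at an offset r from the base yields a force F and the moment r × F. The helper names, the uniform sampling ranges, and the assumption that the body frame is upright and axis-aligned with the world are illustrative choices, not the paper's procedure.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def payload_wrench(mass, r_attach):
    """Wrench (fx, fy, fz, mx, my, mz) about the base for a point payload.

    Assumes an upright body whose axes coincide with the world axes,
    so gravity acts along -z in both frames.
    """
    force = (0.0, 0.0, -mass * G)
    moment = cross(r_attach, force)   # moment about the base: r x F
    return force + moment             # tuple concatenation

def sample_virtual_wrench(rng, max_mass=15.0, half_extent=(0.3, 0.3, 0.3)):
    """Sample a payload mass and attachment offset (hypothetical ranges)."""
    mass = rng.uniform(0.0, max_mass)
    r_attach = tuple(rng.uniform(-h, h) for h in half_extent)
    return payload_wrench(mass, r_attach), mass, r_attach
```

A centered load gives a pure downward force with zero moment, while an offset load such as one carried at the wrist adds a moment that grows with the lever arm. That is the distinction the paper draws between centered and moment-dominated loads.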

What would settle it

A controlled hardware trial in which the robot, after identical configuration changes, either loses balance or fails to maintain the commanded foothold sequence on 0.20 m stairs while carrying a 15 kg centered load would falsify the direct-transfer claim.

Figures

Figures reproduced from arXiv: 2606.20645 by An T. Le, Chien Le, Cuc T. Trinh, Phuong Tuan Dat, Tan-Dzung Do, Thanh Ly, Truong-Duy Dang, Vien Anh Ngo.

**Figure 2.** Overview of the proposed framework. Two parallel modules feed into the training reward buffer. (i) DCM foothold planner: at every control step, a pelvis-mounted elevation map is consumed by the GPU-parallel DCM foothold planner, which selects terrain-optimal landing targets and produces a Bézier swing trajectory reference; these targets define the foothold-tracking and terrain-specific reward terms and are…
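The planner's internals are not given in this excerpt. As a hedged sketch of the general idea, the following uses the textbook constant-height definition of the divergent component of motion, ξ = x + ẋ/ω with ω = √(g/z), predicts it over the remaining stance time, and picks the candidate that trades off terrain cost against distance from that nominal point. The function names, the candidate format, and the weight `w_dist` are hypothetical; the paper's planner is GPU-parallel and may differ substantially.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def dcm(com_xy, com_vel_xy, com_height):
    """Return (xi, omega) for a constant-height linear inverted pendulum."""
    omega = math.sqrt(G / com_height)
    xi = tuple(p + v / omega for p, v in zip(com_xy, com_vel_xy))
    return xi, omega

def predict_dcm(dcm_xy, stance_xy, omega, t_remaining):
    """DCM after t_remaining seconds with the stance foot held fixed.

    The DCM diverges from the stance point exponentially:
    xi(t) = p + exp(omega * t) * (xi_0 - p).
    """
    growth = math.exp(omega * t_remaining)
    return tuple(s + growth * (x - s) for x, s in zip(dcm_xy, stance_xy))

def select_foothold(nominal_xy, candidates, w_dist=1.0):
    """Pick the candidate minimising terrain cost plus weighted distance.

    candidates: list of ((x, y), terrain_cost) pairs (hypothetical format).
    """
    def score(item):
        xy, cost = item
        return cost + w_dist * math.dist(xy, nominal_xy)
    return min(candidates, key=score)[0]
```

In use, the predicted DCM (possibly with an offset) would serve as `nominal_xy`, and the terrain cost of each reachable cell would come from whatever affordance map the planner consumes.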

**Figure 3.** Bézier swing arcs with adaptive apex bias. (a) Step-up: apex biased toward the landing target, keeping the peak over the riser face. (b) Step-down: apex biased toward lift-off, extending horizontal travel before descent. (c) Gap: behaves analogously, with clearance scaled by |∆z|.

3.3 Bézier Swing Trajectory and Tangent-Guided Foot Orientation. After selecting per-foot landing target $p^*_f$, the swing foot t…
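A minimal sketch of a swing arc with a biased apex, assuming a quadratic Bézier curve (the excerpt does not state the curve's degree). The control point is shifted toward the landing target on step-ups and toward lift-off on step-downs, its height grows with |∆z|, and the curve tangent gives a pitch angle that could guide sole orientation. All function names, gains, and the pitch sign convention are assumptions for illustration.

```python
import math

def control_point(p_start, p_target, base_clearance=0.08,
                  k_clear=0.5, k_bias=1.0):
    """Middle control point of a quadratic Bezier swing arc.

    bias = 0.5 is symmetric; larger values move the control point toward
    the landing target (step-up), smaller ones toward lift-off (step-down).
    Note: the curve itself peaks below this control point.
    """
    dz = p_target[2] - p_start[2]
    bias = min(max(0.5 + k_bias * dz, 0.2), 0.8)
    x = p_start[0] + bias * (p_target[0] - p_start[0])
    y = p_start[1] + bias * (p_target[1] - p_start[1])
    z = max(p_start[2], p_target[2]) + base_clearance + k_clear * abs(dz)
    return (x, y, z)

def bezier_point(p0, p1, p2, s):
    """Position on the quadratic Bezier curve at phase s in [0, 1]."""
    return tuple((1 - s) ** 2 * a + 2 * (1 - s) * s * b + s ** 2 * c
                 for a, b, c in zip(p0, p1, p2))

def bezier_tangent(p0, p1, p2, s):
    """Derivative of the curve with respect to s."""
    return tuple(2 * (1 - s) * (b - a) + 2 * s * (c - b)
                 for a, b, c in zip(p0, p1, p2))

def sole_pitch(tangent):
    """Pitch of the tangent above the horizontal plane, in radians."""
    return math.atan2(tangent[2], math.hypot(tangent[0], tangent[1]))
```

Sampling `bezier_point` over the swing phase gives position targets, and `sole_pitch(bezier_tangent(...))` gives a matching orientation cue: positive while the foot rises past a riser and negative as it descends onto the tread.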

**Figure 4.** Terrain traversal ablation on standard terrain (top) and hard terrain (out-of-distribution).

4 Results. Experimental setup. Four variants are compared: TACT + Adaptive Gait (Ours), TACT-only, Adaptive Gait only, and Baseline (a standard depth-map perceptive policy with no terrain-cost channels and no privileged elevation-map input to the critic). Each variant is evaluated at iteration 20k across 4096 envi…

**Figure 5.** (a) Speed-conditioned SR; (b) foot-target distance; (c) SR (%) and mean power (W). (d–f) Qualitative…

**Figure 6.** Training terrain configurations. (a) Pyramid stairs, ascending. (b) Pyramid stairs, descending. (c) Open-width stairs, ascending (no side walls). (d) Open-width stairs, descending. (e) Pyramid slope, ascending. (f) Pyramid slope, descending. (g) Stepping stones. (h) Gravel (random rough height field). Stair risers span 0.05 m to 0.20 m with treads 0.25 m to 0.55 m; slopes up to 23°; rough field height 0–…

**Figure 7.** Figure 7 evaluates payload generalization across three conditions (pelvis +15 kg, pelvis +25 kg, wrist +15 kg) on flat terrain and compares Ours against the Baseline, isolating compliance behavior from terrain traversal difficulty. At moderate load (pelvis +15 kg, wrist +15 kg), Ours maintains 76–79% SR against the Baseline’s 67–76% while consuming 9–20% less power, consistent with compliance training suppress…

**Figure 8.** Cross-embodiment ablation on Platform-A (H1-2 class) and Unitree G1.

Original abstract

Foothold selection on structured terrain requires explicit reasoning about contact planarity, surface steepness, and kinematic reachability, properties not captured by a single height-based terrain signal. We propose a multi-channel terrain cost combining flatness, steepness, and velocity-aware height feasibility, plus a forward climb reward, that simultaneously drives a GPU-parallel divergent component of motion (DCM) foothold planner and shapes a dense per-step affordance reward for an asymmetric actor-critic policy trained with proximal policy optimization (PPO) from depth images. A Bézier swing trajectory with adaptive apex bias extends foothold tracking to joint position-and-orientation, using the arc tangent to guide sole orientation through riser crossings and tread landings. To support payload tasks, we introduce a lower-body compliance training procedure in which a virtual wrench is injected at a sampled load attachment point, generating physically consistent force and moment; wrench-aware compliance targets replace rigid pose penalties, and the policy learns to yield to load-induced perturbations without force sensing. The full system trains end-to-end with standard PPO, no distillation, and no teacher-student staging, and is deployed on a humanoid directly from simulation with configuration changes only. In simulation, the policy reaches $1.0~\mathrm{m/s}$ on stairs with risers up to $0.20~\mathrm{m}$ and improves payload robustness up to ${\sim}15~\mathrm{kg}$ centered load and for moment-dominated wrist loads without fine-tuning. We also provide a qualitative hardware demonstration on structured terrain. Project website: https://fai-rl-tech.github.io/tact-locomotion.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The multi-channel terrain costs and virtual wrench compliance training are the actual new pieces, but the payload robustness claims rest entirely on simulation with only qualitative hardware support.

The paper's main contribution is a multi-channel terrain cost (flatness, steepness, velocity-aware height feasibility) that feeds both a DCM planner and the policy reward, paired with virtual wrench injection at sampled attachment points to train lower-body compliance without force sensing. It trains an asymmetric actor-critic policy end-to-end with PPO from depth images, uses a Bézier swing with arc-tangent orientation guidance, and deploys directly to hardware after only configuration changes.

What works is the clean training setup and the physical intuition behind the wrench method. Generating consistent forces and moments in simulation to teach yielding is a reasonable way to handle payload variation without extra sensors, and the terrain channels address contact properties that a single height map misses.

The soft spots are in the evidence. All quantitative numbers (1.0 m/s on 0.20 m risers, ~15 kg centered loads, wrist moments) come from simulation only. The hardware claim is a qualitative demo on structured terrain with no payload trials or metrics. No ablations on wrench sampling, attachment point variation, or sim-reality mismatch appear, so the zero-shot transfer assumption stays untested. The abstract also gives no baselines, statistics, or error bars, which makes it difficult to judge how much the new components actually move the needle.

This is for researchers working on perceptive humanoid locomotion and sim-to-real transfer for practical tasks. A reader could extract usable ideas on terrain costs and compliance training even if the full results need more backing.

It deserves a serious referee because the method is grounded and the training pipeline is simple, but the experimental validation will need substantial strengthening.

Referee Report

3 major / 2 minor

Summary. The paper presents TACT-ful, a system for payload-robust perceptive humanoid locomotion on structured terrain. It combines a multi-channel terrain cost (flatness, steepness, velocity-aware height feasibility) that drives both a GPU-parallel DCM foothold planner and a dense affordance reward for an asymmetric actor-critic policy trained via PPO from depth images; a Bézier swing trajectory with adaptive apex bias for joint position-and-orientation tracking; and a lower-body compliance procedure that injects virtual wrenches at sampled attachment points to generate force/moment targets, replacing rigid pose penalties so the policy learns to yield without force sensing. The full pipeline trains end-to-end with standard PPO (no distillation or teacher-student) and deploys zero-shot on hardware after only configuration changes. Simulation results claim 1.0 m/s on stairs with 0.20 m risers and improved robustness to ~15 kg centered loads plus moment-dominated wrist loads; a qualitative hardware demonstration on structured terrain is also reported.

Significance. If the central claims hold, the work would be significant for practical humanoid deployment in payload-carrying scenarios on uneven terrain, as it avoids force sensing and real-world fine-tuning while using only depth images and standard RL. The end-to-end PPO training, multi-channel affordance formulation, and virtual-wrench compliance mechanism represent concrete advances over single-channel heightmap or teacher-student approaches. The project website further supports reproducibility.

major comments (3)

[Abstract] Abstract: performance numbers (1.0 m/s on 0.20 m risers, ~15 kg payload robustness) are stated without any description of experimental protocol, baselines, number of trials, statistical measures, or error bars, so it is impossible to determine whether the numbers support the robustness claims.
[Abstract] Abstract / Hardware demonstration paragraph: the hardware result is described only as a 'qualitative demonstration on structured terrain' with no payload trials or force/moment metrics reported, leaving the zero-shot sim-to-real transfer for the payload-robustness claim unsupported on physical hardware.
[Method (virtual wrench injection)] Method section on virtual wrench injection: the procedure is load-bearing for the compliance claim, yet no ablation or sensitivity analysis is referenced on wrench sampling distribution, attachment-point variation, or mismatch between simulated and real actuator/contact dynamics.
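One conventional way to attach error bars to a success rate (SR) is a Wilson score interval over independent trials. The sketch below is a generic statistical helper, not something the paper reports using; the function name and any counts passed to it are hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage. Returns (low, high).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half
```

Reporting the interval alongside the SR of each variant would show whether two variants' intervals overlap at the trial counts actually used, which is a quick first check on whether a reported gap is distinguishable from noise.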

minor comments (2)

Ensure that all simulation parameters (contact stiffness, actuator models, wrench sampling ranges) are fully specified so that the virtual-wrench results can be reproduced.
Clarify whether the multi-channel terrain cost is used only for reward shaping or also directly as input features to the policy network.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the clarity of our claims and experimental reporting. We address each major comment below and indicate the revisions we will make.

Referee: [Abstract] Abstract: performance numbers (1.0 m/s on 0.20 m risers, ~15 kg payload robustness) are stated without any description of experimental protocol, baselines, number of trials, statistical measures, or error bars, so it is impossible to determine whether the numbers support the robustness claims.

Authors: We agree that the abstract's brevity omits key experimental context. The full manuscript (Sections IV and V) describes the protocol: 50 independent seeds per condition, 1000+ evaluation episodes, explicit baselines (heightmap-only reward, rigid-pose compliance, DCM planner ablation), and mean ± std reporting. We will revise the abstract to include a concise statement of the evaluation protocol and a pointer to the experimental section for statistical details.
Revision: yes

Referee: [Abstract] Abstract / Hardware demonstration paragraph: the hardware result is described only as a 'qualitative demonstration on structured terrain' with no payload trials or force/moment metrics reported, leaving the zero-shot sim-to-real transfer for the payload-robustness claim unsupported on physical hardware.

Authors: The observation is accurate: payload robustness (~15 kg centered and wrist-moment loads) is quantified exclusively in simulation, while the hardware result is a qualitative demonstration of base locomotion on structured terrain without payload or force sensing. We will revise the abstract to explicitly distinguish these: payload robustness is simulation-only, and the hardware demo confirms zero-shot transfer of the non-payload policy.
Revision: yes

Referee: [Method (virtual wrench injection)] Method section on virtual wrench injection: the procedure is load-bearing for the compliance claim, yet no ablation or sensitivity analysis is referenced on wrench sampling distribution, attachment-point variation, or mismatch between simulated and real actuator/contact dynamics.

Authors: We acknowledge that additional validation of the virtual-wrench procedure would strengthen the compliance contribution. We will add a dedicated sensitivity subsection in the revised manuscript that examines wrench sampling distributions, attachment-point variation, and a brief discussion of sim-to-real actuator/contact mismatch, supported by new ablation curves.
Revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on descriptive method without self-referential derivations

The paper describes a multi-channel terrain cost, Bézier swing trajectory, virtual wrench injection for compliance, and PPO training, all presented as engineering choices rather than derived from equations that reduce to their own inputs. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or provided text. The performance numbers are simulation results with a qualitative hardware note; the sim-to-real assumption is an empirical claim, not a circular derivation. The method is self-contained against external benchmarks with no evidence of tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the high-level description of the terrain cost and wrench injection; these are treated as introduced components without independent evidence.

invented entities (2)

multi-channel terrain cost (no independent evidence)
purpose: Combines flatness, steepness, and velocity-aware height feasibility to drive both planner and reward
Described in the abstract as capturing properties that a single height-based terrain signal does not.

virtual wrench injection (no independent evidence)
purpose: Generates force and moment perturbations at sampled load points for compliance training
Introduced to replace rigid pose penalties during payload simulation.

Reference graph

Works this paper leans on

39 extracted references · 15 canonical work pages

[1] J. Pratt, J. Carff, S. Drakunov, and A. Goswami. Capture point: A step toward humanoid push recovery. In 2006 6th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 200–207, 2006. doi:10.1109/ICHR.2006.321385
[2] E. Whitman and G. C. Fay. Terrain aware step planning system. U.S. Patent Application Publication US20200117198A1, assigned to Boston Dynamics, Inc., Apr. 2020. URL https://patents.google.com/patent/US20200117198A1/en. Published Apr. 16, 2020; granted as US11287826B2
[3] B. Acosta and M. Posa. Perceptive mixed-integer footstep control for underactuated bipedal walking on rough terrain. IEEE Transactions on Robotics, 41:4518–4537, 2025. doi:10.1109/TRO.2025.3587998
[4] Z. Xiang, U. Pant, and A. Hereid. Perceptive variable-timing footstep planning for humanoid locomotion on disconnected footholds, 2026. URL https://arxiv.org/abs/2603.07400
[5] M. Kim, B. Acosta, P. Chaudhari, and M. Posa. Learning a vision-based footstep planner for hierarchical walking control. 2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids), pages 1–8, 2025. URL https://arxiv.org/abs/2508.06779
[6] H. Song, H. Zhu, T. Yu, Y. Liu, M. Yuan, W. Zhou, H. Chen, and H. Li. Gait-adaptive perceptive humanoid locomotion with real-time under-base terrain reconstruction. IEEE Robotics and Automation Letters, 11(4):4969–4976, 2026. doi:10.1109/LRA.2026.3664167
[7] Y. Liu, T. Yu, H. Song, H. Zhu, N. Hu, Y. Hao, X. Yao, X. Zang, H. Chen, and J. Zhao. FastStair: Learning to run up stairs with humanoid robots, 2026. URL https://arxiv.org/abs/2601.10365
[8] Q. Ben, B. Xu, K. Li, F. Jia, W. Zhang, J. Wang, J. Wang, D. Lin, and J. Pang. Gallant: Voxel grid-based humanoid locomotion and local-navigation across 3D constrained terrains, 2025. URL https://arxiv.org/abs/2511.14625
[9] H. J. Lee, S. Hong, and S. Kim. Integrating model-based footstep planning with model-free reinforcement learning for dynamic legged locomotion. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11248–11255, 2024. doi:10.1109/IROS58592.2024.10801468
[10] H. Wang, Z. Wang, J. Ren, Q. Ben, T. Huang, W. Zhang, and J. Pang. BeamDojo: Learning agile humanoid locomotion on sparse footholds. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025. doi:10.15607/RSS.2025.XXI.068. URL https://www.roboticsproceedings.org/rss21/p068.html
[11] A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged locomotion in challenging terrains using egocentric vision, 2022. URL https://arxiv.org/abs/2211.07638
[12] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022. doi:10.1126/scirobotics.abk2822. URL https://doi.org/10.1126/scirobotics.abk2822
[14] I. Radosavovic, S. Kamat, T. Darrell, and J. Malik. Learning humanoid locomotion over challenging terrain, 2024. URL https://arxiv.org/abs/2410.03654
[15] J. Long, J. Ren, M. Shi, Z. Wang, T. Huang, P. Luo, and J. Pang. Learning humanoid locomotion with perceptive internal model, 2024. URL https://arxiv.org/abs/2411.14386
[16] Y. Zhang, Y. Seo, J. Chen, Y. Yuan, K. Sreenath, P. Abbeel, C. Sferrazza, K. Liu, R. Duan, and G. Shi. RPL: Learning robust humanoid perceptive locomotion on challenging terrains, 2026. URL https://arxiv.org/abs/2602.03002
[17] W. Sun, Y. Su, L. Huang, A. Zhang, D. Wei, M. San, D. Tian, E. Cao, B. Cao, Y. Liu, F. Yan, E. Xie, and Z. Xie. Now You See That: Learning end-to-end humanoid locomotion from raw pixels, 2026. URL https://arxiv.org/abs/2602.06382
[18] Z. Zhuang, S. Yao, and H. Zhao. Humanoid parkour learning, 2024. URL https://arxiv.org/abs/2406.10759
[19] Z. Wu, X. Huang, L. Yang, Y. Zhang, X. Chen, P. Abbeel, R. Duan, A. Kanazawa, C. Sferrazza, G. Shi, and C. K. Liu. Perceptive Humanoid Parkour: Chaining dynamic human skills via motion matching, 2026. URL https://arxiv.org/abs/2602.15827
[20] D. Hoeller, N. Rudin, D. Sako, and M. Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots, 2023. URL https://arxiv.org/abs/2306.14874
[21] P. Fankhauser, M. Bloesch, and M. Hutter. Probabilistic terrain mapping for mobile robots with uncertain localization. IEEE Robotics and Automation Letters, 3(4):3019–3026, 2018. doi:10.1109/LRA.2018.2849506. URL https://doi.org/10.1109/LRA.2018.2849506
[22] D. D. Fan, K. Otsu, Y. Kubo, A. Dixit, J. Burdick, and A.-a. Agha-mohammadi. STEP: Stochastic traversability evaluation and planning for risk-aware off-road navigation. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII. URL https://www.roboticsproceedings.org/rss17/p021.html
[24] P. Fankhauser and M. Hutter. A universal grid map library: Implementation and use case for rough terrain navigation. In A. Koubaa, editor, Robot Operating System (ROS): The Complete Reference (Volume 1), volume 625 of Studies in Computational Intelligence, chapter 5, pages 99–120. Springer, Cham, 2016. doi:10.1007/978-3-319-26054-9_5. URL https://doi.org/1...
[25] I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath. Real-world humanoid locomotion with reinforcement learning, 2023. URL https://arxiv.org/abs/2303.03381
[26] A. Kumar, Z. Fu, D. Pathak, and J. Malik. RMA: Rapid motor adaptation for legged robots. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII.011. URL https://www.roboticsproceedings.org/rss17/p011.html
[27] T. Zhang, B. Zheng, R. Nai, Y. Hu, Y.-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath, and Y. Gao. HuB: Learning extreme humanoid balance, 2025. URL https://arxiv.org/abs/2505.07294
[28] L. Fu, Y. Zhong, X. Li, Y. Liu, Z. Xu, J. Tang, and S. Li. Load-aware locomotion control for humanoid robots in industrial transportation tasks, 2026. URL https://arxiv.org/abs/2603.14308
[29] A. Pasricha, J. Koh, J. Vakil, and A. Roncone. Dynamics-compliant trajectory diffusion for super-nominal payload manipulation, 2025. URL https://arxiv.org/abs/2508.21375
[30] B. Xu, H. Weng, Q. Lu, Y. Gao, and H. Xu. Facet: Force-adaptive control via impedance reference tracking for legged robots, 2025. URL https://arxiv.org/abs/2505.06883
[31] P. Zhi, P. Li, J. Yin, B. Jia, and S. Huang. Learning a unified policy for position and force control in legged loco-manipulation, 2025. URL https://arxiv.org/abs/2505.20829
[32] J. Chen, J. Frey, R. Zhou, T. Miki, G. Martius, and M. Hutter. Identifying terrain physical parameters from vision - towards physical-parameter-aware locomotion and navigation. IEEE Robotics and Automation Letters, 9(11):9279–9286, 2024. doi:10.1109/LRA.2024.3455788. URL https://doi.org/10.1109/LRA.2024.3455788
[33] H. Kim, D. Kang, M. G. Kim, G. Kim, and H. W. Park. Online friction coefficient identification for legged robots on slippery terrain using smoothed contact gradients. IEEE Robotics and Automation Letters, 10(4):3150–3157, 2025. doi:10.1109/LRA.2025.3541428. URL https://doi.org/10.1109/LRA.2025.3541428
[34] J. Englsberger, C. Ott, and A. Albu-Schäffer. Three-dimensional bipedal walking control based on divergent component of motion. IEEE Transactions on Robotics, 31(2):355–368, 2015. doi:10.1109/TRO.2015.2405592
[35] M. Khadiv, A. Herzog, S. A. A. Moosavian, and L. Righetti. Walking control based on step timing adaptation. IEEE Transactions on Robotics, 36(3):629–643, 2020. doi:10.1109/TRO.2020.2982584
[36] T. Koolen, T. De Boer, J. Rebula, A. Goswami, and J. Pratt. Capturability-based analysis and control of legged locomotion, part 1: Theory and application to three simple gait models. The International Journal of Robotics Research, 31:1094–1113, 07 2012. doi:10.1177/0278364912452673
[37] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. URL https://arxiv.org/abs/1707.06347
[38] N. Rudin, D. Hoeller, P. Reist, and M. Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In A. Faust, D. Hsu, and G. Neumann, editors, Proceedings of the 5th Conference on Robot Learning, volume 164 of Proceedings of Machine Learning Research, pages 91–100. PMLR, 08–11 Nov 2022. URL https://proceedings.mlr.press/v...
[39] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5026–5033, 2012. doi:10.1109/IROS.2012.6386109

Appendix A Implementation Details. DCM derivation (§3.1). Liu et al. [7] show that for a linear CoM height profile $z(t) = k_z t + z_0$ dur...

[1] [1]

Pratt, J

J. Pratt, J. Carff, S. Drakunov, and A. Goswami. Capture point: A step toward humanoid push recovery. In2006 6th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 200–207, 2006. doi:10.1109/ICHR.2006.321385

work page doi:10.1109/ichr.2006.321385 2006

[2] [2]

Whitman and G

E. Whitman and G. C. Fay. Terrain aware step planning system. U.S. Patent Applica- tion Publication US20200117198A1, assigned to Boston Dynamics, Inc., Apr. 2020. URL https://patents.google.com/patent/US20200117198A1/en. Published Apr. 16, 2020; granted as US11287826B2

2020

[3] [3]

Acosta and M

B. Acosta and M. Posa. Perceptive mixed-integer footstep control for underactuated bipedal walking on rough terrain.IEEE Transactions on Robotics, 41:4518–4537, 2025. doi:10.1109/ TRO.2025.3587998

arXiv 2025

[4] [4]

Xiang, U

Z. Xiang, U. Pant, and A. Hereid. Perceptive variable-timing footstep planning for humanoid locomotion on disconnected footholds, 2026. URLhttps://arxiv.org/abs/2603.07400

arXiv 2026

[5] [5]

M. Kim, B. Acosta, P. Chaudhari, and M. Posa. Learning a vision-based footstep planner for hierarchical walking control.2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids), pages 1–8, 2025. URLhttps://arxiv.org/abs/2508.06779

arXiv 2025

[6] [6]

H. Song, H. Zhu, T. Yu, Y . Liu, M. Yuan, W. Zhou, H. Chen, and H. Li. Gait-adaptive per- ceptive humanoid locomotion with real-time under-base terrain reconstruction.IEEE Robotics and Automation Letters, 11(4):4969–4976, 2026. doi:10.1109/LRA.2026.3664167

work page doi:10.1109/lra.2026.3664167 2026

[7] [7]

Y . Liu, T. Yu, H. Song, H. Zhu, N. Hu, Y . Hao, X. Yao, X. Zang, H. Chen, and J. Zhao. FastStair: Learning to run up stairs with humanoid robots, 2026. URLhttps://arxiv.org/ abs/2601.10365

arXiv 2026

[8] [8]

Q. Ben, B. Xu, K. Li, F. Jia, W. Zhang, J. Wang, J. Wang, D. Lin, and J. Pang. Gallant: V oxel grid-based humanoid locomotion and local-navigation across 3D constrained terrains, 2025. URLhttps://arxiv.org/abs/2511.14625

arXiv 2025

[9] [9]

H. J. Lee, S. Hong, and S. Kim. Integrating model-based footstep planning with model- free reinforcement learning for dynamic legged locomotion. In2024 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS), pages 11248–11255, 2024. doi: 10.1109/IROS58592.2024.10801468

work page doi:10.1109/iros58592.2024.10801468 2024

[10] [10]

H. Wang, Z. Wang, J. Ren, Q. Ben, T. Huang, W. Zhang, and J. Pang. BeamDojo: Learning agile humanoid locomotion on sparse footholds. InProceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025. doi:10.15607/RSS.2025.XXI.068. URLhttps: //www.roboticsproceedings.org/rss21/p068.html

work page doi:10.15607/rss.2025.xxi.068 2025

[11] [11]

Agarwal, A

A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged locomotion in challenging terrains using egocentric vision, 2022. URLhttps://arxiv.org/abs/2211.07638

arXiv 2022

[12] [12]

T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V . Koltun, and M. Hutter. Learning robust per- ceptive locomotion for quadrupedal robots in the wild.Science Robotics, 7(62):eabk2822,

[13] [13]

URLhttps://doi.org/10.1126/scirobotics

doi:10.1126/scirobotics.abk2822. URLhttps://doi.org/10.1126/scirobotics. abk2822

work page doi:10.1126/scirobotics.abk2822

[14]

I. Radosavovic, S. Kamat, T. Darrell, and J. Malik. Learning humanoid locomotion over challenging terrain, 2024. URL https://arxiv.org/abs/2410.03654

[15]

J. Long, J. Ren, M. Shi, Z. Wang, T. Huang, P. Luo, and J. Pang. Learning humanoid locomotion with perceptive internal model, 2024. URL https://arxiv.org/abs/2411.14386

[16]

Y. Zhang, Y. Seo, J. Chen, Y. Yuan, K. Sreenath, P. Abbeel, C. Sferrazza, K. Liu, R. Duan, and G. Shi. RPL: Learning robust humanoid perceptive locomotion on challenging terrains, 2026. URL https://arxiv.org/abs/2602.03002

[17]

W. Sun, Y. Su, L. Huang, A. Zhang, D. Wei, M. San, D. Tian, E. Cao, B. Cao, Y. Liu, F. Yan, E. Xie, and Z. Xie. Now You See That: Learning end-to-end humanoid locomotion from raw pixels, 2026. URL https://arxiv.org/abs/2602.06382

[18]

Z. Zhuang, S. Yao, and H. Zhao. Humanoid parkour learning, 2024. URL https://arxiv.org/abs/2406.10759

[19]

Z. Wu, X. Huang, L. Yang, Y. Zhang, X. Chen, P. Abbeel, R. Duan, A. Kanazawa, C. Sferrazza, G. Shi, and C. K. Liu. Perceptive Humanoid Parkour: Chaining dynamic human skills via motion matching, 2026. URL https://arxiv.org/abs/2602.15827

[20]

D. Hoeller, N. Rudin, D. Sako, and M. Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots, 2023. URL https://arxiv.org/abs/2306.14874

[21]

P. Fankhauser, M. Bloesch, and M. Hutter. Probabilistic terrain mapping for mobile robots with uncertain localization. IEEE Robotics and Automation Letters, 3(4):3019–3026, 2018. doi:10.1109/LRA.2018.2849506. URL https://doi.org/10.1109/LRA.2018.2849506

[22]

D. D. Fan, K. Otsu, Y. Kubo, A. Dixit, J. Burdick, and A.-a. Agha-mohammadi. STEP: Stochastic traversability evaluation and planning for risk-aware off-road navigation. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII.021. URL https://www.roboticsproceedings.org/rss17/p021.html

[24]

P. Fankhauser and M. Hutter. A universal grid map library: Implementation and use case for rough terrain navigation. In A. Koubaa, editor, Robot Operating System (ROS): The Complete Reference (Volume 1), volume 625 of Studies in Computational Intelligence, chapter 5, pages 99–120. Springer, Cham, 2016. doi:10.1007/978-3-319-26054-9_5. URL https://doi.org/1...

[25]

I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath. Real-world humanoid locomotion with reinforcement learning, 2023. URL https://arxiv.org/abs/2303.03381

[26]

A. Kumar, Z. Fu, D. Pathak, and J. Malik. RMA: Rapid motor adaptation for legged robots. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII.011. URL https://www.roboticsproceedings.org/rss17/p011.html

[27]

T. Zhang, B. Zheng, R. Nai, Y. Hu, Y.-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath, and Y. Gao. HuB: Learning extreme humanoid balance, 2025. URL https://arxiv.org/abs/2505.07294

[28]

L. Fu, Y. Zhong, X. Li, Y. Liu, Z. Xu, J. Tang, and S. Li. Load-aware locomotion control for humanoid robots in industrial transportation tasks, 2026. URL https://arxiv.org/abs/2603.14308

[29]

A. Pasricha, J. Koh, J. Vakil, and A. Roncone. Dynamics-compliant trajectory diffusion for super-nominal payload manipulation, 2025. URL https://arxiv.org/abs/2508.21375

[30]

B. Xu, H. Weng, Q. Lu, Y. Gao, and H. Xu. Facet: Force-adaptive control via impedance reference tracking for legged robots, 2025. URL https://arxiv.org/abs/2505.06883

[31]

P. Zhi, P. Li, J. Yin, B. Jia, and S. Huang. Learning a unified policy for position and force control in legged loco-manipulation, 2025. URL https://arxiv.org/abs/2505.20829

[32]

J. Chen, J. Frey, R. Zhou, T. Miki, G. Martius, and M. Hutter. Identifying terrain physical parameters from vision - towards physical-parameter-aware locomotion and navigation. IEEE Robotics and Automation Letters, 9(11):9279–9286, 2024. doi:10.1109/LRA.2024.3455788. URL https://doi.org/10.1109/LRA.2024.3455788

[33]

H. Kim, D. Kang, M. G. Kim, G. Kim, and H. W. Park. Online friction coefficient identification for legged robots on slippery terrain using smoothed contact gradients. IEEE Robotics and Automation Letters, 10(4):3150–3157, 2025. doi:10.1109/LRA.2025.3541428. URL https://doi.org/10.1109/LRA.2025.3541428

[34]

J. Englsberger, C. Ott, and A. Albu-Schäffer. Three-dimensional bipedal walking control based on divergent component of motion. IEEE Transactions on Robotics, 31(2):355–368, 2015. doi:10.1109/TRO.2015.2405592

[35]

M. Khadiv, A. Herzog, S. A. A. Moosavian, and L. Righetti. Walking control based on step timing adaptation. IEEE Transactions on Robotics, 36(3):629–643, 2020. doi:10.1109/TRO.2020.2982584

[36]

T. Koolen, T. De Boer, J. Rebula, A. Goswami, and J. Pratt. Capturability-based analysis and control of legged locomotion, part 1: Theory and application to three simple gait models. The International Journal of Robotics Research, 31:1094–1113, 07 2012. doi:10.1177/0278364912452673

[37]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. URL https://arxiv.org/abs/1707.06347

[38]

N. Rudin, D. Hoeller, P. Reist, and M. Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In A. Faust, D. Hsu, and G. Neumann, editors, Proceedings of the 5th Conference on Robot Learning, volume 164 of Proceedings of Machine Learning Research, pages 91–100. PMLR, 08–11 Nov 2022. URL https://proceedings.mlr.press/v...

[39]

E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5026–5033, 2012. doi:10.1109/IROS.2012.6386109.

Appendix A Implementation Details

DCM derivation (§3.1). Liu et al. [7] show that for a linear CoM height profile $z(t) = k_z t + z_0$ dur...